An update toward a more efficient and powerful TransCenter: TransCenter-Lite.
TransCenter: Transformers with Dense Representations for Multiple-Object Tracking
Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda
[Paper] [Project]
BibTeX
If you find this code useful, please star the project and consider citing:
@misc{xu2021transcenter,
title={TransCenter: Transformers with Dense Representations for Multiple-Object Tracking},
author={Yihong Xu and Yutong Ban and Guillaume Delorme and Chuang Gan and Daniela Rus and Xavier Alameda-Pineda},
year={2021},
eprint={2103.15145},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Environment Preparation
Option 1 (recommended):
We provide a Singularity image (similar to Docker) containing all the packages needed for TransCenter:
- Install Singularity (> 3.7.1): https://sylabs.io/guides/3.0/user-guide/installation.html#install-on-linux
- Download one of the Singularity images:
transcenter_singularity.sif, tested with NVIDIA RTX TITAN, Quadro RTX 8000, RTX 2080Ti, and Quadro RTX 4000.
- Launch the Singularity image:
singularity shell --nv --bind yourLocalPath:yourPathInsideImage YourSingularityImage.sif
- --bind: links a local path to a path inside the image, so data on your local machine is accessible inside the Singularity image;
- --nv: uses the local NVIDIA driver.
Option 2:
You can also build your own environment:
- We use Anaconda to simplify package installation; you can download Anaconda (4.10.3) here: https://www.anaconda.com/products/individual
- Create your conda environment with:
conda env create -n <env_name> -f eTransCenter.yml
- TransCenter uses the deformable transformer from Deformable DETR, so the deformable attention modules need to be compiled:
cd ./to_install/ops
sh ./make.sh
# unit test (all checks should report True)
python test.py
- For the up-scale and merge module in TransCenter, we use a deformable convolution module (DCNv2); install it with the commands below (a sanity-check sketch for both compiled modules follows this list):
cd ./to_install/DCNv2
./make.sh # build
python testcpu.py # run examples and gradient check on cpu
python testcuda.py # run examples and gradient check on gpu
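If both builds succeed, a quick way to confirm that the compiled modules are usable in your training environment is a small import/forward smoke test. The sketch below is a hedged check: the extension name MultiScaleDeformableAttention and the DCN constructor signature follow the Deformable DETR ops and CharlesShang/DCNv2 respectively, and may differ in other forks.

```python
# Hedged smoke test for the two compiled modules (run inside the training environment).
# Names/signatures are assumptions based on Deformable DETR's ops and CharlesShang/DCNv2.
import torch

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())

try:
    import MultiScaleDeformableAttention  # noqa: F401  (extension name assumed from Deformable DETR's ops)
    print("deformable attention extension imported")
except ImportError as err:
    print("deformable attention extension missing; re-run sh ./make.sh on this GPU:", err)

if torch.cuda.is_available():
    from dcn_v2 import DCN  # constructor signature assumed from CharlesShang/DCNv2's dcn_v2.py
    layer = DCN(64, 64, kernel_size=(3, 3), stride=1, padding=1, deformable_groups=2).cuda()
    out = layer(torch.randn(2, 64, 32, 32, device="cuda"))
    print("DCNv2 forward OK, output shape:", tuple(out.shape))  # expected (2, 64, 32, 32)
else:
    print("CUDA not available; use testcpu.py for the DCNv2 check")
```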
See also the known issues listed at https://github.com/CharlesShang/DCNv2. If you hit CUDA-related errors with these third-party modules, try recompiling them on the GPU you use for training and testing. The dependencies are compatible with PyTorch 1.6 and CUDA 10.1. For torch > 1.11, please use https://github.com/jinfagang/DCNv2_latest instead. If you have issues with pyyaml, downgrade it with pip install pyyaml==5.4.1.
For problems with torchvision >= 0.10 ("ImportError: cannot import name '_NewEmptyTensorOp' from 'torchvision.ops.misc'"), please check: https://github.com/megvii-research/SOLQ/issues/7#issuecomment-898169459
If you install the DCNv2 and deformable transformer packages from other implementations, please replace the corresponding files with dcn_v2.py and ms_deform_attn.py from ./to_install so that half-precision operations work with the customized packages.
Data Preparation
MS COCO: we use only the person category for pretraining TransCenter. The filtering code is provided in ./data/coco_person.py; a minimal sketch of the idea follows the citation below.
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
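The gist of the filtering is to keep only person images and annotations from the original COCO annotation file. A minimal sketch with pycocotools is shown below; the file paths are placeholders, and the actual ./data/coco_person.py may keep additional fields.

```python
# Minimal sketch of person-only filtering with pycocotools; paths are placeholders
# and the official ./data/coco_person.py may store extra fields (e.g. info/licenses).
import json
from pycocotools.coco import COCO

src = "YourPathTo/cocodataset/annotations/instances_train2017.json"          # placeholder path
dst = "YourPathTo/cocodataset/annotations/instances_train2017_person.json"   # placeholder path

coco = COCO(src)
person_cat_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_cat_ids)
ann_ids = coco.getAnnIds(imgIds=img_ids, catIds=person_cat_ids, iscrowd=None)

filtered = {
    "images": coco.loadImgs(img_ids),
    "annotations": coco.loadAnns(ann_ids),
    "categories": coco.loadCats(person_cat_ids),
}
with open(dst, "w") as f:
    json.dump(filtered, f)
print(f"kept {len(filtered['images'])} images and {len(filtered['annotations'])} person boxes")
```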
CrowdHuman: CrowdHuman labels are converted to COCO format with ./data/convert_crowdhuman_to_coco.py; a condensed sketch of the conversion follows the citation below.
@article{shao2018crowdhuman,
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
journal={arXiv preprint arXiv:1805.00123},
year={2018}
}
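For reference, CrowdHuman ships its annotations as one JSON object per line (.odgt), with full-body boxes under gtboxes[*]["fbox"] in [x, y, w, h]. A stripped-down version of the conversion could look like the sketch below; the official ./data/convert_crowdhuman_to_coco.py is the reference implementation and handles more metadata.

```python
# Hedged sketch of CrowdHuman (.odgt) -> COCO-format conversion; paths are placeholders.
import json

odgt_path = "YourPathTo/crowd_human/annotation_train.odgt"    # placeholder path
out_path = "YourPathTo/crowd_human/annotations/train.json"    # placeholder path

images, annotations = [], []
ann_id = 0
with open(odgt_path) as f:
    for img_id, line in enumerate(f):
        record = json.loads(line)
        # note: real COCO image entries also need "width"/"height", read from the image files
        images.append({"id": img_id, "file_name": record["ID"] + ".jpg"})
        for box in record.get("gtboxes", []):
            if box.get("tag") != "person":        # skip "mask" (ignore) regions
                continue
            x, y, w, h = box["fbox"]              # full-body box in [x, y, w, h]
            annotations.append({"id": ann_id, "image_id": img_id, "category_id": 1,
                                "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0})
            ann_id += 1

coco_dict = {"images": images, "annotations": annotations,
             "categories": [{"id": 1, "name": "person"}]}
with open(out_path, "w") as f:
    json.dump(coco_dict, f)
```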
MOT17: MOT17 labels are converted to COCO format with ./data/convert_mot_to_coco.py; a sketch of the ground-truth parsing follows the citation below.
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
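MOTChallenge ground truth comes as per-sequence gt/gt.txt files with comma-separated rows (frame, id, x, y, w, h, conf, class, visibility). The sketch below condenses the per-sequence parsing under that assumption; the real convert_mot_to_coco.py / convert_mot20_to_coco.py additionally record video and tracking metadata and build global image ids.

```python
# Hedged sketch of parsing one MOT gt.txt into COCO-style annotations; the sequence
# path is a placeholder, and pedestrian class id 1 / conf flag 0 follow the MOTChallenge format.
import csv

gt_path = "YourPathTo/MOT17/train/MOT17-02-SDP/gt/gt.txt"   # placeholder sequence

annotations = []
with open(gt_path) as f:
    for row in csv.reader(f):
        frame, track_id, x, y, w, h, conf, cls, vis = row[:9]
        if int(cls) != 1 or int(conf) == 0:      # keep only valid pedestrian boxes
            continue
        annotations.append({
            "image_id": int(frame),              # the real script maps (sequence, frame) -> global image id
            "track_id": int(track_id),
            "bbox": [float(x), float(y), float(w), float(h)],
            "category_id": 1,
            "iscrowd": 0,
            "visibility": float(vis),
        })
print(f"parsed {len(annotations)} pedestrian boxes from {gt_path}")
```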
MOT20: MOT20 labels are converted to COCO format with ./data/convert_mot20_to_coco.py (the ground-truth format is the same as MOT17 above).
@article{dendorfer2020mot20,
title={Mot20: A benchmark for multi object tracking in crowded scenes},
author={Dendorfer, Patrick and Rezatofighi, Hamid and Milan, Anton and Shi, Javen and Cremers, Daniel and Reid, Ian and Roth, Stefan and Schindler, Konrad and Leal-Taix{\'e}, Laura},
journal={arXiv preprint arXiv:2003.09003},
year={2020}
}
We also provide the filtered/converted labels:
MS COCO person labels: please put the annotations folder (inside cocoperson) into your MS COCO dataset root folder.
CrowdHuman coco-format labels: please put the annotations folder (inside crowdhuman) into your CrowdHuman dataset root folder.
MOT17 coco-format labels: please put the annotations and annotations_onlySDP folders (inside MOT17) into your MOT17 dataset root folder.
MOT20 coco-format labels: please put the annotations folder (inside MOT20) into your MOT20 dataset root folder.
Model Zoo
For TransCenter:
PVTv2 pretrained: pretrained model from deformable-DETR.
coco_pretrained: model trained with coco person dataset.
MOT17_fromCoCo: model pretrained on coco person and fine-tuned on MOT17 trainset.
MOT17_trained_with_CH: model trained on CrowdHuman and MOT17 trainset.
MOT20_fromCoCo: model pretrained on coco person and fine-tuned on MOT20 trainset.
MOT20_trained_with_CH: model trained on CrowdHuman and MOT20 trainset.
For TransCenter-Lite:
coco_pretrained_lite: model trained with coco person dataset.
MOT17_trained_with_CH_lite: model trained on CrowdHuman and MOT17 trainset.
MOT20_trained_with_CH_lite: model trained on CrowdHuman and MOT20 trainset.
Please put all the pretrained models in ./model_zoo.
Training
For TransCenter:
- Pretrained on coco person dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/main_coco.py --output_dir=./outputs/whole_coco --batch_size=4 --num_workers=8 --pre_hm --tracking --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --data_dir=YourPathTo/cocodataset/
- Pretrained on CrowdHuman dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/main_crowdHuman.py --output_dir=./outputs/whole_ch_from_COCO --batch_size=4 --num_workers=8 --resume=./model_zoo/coco_pretrained.pth --pre_hm --tracking --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --data_dir=YourPathTo/crowd_human/
- Train MOT17 from CoCo pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot17.py --output_dir=./outputs/mot17_from_coco --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT17/ --epochs=50 --lr_drop=40 --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --pre_hm --tracking --resume=./model_zoo/coco_pretrained.pth --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT17 together with CrowdHuman:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot17_mix_ch.py --output_dir=./outputs/CH_mot17 --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT17/ --data_dir_ch=YourPathTo/crowd_human/ --epochs=150 --lr_drop=100 --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --pre_hm --tracking --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT20 from CoCo pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot20.py --output_dir=./outputs/mot20_from_coco --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT20/ --epochs=50 --lr_drop=40 --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --pre_hm --tracking --resume=./model_zoo/coco_pretrained.pth --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT20 together with CrowdHuman:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot20_mix_ch.py --output_dir=./outputs/CH_mot20 --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT20/ --data_dir_ch=YourPathTo/crowd_human/ --epochs=150 --lr_drop=100 --nheads 1 2 5 8 --num_encoder_layers 3 4 6 3 --dim_feedforward_ratio 8 8 4 4 --d_model 64 128 320 512 --pre_hm --tracking --same_aug_pre --image_blur_aug --clip_max_norm=35
For TransCenter-Lite:
- Pretrained on coco person dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/main_coco_lite.py --output_dir=./outputs/whole_coco_lite --batch_size=4 --num_workers=8 --pre_hm --tracking --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --data_dir=YourPathTo/cocodataset/
- Pretrained on CrowdHuman dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/main_crowdHuman_lite.py --output_dir=./outputs/whole_ch_from_coco_lite --batch_size=4 --num_workers=8 --resume=./model_zoo/coco_pretrained_lite.pth --pre_hm --tracking --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --data_dir=YourPathTo/crowd_human/
- Train MOT17 from CoCo pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot17_lite.py --output_dir=./outputs/mot17_from_coco_lite --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT17/ --epochs=50 --lr_drop=40 --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --pre_hm --tracking --resume=./model_zoo/coco_pretrained_lite.pth --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT17 together with CrowdHuman:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot17_mix_ch_lite.py --output_dir=./outputs/CH_mot17_lite --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT17/ --data_dir_ch=YourPathTo/crowd_human/ --epochs=150 --lr_drop=100 --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --pre_hm --tracking --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT20 from CoCo pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot20_lite.py --output_dir=./outputs/mot20_from_coco_lite --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT20/ --epochs=50 --lr_drop=40 --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --pre_hm --tracking --resume=./model_zoo/coco_pretrained_lite.pth --same_aug_pre --image_blur_aug --clip_max_norm=35
- Train MOT20 together with CrowdHuman:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/main_mot20_mix_ch_lite.py --output_dir=./outputs/CH_mot20_lite --batch_size=4 --num_workers=8 --data_dir=YourPathTo/MOT20/ --data_dir_ch=YourPathTo/crowd_human/ --epochs=150 --lr_drop=100 --nheads 1 2 5 8 --num_encoder_layers 2 2 2 2 --dim_feedforward_ratio 8 8 4 4 --d_model 32 64 160 256 --num_decoder_layers 4 --pre_hm --tracking --same_aug_pre --image_blur_aug --clip_max_norm=35
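For reference, the per-stage transformer hyper-parameters passed on the command lines above can be summarized as follows. The dict layout is only an illustrative summary of the flags, not TransCenter's actual argument parser or config object.

```python
# Per-stage hyper-parameters as passed via the training flags above (illustrative only).
TRANSCENTER = {
    "nheads": [1, 2, 5, 8],
    "num_encoder_layers": [3, 4, 6, 3],
    "dim_feedforward_ratio": [8, 8, 4, 4],
    "d_model": [64, 128, 320, 512],
}
TRANSCENTER_LITE = {
    "nheads": [1, 2, 5, 8],
    "num_encoder_layers": [2, 2, 2, 2],
    "dim_feedforward_ratio": [8, 8, 4, 4],
    "d_model": [32, 64, 160, 256],
    "num_decoder_layers": 4,
}
```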
Tips:
- If you encounter RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR on some GPUs, please try setting torch.backends.cudnn.benchmark=False. In most cases, setting torch.backends.cudnn.benchmark=True is more memory-efficient.
- Depending on your environment and GPUs, you might experience MOTA jitter in your final models.
- You may see training noise during fine-tuning, especially when training on MOT17/MOT20 from well-pretrained models. You can lower the learning rate by a factor of 10, apply early stopping, or increase the batch size if your GPUs have more memory (see the sketch after these tips).
- If you run into GPU memory issues, try lowering the batch size for training and evaluation in main_****.py, freezing the backbone, and using our COCO/CH pretrained models.
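The tips above translate into standard PyTorch knobs. The sketch below is a generic illustration with a placeholder model and optimizer, not TransCenter's actual training loop; the base learning rate and clipping value mirror the flags used in this README.

```python
# Hedged illustration of the tips above using plain PyTorch; the model/optimizer here
# are placeholders and do not correspond to TransCenter's training scripts.
import torch

torch.backends.cudnn.benchmark = False  # workaround for CUDNN_STATUS_INTERNAL_ERROR on some GPUs

model = torch.nn.Conv2d(3, 64, 3)       # placeholder module standing in for the tracking model
base_lr = 2e-4                          # assumed base learning rate for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr / 10)  # "lower the learning rate by 1/10"

# one dummy step with gradient clipping, mirroring --clip_max_norm=35 from the commands above
loss = model(torch.randn(1, 3, 64, 64)).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=35)
optimizer.step()
```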
Tracking
Using private detections:
For TransCenter:
- MOT17:
cd TransCenter_official
python ./tracking/mot17_private_test.py --data_dir=YourPathTo/MOT17/
- MOT20:
cd TransCenter_official
python ./tracking/mot20_private_test.py --data_dir=YourPathTo/MOT20/
For TransCenter-Lite:
- MOT17:
cd TransCenter_official
python ./tracking/mot17_private_lite_test.py --data_dir=YourPathTo/MOT17/
- MOT20:
cd TransCenter_official
python ./tracking/mot20_private_lite_test.py --data_dir=YourPathTo/MOT20/
Using public detections:
For TransCenter:
- MOT17:
cd TransCenter_official
python ./tracking/mot17_pub_test.py --data_dir=YourPathTo/MOT17/
- MOT20:
cd TransCenter_official
python ./tracking/mot20_pub_test.py --data_dir=YourPathTo/MOT20/
For TransCenter-Lite:
- MOT17:
cd TransCenter_official
python ./tracking/mot17_pub_lite_test.py --data_dir=YourPathTo/MOT17/
- MOT20:
cd TransCenter_official
python ./tracking/mot20_pub_lite_test.py --data_dir=YourPathTo/MOT20/
MOTChallenge Results
For TransCenter:
MOT17 public detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 71.9% | 80.5% | 64.1% | 27,356 | 126,860 | 4,118 |
CH | 75.9% | 81.2% | 65.9% | 30,190 | 100,999 | 4,626 |
MOT20 public detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 67.7% | 79.8% | 58.9% | 54,967 | 108,376 | 3,707 |
CH | 72.8% | 81.0% | 57.6% | 28,026 | 110,312 | 2,621 |
MOT17 private detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 72.7% | 80.3% | 64.0% | 33,807 | 115,542 | 4,719 |
CH | 76.2% | 81.1% | 65.5% | 40,101 | 88,827 | 5,394 |
MOT20 private detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 67.7% | 79.8% | 58.7% | 56,435 | 107,163 | 3,759 |
CH | 72.9% | 81.0% | 57.7% | 28,596 | 108,982 | 2,625 |
Note:
- The results can be slightly different depending on the running environment.
- We might keep updating the results in the near future.
Acknowledgement
The code for TransCenter and TransCenter-Lite is modified from, and the network pre-trained weights are obtained from, the following repositories:
- The PVTv2 backbone pretrained models are from PVTv2.
- The data format conversion code is modified from CenterTrack.
- Code components are adapted from CenterTrack, Deformable-DETR, and Tracktor.
@article{zhou2020tracking,
title={Tracking Objects as Points},
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
journal={ECCV},
year={2020}
}
@InProceedings{tracktor_2019_ICCV,
author = {Bergmann, Philipp and Meinhardt, Tim and Leal{-}Taix{\'{e}}, Laura},
title = {Tracking Without Bells and Whistles},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}}
@article{zhu2020deformable,
title={Deformable DETR: Deformable Transformers for End-to-End Object Detection},
author={Zhu, Xizhou and Su, Weijie and Lu, Lewei and Li, Bin and Wang, Xiaogang and Dai, Jifeng},
journal={arXiv preprint arXiv:2010.04159},
year={2020}
}
@article{zhang2021bytetrack,
title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
journal={arXiv preprint arXiv:2110.06864},
year={2021}
}
@article{wang2021pvtv2,
title={Pvtv2: Improved baselines with pyramid vision transformer},
author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
journal={Computational Visual Media},
volume={8},
number={3},
pages={1--10},
year={2022},
publisher={Springer}
}
Several modules are from:
MOT Metrics in Python: py-motmetrics
Soft-NMS: Soft-NMS
DETR: DETR
DCNv2: DCNv2
PVTv2: PVTv2
ByteTrack: ByteTrack