Preprints, Working Papers, ...

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking

Abstract : Since their introduction, transformer networks have proven extremely powerful for a wide variety of tasks. Computer vision is no exception: transformers have become very popular in the vision community in recent years. Despite this wave, multiple-object tracking (MOT) has so far shown some incompatibility with transformers. We argue that the standard representation, bounding boxes, is not well suited to learning transformers for MOT. Inspired by recent research, we propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets. Methodologically, we propose the use of dense queries in a double-decoder network to robustly infer the heatmap of the targets' centers and associate them through time. TransCenter outperforms the current state of the art in multiple-object tracking on both MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed architecture over more naive alternatives. The code will be made publicly available.
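The abstract describes inferring a heatmap of the targets' centers. As an illustration only, the sketch below shows one common way to read center locations out of such a heatmap: keep pixels that are local maxima in their 3x3 neighbourhood and exceed a score threshold. This is not the authors' implementation; the function name `extract_centers`, the 3x3 window, and the threshold value are assumptions made for this example.

```python
import numpy as np


def extract_centers(heatmap, threshold=0.5):
    """Return (row, col, score) for 3x3 local maxima above `threshold`.

    Hypothetical helper for illustration; not part of TransCenter's code.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border pixels compare only against real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each pixel's 3x3 window.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    # A pixel is a peak if it equals the maximum of its window.
    is_peak = heatmap >= windows.max(axis=0)
    keep = is_peak & (heatmap > threshold)
    rows, cols = np.nonzero(keep)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]


# Toy heatmap with two confident centers and one weak response.
demo = np.zeros((5, 5))
demo[1, 1] = 0.9
demo[3, 3] = 0.8
demo[1, 2] = 0.4
centers = extract_centers(demo)
```

In this toy case the weak response at (1, 2) is discarded because it is both below the threshold and dominated by its neighbour at (1, 1). Associating the resulting centers through time, the second part of the task named in the abstract, is not covered by this sketch.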
Contributor : Xavier Alameda-Pineda
Submitted on : Thursday, July 22, 2021 - 11:59:15 AM
Last modification on : Wednesday, May 4, 2022 - 12:00:02 PM

Links full text


  • HAL Id : hal-03295680, version 1
  • ARXIV : 2103.15145



Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, et al. TransCenter: Transformers with Dense Queries for Multiple-Object Tracking. 2021. ⟨hal-03295680⟩


