Anis Yassine BEN MABROUK
Multi-object tracking in complex scenarios: benchmark, in-depth analysis, and pair-dependent representations for visual similarity
Abstract
Multi-object tracking (MOT) aims to assign consistent identifiers to objects in a video sequence, even under occlusions, scale changes, and varying motion. Modern MOT systems often follow a detect-to-track paradigm, where objects are first detected in each frame and then associated over time using motion and visual cues. Motion cues typically rely on kinematic models such as Kalman filters, while visual cues compare object appearances using learned embeddings.
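As an illustration only, the association step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method studied in the thesis: the function names, the dictionary layout, the weight `w_motion` and the threshold `min_score` are all hypothetical. The track's `predicted_box` is assumed to come from a motion model such as a Kalman filter, and matching is done greedily for brevity rather than with an optimal assignment solver.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, w_motion=0.5, min_score=0.3):
    """Greedily match tracks to detections in the current frame.

    tracks:     list of dicts with keys 'id', 'predicted_box', 'embedding'
                ('predicted_box' is assumed to come from a motion model).
    detections: list of dicts with keys 'box', 'embedding'.
    Returns a dict mapping track id -> index of the matched detection.
    """
    candidates = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            motion = iou(track["predicted_box"], det["box"])          # motion cue
            visual = cosine_similarity(track["embedding"], det["embedding"])  # visual cue
            score = w_motion * motion + (1.0 - w_motion) * visual
            if score >= min_score:
                candidates.append((score, ti, di))

    # Highest-scoring pairs first; each track and detection is used at most once.
    candidates.sort(reverse=True)
    matches, used_tracks, used_dets = {}, set(), set()
    for score, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue
        matches[tracks[ti]["id"]] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches
```

With a low frame rate, the predicted and detected boxes may no longer overlap, so the motion term drops to zero and the match depends entirely on the visual term, which is the regime this thesis focuses on.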
In standard settings with high-frame-rate cameras, simple motion and appearance models are effective. However, in low-frame-rate or high-speed-motion scenarios, large inter-frame displacements make association unreliable, requiring more robust methods. This thesis highlights the shortcomings of both motion- and appearance-based association in the low-frame-rate setting. Focusing on visual cues, we first expose limitations of the re-identification (Re-ID) evaluation protocol and propose an alternative that better measures generalisation to unseen vehicle types. We then introduce DRnet and PCB, two pair-dependent representation approaches for recognition and re-identification, respectively, that generalise more effectively to unseen vehicle types.
Finally, we revisit multi-object tracking, extensively benchmarking state-of-the-art methods across diverse datasets with challenges such as difficult motion, heavy occlusions, and target scale variation. We then focus on the best overall performer, DiffMOT, a diffusion-based approach for predicting non-linear motion, to better understand its limitations and identify directions for improvement.
In summary, this thesis offers an extensive study of multi-object tracking and re-identification, proposes an alternative Re-ID evaluation protocol emphasising generalisation, introduces pair-dependent representations that improve recognition and re-identification, and suggests ways to advance motion modelling.
Keywords
Multi-Object Tracking, Recognition, Re-Identification, Attention Mechanism
Supervision
- Axel Davy
- Gabriele Facciolo
- Rafael Grompone
Jury
- Saïd Ladjal, reviewer
- Matias Di Martino, reviewer
- Angélique Loesch, examiner
- Laurent Oudre, examiner