Doctoral Meeting: 'Segmentation-Guided Association Refinement in Multiple Object Tracking'

Multiple Object Tracking (MOT) aims to detect all objects in a video sequence and maintain consistent identities across frames. While Tracking-by-Detection (TbD) remains the dominant paradigm due to its effectiveness, its reliance on object detectors makes it vulnerable to detection failures, particularly under occlusion. These failures often result in fragmented trajectories and degraded tracking performance. In this work, we propose SGAR-MOT (Segmentation-Guided Association Refinement in Multiple Object Tracking), a novel framework that enhances TbD pipelines by recovering object tracks lost due to detection errors. SGAR-MOT introduces the Object Recovery Protocol (ORP), which leverages segmentation masks to assess track continuity by integrating a video segmentation method with a Vision Transformer-based module, the Mask Instance Resolver (MIR). Experimental results across multiple benchmarks, including MOT20, SportsMOT and VisDrone, demonstrate that SGAR-MOT consistently outperforms its baseline tracker and achieves competitive results against state-of-the-art methods. These findings highlight the potential of segmentation-guided recovery to improve robustness in modern MOT systems.