PhD Defense: 'Visual Multi-Object Tracking through Deep Learning'

This thesis presents novel deep-learning approaches for tracking multiple objects in videos. Traditional multi-object trackers, which rely on frame-by-frame detections and primarily geometric attributes, are ill-suited to real-time environments and open-set scenarios. To overcome these limitations, we introduce SiamMT, an architecture that adapts single-object tracking techniques to handle multiple arbitrary targets in real time. SiamMOTION refines this approach: it manages distractors and accommodates objects of varying sizes by extracting semantically richer features and proposing more accurate search areas. Lastly, we present ByteFormer, a Transformer-based architecture that complements a detector by recovering the objects it misses, improving multi-object tracking performance.

Supervisors: Manuel Mucientes Molina & Víctor Manuel Brea Sánchez