Many computer vision applications require real-time processing, which rules out running an object detector on every frame of a sequence. In such cases, motion estimation techniques are needed to maintain the identity of the targets. This can be done by instantiating multiple single-object trackers, if there are few targets, or through methods that extract features globally from the frame in order to share computations. The problem with the latter is that they yield features with limited semantic information and detect changes in the scene by performing multi-scale tests, which is inefficient and error-prone. To solve these problems and provide accurate real-time tracking of multiple objects, we propose SiamFAST. SiamFAST includes: a feature-pyramid-based region-of-interest extractor that produces high-quality features for both object exemplars and search areas; a pairwise depthwise region proposal network that quickly computes similarities for several dozen objects; and a multi-object penalization module that suppresses the effect of distractors. SiamFAST has been validated on three public benchmarks, achieving leading performance against current state-of-the-art trackers.
Keywords: Computer Vision, Tracking, Motion Estimation, Multiple Object Tracking
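
As an illustration of the pairwise depthwise similarity mentioned in the abstract, the following minimal sketch correlates each object's exemplar features with that object's own search-area features, channel by channel. It is not the SiamFAST implementation: the function names `depthwise_xcorr` and `pairwise_depthwise_xcorr`, the nested-list feature layout, and the toy values are assumptions made for illustration only. A practical tracker would batch this operation on a GPU, and the region proposal heads that follow the correlation are omitted here.

```python
def depthwise_xcorr(search, exemplar):
    """Depthwise cross-correlation for a single object (illustrative sketch).

    search:   features of the search area, laid out as [C][H][W]
    exemplar: features of the object exemplar, laid out as [C][h][w]
    returns:  one response map per channel, laid out as [C][H-h+1][W-w+1]
    """
    responses = []
    # Each channel of the exemplar is slid only over the matching channel
    # of the search area; channels are never mixed (hence "depthwise").
    for s_ch, e_ch in zip(search, exemplar):
        H, W = len(s_ch), len(s_ch[0])
        h, w = len(e_ch), len(e_ch[0])
        resp = [
            [
                sum(
                    s_ch[i + di][j + dj] * e_ch[di][dj]
                    for di in range(h)
                    for dj in range(w)
                )
                for j in range(W - w + 1)
            ]
            for i in range(H - h + 1)
        ]
        responses.append(resp)
    return responses


def pairwise_depthwise_xcorr(searches, exemplars):
    """Correlate the i-th exemplar with the i-th search area only.

    "Pairwise" here means N objects yield N correlations,
    not N x N comparisons between all exemplars and all search areas.
    """
    return [depthwise_xcorr(s, e) for s, e in zip(searches, exemplars)]


# Toy example: two objects, one channel each (hypothetical values).
search_a = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
exemplar_a = [[[1, 0], [0, 1]]]
search_b = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
exemplar_b = [[[2, 2], [2, 2]]]

maps = pairwise_depthwise_xcorr([search_a, search_b], [exemplar_a, exemplar_b])
```

In this sketch, the peak of each response map marks the position inside the search area that is most similar to the exemplar. Restricting the correlation to matching pairs keeps the cost linear in the number of tracked objects.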