Relation networks for few-shot video object detection
This paper describes a new few-shot video object detection framework that leverages spatio-temporal information through a relation module with attention mechanisms to mine relationships among proposals in different frames. The output of the relation module feeds a spatio-temporal double head with a category-agnostic confidence predictor, which reduces the overfitting caused by the small training sets inherent to few-shot settings. The predicted score is the input to a long-term object linking approach that builds object tubes across the whole video, ensuring spatio-temporal consistency. Our proposal establishes a new state-of-the-art on the FSVOD-500 dataset.
Keywords: few-shot object detection, video object detection
Publication: Congress
July 7, 2023
/research/publications/relation-networks-for-few-shot-video-object-detection
Daniel Cores, Lorenzo Seidenari, Alberto Del Bimbo, Víctor M. Brea, and Manuel Mucientes - DOI: 10.1007/978-3-031-36616-1_19 - ISBN: 978-3-031-36615-4