This paper introduces STDnet (Small Target Detection network), a fully convolutional network (ConvNet) focused on small targets. STDnet includes an early visual attention mechanism, called Region Context Network (RCN), that selects the most promising regions containing small objects together with their context. RCN makes it possible to work with high-resolution feature maps while keeping memory usage low. The filtered feature maps, which contain only the regions most likely to hold small objects, are forwarded through the network to a final Region Proposal Network (RPN), which feeds a final classification stage. RCN is key to increasing localization accuracy, since it provides finer spatial resolution through finer global effective strides, together with low memory overhead and higher frame rates. We also present a new video database, USC-GRAD-STDdb, with more than 56,000 annotated small targets, with sizes under 16x16 px, in challenging scenarios with clutter such as a waving sea or air scenes below the skyline. Experimental results on USC-GRAD-STDdb show that STDnet raises AP@.5 from the 50.8% obtained by the best state-of-the-art approach for small target detection to 57.4%.
Keywords: ConvNet, deep learning, object detection, small target
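
To make the region-filtering idea concrete, the following is a minimal sketch of how an early attention stage might keep only a few high-scoring windows of a high-resolution feature map. It is not the STDnet implementation: the function names `select_regions` and `crop`, the per-cell score map, the greedy overlap suppression, and the zero padding at the borders are all assumptions made for illustration, and a single-channel grid stands in for a real feature tensor.

```python
from typing import List, Tuple

Grid = List[List[float]]


def select_regions(scores: Grid, k: int, half: int) -> List[Tuple[int, int]]:
    """Pick up to k window centres by descending score (hypothetical helper).

    A centre is skipped when its window of side 2*half+1 would overlap
    a window that has already been selected.
    """
    h, w = len(scores), len(scores[0])
    ranked = sorted(
        ((scores[r][c], r, c) for r in range(h) for c in range(w)),
        key=lambda t: -t[0],
    )
    chosen: List[Tuple[int, int]] = []
    for _, r, c in ranked:
        if len(chosen) == k:
            break
        # Two windows are disjoint when either offset exceeds 2*half.
        if all(abs(r - pr) > 2 * half or abs(c - pc) > 2 * half
               for pr, pc in chosen):
            chosen.append((r, c))
    return chosen


def crop(features: Grid, centre: Tuple[int, int], half: int) -> Grid:
    """Crop a (2*half+1) x (2*half+1) window, zero-padding outside the map."""
    h, w = len(features), len(features[0])
    r0, c0 = centre
    return [
        [features[r][c] if 0 <= r < h and 0 <= c < w else 0.0
         for c in range(c0 - half, c0 + half + 1)]
        for r in range(r0 - half, r0 + half + 1)
    ]


# Toy 6x6 score map with two nearby peaks and one distant peak.
scores = [[0.0] * 6 for _ in range(6)]
scores[1][1], scores[1][2], scores[4][4] = 0.9, 0.8, 0.7
features = [[float(r * 6 + c) for c in range(6)] for r in range(6)]

centres = select_regions(scores, k=2, half=1)
windows = [crop(features, ctr, half=1) for ctr in centres]
kept_cells = sum(len(win) * len(win[0]) for win in windows)
total_cells = len(features) * len(features[0])
print(centres, kept_cells, total_cells)
```

In this toy case later layers would process 18 of the 36 cells, which illustrates why filtering early can let a network keep a fine stride without a matching growth in memory.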