Efficient Semantic Segmentation of Multispectral Land Cover Images Using Mask2Former
Semantic segmentation for Earth observation (EO) assigns a label or category to each pixel in an image, enabling precise analysis for land cover applications such as environmental conservation, urban planning, or disaster management. Deep learning-based segmentation models have proliferated in recent years, but they are often poorly adapted to the unique properties of the multi- and hyperspectral images frequently used in remote sensing. Mask2Former is a universal segmentation model based on masked attention that employs a pretrained classification model as a backbone to create intermediate representations. This article presents a preliminary adaptation of Mask2Former for the segmentation of multispectral remote sensing images. The adaptation modifies the backbone to accept multispectral inputs and adapts the data processing pipelines to leverage all available spectral bands effectively. The computational cost of the method is also analyzed as an initial assessment of its scalability and efficiency for large-scale applications. Experimental results on the FiveBillionPixels dataset show a notable improvement in segmentation accuracy when multispectral bands are incorporated, outperforming the RGB-only configuration without a significant increase in computational cost.
keywords: land cover classification, transformers, semantic segmentation, multispectral, computational cost
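
The observation that extra spectral bands add little computational cost can be illustrated with a rough count, under assumptions that are not taken from the article: a backbone whose first layer is a non-overlapping patch-embedding convolution (patch size 4, embedding width 96, as in some hierarchical vision transformers), a 512 x 512 input tile, and four bands instead of three. The function names and all numeric settings below are hypothetical. The sketch only shows that, when the band count changes, the first layer is the only place where the input channel count enters the cost.

```python
def patch_embed_macs(height, width, in_channels, patch_size=4, embed_dim=96):
    """Multiply-accumulates of a non-overlapping patch-embedding convolution.

    Assumed layer shape (hypothetical, not from the article): kernel and
    stride equal to patch_size, embed_dim output channels.
    """
    tokens = (height // patch_size) * (width // patch_size)
    macs_per_token = patch_size * patch_size * in_channels * embed_dim
    return tokens * macs_per_token


def patch_embed_params(in_channels, patch_size=4, embed_dim=96):
    """Weights plus biases of the same assumed patch-embedding layer."""
    return patch_size * patch_size * in_channels * embed_dim + embed_dim


if __name__ == "__main__":
    # Assumed setting: 512 x 512 tiles, 3 bands (RGB) versus 4 bands.
    rgb_macs = patch_embed_macs(512, 512, in_channels=3)
    ms_macs = patch_embed_macs(512, 512, in_channels=4)
    print("RGB patch-embedding MACs:          ", rgb_macs)
    print("4-band patch-embedding MACs:       ", ms_macs)
    print("extra MACs from the added band:    ", ms_macs - rgb_macs)
    print("extra parameters from the added band:",
          patch_embed_params(4) - patch_embed_params(3))
```

Under these assumptions the added band changes only the first layer; every later stage sees the same embedding width, so its cost is unchanged. Whether this matches the backbone actually used in the article depends on its architecture.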