DOI: 10.3390/s26133970 ISSN: 1424-8220

MSA-DET: A Multi-Scale Attention Network with Adaptive Feature Fusion for SAR Ship Detection

Sai Wan, Zhiyong Tao, Lu Chen

Synthetic aperture radar (SAR) ship detection faces three persistent challenges: coherent speckle noise that obscures target boundaries, heterogeneous background clutter in coastal and harbor scenes, and ship targets whose spatial extent varies by more than an order of magnitude within the same image. To address these issues jointly, this paper proposes MSA-DET, an improved SAR ship detection network built upon YOLOv11. In the backbone, a Multi-Scale Cross-axis Attention module (MSCAttention) runs horizontal and vertical axial attention branches in parallel across multiple receptive-field scales, sharpening feature representations for ship targets that vary widely in size and orientation. In the neck, the standard C3k2 block is redesigned as C3k2_SSA by embedding sparse self-attention, which attends only to the most discriminative spatial tokens, suppressing speckle interference and reducing the computational overhead of attention. In the head, an Adaptive Spatial Feature Fusion (ASFF) detection head replaces fixed pyramid-level aggregation with learned per-pixel blending weights, resolving gradient conflicts across scales and improving localization consistency for both small and large ships. On the HRSID dataset, MSA-DET achieves an mAP@0.5:0.95 of 63.6% and an mAP@0.5 of 88.1%, gains of 4.0 and 1.6 percentage points over the YOLOv11n baseline; on SSDD, it reaches 69.6% and 97.7%, surpassing the baseline by 7.2 and 2.1 percentage points, respectively. These results demonstrate that coordinated multi-stage redesign, rather than isolated module substitution, is an effective strategy for SAR-oriented ship detection. The accuracy gains come with a larger model (8.9 M parameters versus 2.6 M for YOLOv11n, roughly 3.4 times as many) and a higher computational cost (9.6 GFLOPs versus 6.3 GFLOPs, roughly 1.5 times), a trade-off that is justified by the substantial improvement in detection quality.
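The learned per-pixel blending described for the ASFF head can be illustrated with a minimal sketch. It assumes the general adaptive spatial feature fusion idea: feature maps from several pyramid levels are first brought to a common resolution, and a softmax over per-pixel logits decides how much each level contributes at each location. The function names, the single-channel maps, and the toy values below are hypothetical and are not taken from the MSA-DET implementation; in a real network the logits would be produced by learned convolution layers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, level_logits):
    """Blend same-sized single-channel feature maps with per-pixel weights.

    level_features[k][i][j]: feature value from pyramid level k at pixel (i, j).
    level_logits[k][i][j]:   blending logit for level k at pixel (i, j)
                             (hypothetical stand-in for a learned weight map).
    """
    n_levels = len(level_features)
    height = len(level_features[0])
    width = len(level_features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weights at each pixel are non-negative and sum to 1.
            weights = softmax([level_logits[k][i][j] for k in range(n_levels)])
            fused[i][j] = sum(
                w * level_features[k][i][j] for k, w in enumerate(weights)
            )
    return fused


# Toy example: three pyramid levels, each already resized to a 1x2 map.
features = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
# Pixel 0: equal logits, so the levels are averaged.
# Pixel 1: level 2 has a much larger logit, so it dominates the blend.
logits = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 50.0]]]
fused = adaptive_fuse(features, logits)
```

With fixed aggregation every pixel would receive the same mix of levels; here the first pixel averages the three levels while the second is taken almost entirely from the third level, which is the behavior that lets small and large ships draw on different scales.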
