YOLOSO: An Improved YOLO-Based Algorithm for UAV to Detect Small Ground Targets
Bo Lang, Huamin Yang, Ruoning Xu, Hongzhi Li

Detecting and localizing small ground objects from UAVs is difficult: tiny-target features are easily lost, scale adaptability is insufficient, and complex backgrounds cause severe interference. In such scenarios the conventional YOLOv11n model also suffers from high missed- and false-detection rates and inadequate localization accuracy. Taking YOLOv11n as the basic framework, this paper performs systematic optimization in three aspects (network structure, core modules, and feature enhancement) and proposes YOLOSO, a lightweight small-object-enhanced detection algorithm for UAV applications. Introducing a P2 high-resolution feature branch with a stride of 4 yields a four-scale P2-P3-P4-P5 detection structure, which reduces the minimum detection stride from 8 to 4 and alleviates the loss of detailed feature information for ultra-tiny targets. A bidirectional “top-down + bottom-up” multi-scale feature fusion strategy improves the complementarity between deep semantic information and shallow detailed features. The core modules C3k2SO and C2PSASO are redesigned by adjusting the channel compression ratio (0.25 for shallow modules and 0.75 for deep modules in C3k2SO; 0.25 in C2PSASO), optimizing the convolution kernel configuration (combining 1 × 3 and 3 × 1 convolutions), increasing the number of attention heads (from 4 to 8), and introducing residual connections with a 1 × 1 convolutional branch; together these changes improve the refinement and focusing ability of small-object feature extraction. Additionally, an Enhanced Dual-branch Convolutional Block Attention Module (ED-CBAM) is proposed to further suppress background interference.
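To make the effect of the added P2 branch concrete, the following minimal sketch computes the feature-map grid for each detection scale and how many grid cells a small object spans along one side. The strides 4 and 8 come from the text; the strides 16 and 32 for P4 and P5 follow the usual YOLO convention, and the 640 × 640 input size, the 8-pixel object, and all function names are illustrative assumptions rather than values taken from the paper.

```python
import math

# Strides for P2 and P3 are stated in the text; P4 and P5 use the
# conventional YOLO values (assumption).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def grid_size(input_size: int, stride: int) -> int:
    """Side length of the feature map produced at a given stride."""
    return math.ceil(input_size / stride)


def summarize(input_size: int = 640, object_px: int = 8):
    """Return (level, stride, grid side, cell count, cells spanned by the object).

    input_size and object_px are hypothetical example values.
    """
    rows = []
    for level, stride in STRIDES.items():
        side = grid_size(input_size, stride)
        rows.append((level, stride, side, side * side, object_px / stride))
    return rows


if __name__ == "__main__":
    for level, stride, side, cells, span in summarize():
        print(f"{level}: stride={stride:2d} grid={side}x{side} "
              f"cells={cells} object spans {span:.2f} cells")
```

Under these assumptions, an 8-pixel object spans two cells per side on P2 but only one on P3, which is the intuition behind lowering the minimum detection stride from 8 to 4.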
Experimental results on the VisDrone2019-DET dataset show that YOLOSO remains lightweight at 3.56M parameters while attaining P, R, and mAP50 values of 47.2%, 36.8%, and 37.3% on the test set, which are 4.5, 4.8, and 3.7 percentage points higher, respectively, than those of the baseline YOLOv11n (42.7%, 32.0%, and 33.6%). The larger version, YOLOSO-S (14.85M parameters, 45.3% mAP50), has 53.6% fewer parameters than the same-scale Rtdetr-L (32.0M parameters, 37.8% mAP50) while achieving a significantly higher mAP50. Experiments on the DOTAv1 dataset further confirm the generalization of YOLOSO, which achieves 62.2% precision and 27.3% mAP50, outperforming all compared YOLO models, at an inference speed of 20.53 FPS. Although YOLOSO is slightly slower than mainstream lightweight YOLO models, the accuracy gains offset the minor loss in inference speed, and the trade-off is acceptable for practical UAV deployment. Ablation experiments verify that the structural optimization (mAP50 up 2.8 percentage points, from 33.6% to 36.4%), the proposed C2PSASO module (up 0.7 percentage points, to 34.3%), and the proposed C3k2SO module (up 1.4 percentage points, to 35.0%) each contribute positive gains and complement one another. While remaining lightweight, the model effectively improves small-object detection accuracy in UAV scenarios and can serve as a technical reference for practical applications such as remote sensing monitoring and security patrolling.
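The reported gains and the parameter reduction can be reproduced from the figures quoted above. The short sketch below is only an arithmetic check; the helper names are hypothetical, and all numbers are copied from the text.

```python
def pp_gain(new: float, base: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(new - base, 1)


def relative_reduction(small: float, large: float) -> float:
    """Relative reduction in percent, rounded to one decimal."""
    return round(100.0 * (large - small) / large, 1)


# VisDrone2019-DET test set: YOLOSO versus baseline YOLOv11n (P, R, mAP50).
yoloso = {"P": 47.2, "R": 36.8, "mAP50": 37.3}
baseline = {"P": 42.7, "R": 32.0, "mAP50": 33.6}
gains = {k: pp_gain(yoloso[k], baseline[k]) for k in yoloso}

# Parameter counts in millions: YOLOSO-S versus Rtdetr-L.
param_cut = relative_reduction(14.85, 32.0)

# Ablation mAP50 values relative to the 33.6% baseline.
ablation = {
    "structure": pp_gain(36.4, 33.6),
    "C2PSASO": pp_gain(34.3, 33.6),
    "C3k2SO": pp_gain(35.0, 33.6),
}

if __name__ == "__main__":
    print(gains)
    print(param_cut)
    print(ablation)
```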