YOLO-UTD: A Domain-Specific Detection Framework for Small Objects in UAV Traffic Surveillance
Hailang Huang, Meng Li, Jiebao Zhang, Yitong Li

Detecting objects in drone-captured aerial imagery is particularly challenging because targets are numerous, small, and densely distributed. To address these challenges, this paper introduces YOLO-UTD (YOLO-UAV Traffic Detection), a small object detector tailored to drone traffic surveillance. Built on the YOLOv8 framework, the proposed model incorporates three principal enhancements. First, a dedicated small-object detection head replaces the original large-object head to increase sensitivity to fine-grained features. Second, a shallow-augmented feature pyramid network (SFPN) is introduced into the neck; it enriches the semantic content of high-resolution shallow features through dense multiscale interactions and CARAFE upsampling, improving performance on small targets. Third, a C2fA layer is integrated into the deep backbone stages to adaptively fuse spatial detail and semantic context through a dual-path architecture and a cross-attention mechanism, dynamically refining the features most critical for small objects. Extensive experiments on the VisDrone2019 dataset show that YOLO-UTD achieves a 3.6% higher mean average precision (mAP) than YOLOv8 while keeping a low parameter count, with a particularly notable gain of 5.3% in vehicle detection accuracy. These results confirm the model's effectiveness and its strong potential for smart-city drone surveillance.
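The SFPN relies on CARAFE (Content-Aware ReAssembly of FEatures) for upsampling. The following is a minimal NumPy sketch of the generic CARAFE idea, not the YOLO-UTD implementation. For each output pixel, a small kernel is predicted from the input content and normalized with softmax. The output is then the kernel-weighted sum of the k_up × k_up neighbourhood around the corresponding source pixel. Assumptions of this sketch: the kernel-prediction module is reduced to a single hypothetical 1×1 projection `w_enc` (the published operator uses a channel compressor followed by a larger content-encoder convolution), borders are zero-padded, and weights are random rather than learned.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def carafe_upsample(x, w_enc, scale=2, k_up=5):
    """Content-aware upsampling of x with shape (C, H, W) by `scale`.

    w_enc: hypothetical (scale**2 * k_up**2, C) weight of a 1x1 "content
    encoder" standing in for CARAFE's learned kernel-prediction module.
    """
    c, h, w = x.shape
    r = k_up // 2
    # 1) Kernel prediction: one k_up*k_up kernel per output sub-pixel.
    logits = np.einsum("oc,chw->ohw", w_enc, x)  # (s^2 * k^2, H, W)
    logits = logits.reshape(scale, scale, k_up * k_up, h, w)
    # Pixel shuffle: sub-pixel (a, b) of source cell (i, j) -> (i*s + a, j*s + b).
    logits = logits.transpose(2, 3, 0, 4, 1).reshape(k_up * k_up, h * scale, w * scale)
    kernels = softmax(logits, axis=0)  # each predicted kernel sums to 1

    # 2) Reassembly: weighted sum over the k_up x k_up neighbourhood of the
    #    source location (i' // s, j' // s) in the input feature map.
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))  # zero padding at the borders
    out = np.zeros((c, h * scale, w * scale))
    src_i = np.arange(h * scale) // scale
    src_j = np.arange(w * scale) // scale
    for n in range(k_up * k_up):
        dy, dx = divmod(n, k_up)  # neighbour offset, shifted by r via padding
        # Neighbour values for every output pixel, shape (C, sH, sW).
        patch = xp[:, src_i[:, None] + dy, src_j[None, :] + dx]
        out += kernels[n] * patch
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w_enc = 0.1 * rng.standard_normal((2**2 * 5**2, 8))
y = carafe_upsample(x, w_enc, scale=2, k_up=5)
print(y.shape)  # (8, 8, 8)
```

Unlike fixed bilinear or nearest-neighbour upsampling, the reassembly weights here depend on the local features. This is the property that the abstract credits with enriching high-resolution shallow features for small targets.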