DCFP-YOLO: A Dual-Backbone Feature Fusion Network for Multi-Pose Chili Flower Recognition and Edge Deployment
Minqiu Kuang, Xiaojian Li, Fangping Xie, Shang Chen, Dawei Liu, Yang Xiang, Bei Wu, Feng Liu, Yuxuan Zhang, Xu Li

Chili flowers are small, frequently occluded by branches and leaves, and subject to illumination variations in complex field environments, which makes feature extraction difficult and limits recognition accuracy. To address these challenges, a dual-backbone chili flower pose estimation algorithm, termed DCFP-YOLO, is proposed. Built upon the YOLO11n framework, the proposed method classifies and recognizes five typical upward-oriented chili flower poses. To alleviate the loss of local detail features of small chili flowers against complex backgrounds, a dual-backbone feature extraction network composed of StarNet and ShuffleNetV2 is constructed. Specifically, the StarNet backbone enhances the extraction of fine-grained local features from key floral regions, while the ShuffleNetV2 backbone improves the perception of global spatial structural information. The complementary fusion of the two backbones' features strengthens the representation of chili flower pose features in complex environments. To mitigate the attenuation of shallow detail information during multi-scale feature transmission, a Bidirectional Multi-branch Auxiliary Feature Pyramid Network (BiMAFPN) is designed to enhance feature propagation through cross-scale feature interaction, thereby improving pose recognition performance under occlusion and overlap. Furthermore, a Programmable Gradient Information (PGI)-assisted training mechanism is introduced to optimize gradient propagation paths and alleviate information bottlenecks in deep networks, thereby enhancing the robustness of multi-pose feature extraction under occlusion, blur, and complex illumination.
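
The complementary fusion of two backbone feature maps can be illustrated with a minimal sketch. The abstract does not specify how DCFP-YOLO fuses the StarNet and ShuffleNetV2 features, so the scheme below (channel concatenation followed by a 1x1 pointwise projection) is an assumption chosen only because it is a common fusion pattern. The names `concat_channels`, `pointwise_conv`, and `fuse`, as well as the toy values and weights, are hypothetical.

```python
from typing import List

# A feature map is indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("feature maps must share the same height and width")
    return a + b


def pointwise_conv(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: every output channel is a weighted sum of input channels."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row needs one entry per input channel")
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def fuse(local_feat: FeatureMap, global_feat: FeatureMap,
         weights: List[List[float]]) -> FeatureMap:
    """Assumed fusion scheme: concatenate both branches, then mix channels."""
    return pointwise_conv(concat_channels(local_feat, global_feat), weights)


# Toy 2x2 maps standing in for the local (StarNet-like) and
# global (ShuffleNetV2-like) branches; values are illustrative only.
local_feat = [[[1.0, 2.0], [3.0, 4.0]]]
global_feat = [[[10.0, 20.0], [30.0, 40.0]]]
weights = [[0.5, 0.5], [1.0, -1.0]]  # two output channels, hypothetical weights

fused = fuse(local_feat, global_feat, weights)
print(fused)
```

In a real network the projection weights would be learned, and the two branches would first be brought to a common spatial resolution before concatenation.
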
Experimental results demonstrate that DCFP-YOLO achieves recall, mAP50, and mAP50-95 values of 87.4%, 92.0%, and 66.9%, respectively, representing improvements of 1.7, 1.3, and 3.5 percentage points over the baseline model. Its overall performance surpasses that of current mainstream object detection algorithms. After deployment on the NVIDIA Jetson AGX Orin platform, the model achieves an inference speed of 20.9 frames/s, which largely satisfies the real-time perception requirements of chili flower pose recognition in complex agricultural environments. The proposed method thus provides an effective visual perception framework for this task. Rather than constituting a complete robotic pollination solution, the developed model serves as a potential perception component for future intelligent pollination robotic systems, providing reliable flower pose information for subsequent research on target localization, end-effector alignment, and robotic pollination in unstructured greenhouse environments.
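
The reported figures imply baseline scores and a per-frame latency that the abstract does not state directly; the short calculation below derives them. It assumes the three improvements pair with the three metrics in the order listed, and that the third metric is mAP50-95 (the usual companion of mAP50 in YOLO evaluations). The variable names are illustrative.

```python
# Scores reported for DCFP-YOLO, in percent.
reported = {"recall": 87.4, "mAP50": 92.0, "mAP50-95": 66.9}

# Reported improvements over the baseline, in percentage points.
gain = {"recall": 1.7, "mAP50": 1.3, "mAP50-95": 3.5}

# Implied baseline score = reported score minus the improvement.
baseline = {name: round(reported[name] - gain[name], 1) for name in reported}

# Average time budget per frame at the reported 20.9 frames/s on Jetson AGX Orin.
fps = 20.9
latency_ms = 1000.0 / fps

for name, value in baseline.items():
    print(f"implied baseline {name}: {value:.1f}%")
print(f"per-frame latency: {latency_ms:.1f} ms")
```

Under these assumptions the implied baseline scores are 85.7% recall, 90.7% mAP50, and 63.4% mAP50-95, and 20.9 frames/s corresponds to roughly 47.8 ms per frame.
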