Research on Fruit Spatial Coordinate Positioning by Combining Improved YOLOv8s and Adaptive Multi-Resolution Model
Dexiao Kong, Jiayi Wang, Qinghui Zhang, Junqiu Li, Jian Rong
- Agronomy and Crop Science
Automated fruit-picking equipment has the potential to significantly improve picking efficiency, and accurate detection and localization of fruit are crucial to it. However, current methods rely on expensive tools such as depth cameras and LiDAR. This study proposes a low-cost method that uses monocular images for both target detection and depth estimation. To improve detection accuracy, especially for small targets, an improved YOLOv8s detection algorithm is introduced. It uses the BiFormer block, an attention mechanism with dynamic, query-aware sparsity, as the backbone feature extractor, adds a small-target-detection layer in the Neck, and employs EIoU Loss as the loss function. Furthermore, a fused depth estimation method is proposed that combines high-resolution, low-resolution, and local high-frequency depth estimates to obtain depth information with both high-frequency details and low-frequency structure. Finally, the spatial 3D coordinates of the fruit are obtained by fusing the planar coordinates with the depth information. In experiments with citrus as the target, the improved YOLOv8s network achieved an mAP of 88.45% and a recognition accuracy of 94.7%. Recognition of citrus in a natural environment improved by 2.7% compared to the original model. In the detection range of 30 cm to 60 cm, the depth-estimation errors (MAE, RMSE) are 0.53 and 0.53. In the illumination intensity range of 1000 lx to 5000 lx, the average depth-estimation errors (MAE, RMSE) are 0.49 and 0.64. In the simulated fruit-picking scenario, the grasping success rates at 30 cm and 45 cm were 80.6% and 85.1%, respectively. Unlike monocular geometric and binocular localization, the method provides high-resolution depth estimation without constraints on camera parameters or fruit size, offering a feasible, low-cost localization method for automated fruit-picking equipment.
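The abstract names EIoU Loss as the bounding-box loss but does not state its form. As a minimal illustration, the sketch below follows the commonly published EIoU formulation: one minus IoU, plus the squared center distance normalized by the squared diagonal of the smallest enclosing box, plus the squared width and height differences normalized by that enclosing box's squared width and height. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example, not details taken from the paper.

```python
def eiou_loss(box_pred, box_true, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Hypothetical helper for this example; not the authors' implementation.
    """
    px1, py1, px2, py2 = box_pred
    tx1, ty1, tx2, ty2 = box_true

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Intersection over union.
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0

    # Penalty terms: center distance, width difference, height difference.
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because the width and height terms penalize size mismatch directly, the loss still gives a useful signal when two boxes share a center but differ in shape, which is one reason it is often chosen for small targets.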