DOI: 10.3390/horticulturae12070770 ISSN: 2311-7524

Tomato Visual Object Detection Method Based on the Mamba State Space Model

Wenhao Li, Hengyi Zheng, Chengheng Zhao, Wei Liu, Shunjie Li, Mengbo Qian

Tomato harvesting still relies heavily on manual labor, while factors such as clustered fruit growth, inconsistent ripening stages, occlusion, and complex cultivation environments pose significant challenges to automated harvesting systems and place higher demands on target detection accuracy. To address these issues, a tomato detection method based on the Mamba state space model was proposed, and an improved model termed YOLO-VCW was developed based on YOLOv8n. Specifically, the original C2f module in the backbone network was replaced with the C2f-VSS module to enhance global contextual feature extraction. A Coordinate Attention mechanism was introduced into the feature fusion stage to improve the model’s ability to focus on tomato target regions under complex background and occlusion conditions. In addition, the WIoUv3 loss function was adopted in the detection head to improve localization accuracy and training stability in overlapping fruit scenarios. Experimental results showed that YOLO-VCW achieved a precision of 91.33%, a recall of 86.79%, and an F1-score of 89.00% on the tomato dataset. Compared with YOLOv8n, the proposed model improved precision, recall, F1-score, and mAP50 by 1.90%, 4.43%, 3.25%, and 4.44%, respectively, with only a slight increase in parameter count, to 3.9 M. These results demonstrate that YOLO-VCW provides effective and robust performance for tomato target detection in complex environments.
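The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic. The function name `f1_score` and the variable names are illustrative and do not come from the paper. The baseline figures are back-calculated on the assumption that the stated improvements over YOLOv8n are absolute percentage-point differences; the abstract does not say so explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for YOLO-VCW, in percent.
vcw_precision, vcw_recall = 91.33, 86.79
vcw_f1 = f1_score(vcw_precision, vcw_recall)

# ASSUMPTION: the reported gains (1.90 and 4.43) are percentage-point
# differences, so the YOLOv8n baseline is obtained by subtraction.
base_precision = vcw_precision - 1.90
base_recall = vcw_recall - 4.43
base_f1 = f1_score(base_precision, base_recall)

print(f"YOLO-VCW F1:        {vcw_f1:.2f}%")
print(f"Implied YOLOv8n F1: {base_f1:.2f}%")
print(f"F1 gain:            {vcw_f1 - base_f1:.2f} points")
```

Under that assumption the computed F1-score rounds to 89.00% and the implied gain over the baseline rounds to 3.25 points, which agrees with the figures in the abstract. The same check cannot be applied to mAP50, because the abstract gives only its improvement and not its absolute value.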
