DOI: 10.20965/jrm.2026.p0938 ISSN: 1883-8049

Two-Stage Recognition Framework Based on YOLO and Siamese Networks for Crack Detection in Cherry Tomatoes

Zhaohui Tan, Masanori Sato

We propose a deep learning-based two-stage recognition system for fruit-level crack classification in cherry tomatoes, targeting harvesting and sorting in real-world cultivation environments where leaves and stems are present. Cherry tomato cracking exhibits substantial visual variability, ranging from clearly split fruits to subtle white linear cracks around the calyx region. As a result, crack-region-based or bounding-box-driven detection methods are highly susceptible to external noise, such as occlusion by leaves and stems and illumination variation, which can strongly impair their generalization performance under field conditions. The wide diversity of crack appearances also makes it difficult to collect sufficiently large and stable annotated datasets for robust training. To alleviate this data scarcity, we used synthetic data generation to support model pre-training. We formulated crack recognition in real-world environments as a two-stage framework comprising fruit detection followed by fruit-level crack classification. In the first stage, cherry tomatoes are detected using a You Only Look Once (YOLO)-based object detector. In the second stage, each detected fruit instance is classified as cracked or non-cracked through image-level classification using a Siamese network. On images from real-world environments, the proposed method successfully detected red cherry tomatoes and achieved a crack classification accuracy of approximately 88% for them, demonstrating its effectiveness for fruit-level crack differentiation under practical cultivation conditions.
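The control flow of the two-stage framework described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_detect` and `toy_embed` are hypothetical stand-ins for the YOLO-based detector and the shared Siamese branch, the image is a toy grayscale grid, and the decision rule (assigning the label whose reference embeddings are nearest on average) is one common way to classify with a Siamese network and is assumed here, since the abstract does not specify it.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixel coordinates
Image = List[List[float]]         # toy grayscale image: rows of pixel values
Embedding = Sequence[float]


def crop(image: Image, box: Box) -> Image:
    """Cut a detected fruit region out of the full image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def euclidean(a: Embedding, b: Embedding) -> float:
    """Distance between two embeddings from the shared branch."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_fruit(embedding: Embedding,
                   references: Dict[str, List[Embedding]]) -> str:
    """Return the label whose reference embeddings are closest on average."""
    def mean_distance(label: str) -> float:
        refs = references[label]
        return sum(euclidean(embedding, r) for r in refs) / len(refs)
    return min(references, key=mean_distance)


def two_stage(image: Image,
              detect: Callable[[Image], List[Box]],
              embed: Callable[[Image], Embedding],
              references: Dict[str, List[Embedding]]) -> List[Tuple[Box, str]]:
    """Stage 1: detect fruits. Stage 2: classify each cropped fruit."""
    results = []
    for box in detect(image):
        fruit = crop(image, box)
        results.append((box, classify_fruit(embed(fruit), references)))
    return results


# --- Hypothetical stand-ins, for illustration only ---
def toy_detect(image: Image) -> List[Box]:
    """Pretend detector: returns two fixed fruit boxes."""
    return [(0, 0, 4, 4), (4, 0, 8, 4)]


def toy_embed(fruit: Image) -> Embedding:
    """Pretend embedding: (mean brightness, brightness range)."""
    pixels = [p for row in fruit for p in row]
    return (sum(pixels) / len(pixels), max(pixels) - min(pixels))


# Left fruit is uniform; right fruit has a bright vertical streak.
image = [[0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 0.5, 0.5] for _ in range(4)]
references = {"non-cracked": [(0.5, 0.0)], "cracked": [(0.6, 0.5)]}

results = two_stage(image, toy_detect, toy_embed, references)
print(results)
```

In this toy setup the uniform left fruit is labeled non-cracked and the streaked right fruit is labeled cracked. Separating detection from classification means the classifier only ever sees a single cropped fruit, which is the property the framework relies on to limit interference from leaves, stems, and background.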
