A Lightweight Framework for Tea Shoot Detection and Plucking Point Localization Enabled by Modified YOLOv11s-Seg Model
Yongmao Huang, Yuankai Luo, Yuanxi Mu, Haiyan Jin

In this work, a lightweight framework based on a modified YOLOv11s-seg model is proposed for tea shoot detection and plucking point localization. Higher accuracy in tea shoot detection and plucking point localization generally requires a larger model with more parameters, which makes it difficult to balance accuracy and lightweighting. To overcome this limitation, a modified lightweight YOLOv11s-seg model is developed. First, multi-scale edge information enhancement is introduced into the conventional YOLOv11s-seg to extract edge features more effectively and improve the detection accuracy for tea shoots. Meanwhile, context anchor attention is used to modify the cross-stage partial spatial attention module in the backbone network, improving the detection capability for small objects. Moreover, a detail calibration reconstruction feature pyramid network is proposed. It uses spatial and contextual semantic information to reconstruct and calibrate features in key regions, enhancing the capability for fusion and recognition of objects at various scales. Furthermore, the modified model performs instance segmentation to acquire the contour of each tea shoot, and the plucking point is localized at the average of the coordinates of the three lowest pixel points in the contour. In addition, the layer-adaptive magnitude-based pruning (LAMP) method is used to make the model lightweight. The experimental results show that the LAMP-pruned modified YOLOv11s-seg model with a speedup ratio of 1.5 achieves a mAP@0.5 of 86.5% for tea shoot detection, a 4.7 percentage-point improvement over the conventional YOLOv11s-seg model.
Moreover, it achieves an accuracy of 81.9% for plucking point localization on the validation and test subsets (232 images in total), while its number of parameters, model size, and floating-point operations (FLOPs) are reduced by 67.3%, 66.2%, and 24.9%, respectively, relative to the conventional model. Therefore, the proposed LAMP-pruned modified model achieves a good balance between lightweighting and detection accuracy. Finally, the LAMP-pruned modified YOLOv11s-seg model is deployed on a Jetson Orin NX edge module and tested in a tea plantation, where it reaches a measured detection speed of 34.1 FPS, verifying its feasibility in practical applications.
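
The plucking point rule described above (averaging the three lowest contour pixels) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `plucking_point`, the parameter `k`, and the input layout (an array of (x, y) pixel coordinates) are assumptions made for illustration. It also assumes the usual image convention in which y grows downward, so the "lowest" points are those with the largest y values.

```python
import numpy as np


def plucking_point(contour, k=3):
    """Estimate a plucking point as the mean of the k lowest contour points.

    Hypothetical helper: `contour` is assumed to hold (x, y) pixel
    coordinates, shaped (N, 2) or (N, 1, 2).
    """
    pts = np.asarray(contour, dtype=float).reshape(-1, 2)
    if len(pts) < k:
        raise ValueError(f"contour needs at least {k} points, got {len(pts)}")

    # Assumption: image coordinates, where y increases toward the bottom
    # of the image, so the lowest pixels have the largest y values.
    order = np.argsort(pts[:, 1], kind="stable")
    lowest = pts[order[-k:]]

    # The plucking point is the average of the selected coordinates.
    x, y = lowest.mean(axis=0)
    return float(x), float(y)


if __name__ == "__main__":
    # Toy contour: the three points with the largest y are averaged.
    toy = [(10, 5), (12, 40), (14, 42), (16, 41), (20, 8)]
    print(plucking_point(toy))
```

In practice the contour would come from the instance segmentation mask predicted for each tea shoot. When several contour pixels share the same y value, which of them is selected depends on their order in the input, so a real pipeline may need an explicit tie-breaking rule.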