DOI: 10.3390/agriculture16131414 ISSN: 2077-0472

Multimodal Deep Learning for Pest and Disease Recognition and Crop Growth Assessment in Open-Field Agricultural Environments

Jiayu Xiang, Jianxiang Pan, Hanwen Zhang, Xuekun Liu, Boxiu Liu, Jieling Tian, Shuo Yan

With the rapid development of smart agriculture, monitoring pests and diseases and assessing crop growth across large-scale farmland have become essential for precision management and early risk warning. However, traditional unimodal visual methods are highly susceptible to illumination variation, canopy occlusion, scale differences, and background interference in real field environments, and they cannot exploit environmental sensing information or spatial priors. To address these issues, this study proposes a multimodal target perception framework for intelligent farmland inspection. The framework integrates UAV imagery, time-series data from ground Internet of Things (IoT) sensors, and spatial positional information, and it jointly models pest and disease recognition and crop growth assessment through three components: cross-modal alignment and collaborative encoding, multi-scale target perception, and dynamic multimodal fusion and decision-making. In the pest and disease recognition task, the proposed method achieved a Precision of 91.63%, a Recall of 90.27%, an F1-score of 90.94%, and an mAP of 93.15%, clearly outperforming the comparison models Faster R-CNN (ResNet50 backbone), YOLOv8-m, Swin Transformer-Tiny, and Multimodal Transformer. In the crop growth assessment task, it achieved an Accuracy of 89.96%, a Precision of 89.11%, a Recall of 88.74%, and a Macro-F1 of 88.92%, again exceeding ResNet50, EfficientNet-B3, ViT-B/16, and conventional multimodal fusion models. The ablation study confirmed the contribution of the cross-modal alignment module, the multi-scale target perception module, and the dynamic fusion module, with the complete model reaching 90.94% Pest F1, 93.15% Pest mAP, and 88.92% Growth Macro-F1.
In addition, a regression experiment on net economic return per unit area shows that the proposed method can effectively link crop state information to economic outcomes, indicating strong potential for return prediction, performance evaluation, and optimization of resource allocation. Overall, these findings indicate that the proposed method improves perception accuracy and robustness in complex farmland environments and provides reliable technical support for intelligent inspection, pest and disease early warning, and precision management in agricultural scenarios.
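As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal sketch below recomputes the pest and disease F1 from the Precision and Recall given above; the function name `f1_score` is a hypothetical helper written for this illustration, not code from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Hypothetical helper: harmonic mean of precision and recall.

    Inputs may be percentages or fractions, as long as both use the same scale.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Pest and disease recognition figures reported in the abstract (percent).
pest_f1 = f1_score(91.63, 90.27)
print(f"Pest F1 = {pest_f1:.2f}%")  # prints "Pest F1 = 90.94%"
```

The result matches the reported F1-score of 90.94%. Applying the same formula to the growth-assessment Precision (89.11%) and Recall (88.74%) gives about 88.92%, close to the reported Macro-F1. Macro-F1 is normally defined as the unweighted mean of per-class F1-scores, however, so in general it need not equal the harmonic mean of macro-averaged precision and recall.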
