DOI: 10.3390/plants15131932 ISSN: 2223-7747

A Dual-Side Synergistic LoRA Framework for Full-Chain Fine-Tuning of Qwen2.5-VL for Plant Disease Diagnosis

Zhengyan Zhang, Quan Feng

The emergence of multimodal large language models (MLLMs) is opening a new avenue for explainable and interactive intelligent diagnosis in agriculture. However, generic MLLMs still face two major obstacles in plant disease recognition, insufficient fine-grained visual perception and misalignment between visual and linguistic features, which jointly limit diagnostic accuracy. To address these issues, we propose a Qwen2.5-VL-based full-chain fine-tuning framework termed dual-side synergistic low-rank adaptation (LoRA). Unlike the mainstream paradigm that freezes the vision encoder, our method injects trainable LoRA adapters into both the vision encoder and the large language model and establishes end-to-end gradient backpropagation across the entire multimodal pipeline. By using the supervision signal from autoregressive text generation (text-supervised visual learning), the framework directly drives deep optimization of visual representations, thereby enabling coordinated alignment between pixel-level perception and semantic-level understanding. We fine-tuned Qwen2.5-VL on the CDDM dataset and conducted in-domain (CDDM) and cross-domain (PlantVillage) experiments. The results show that the proposed 7B-parameter model achieves 98.8% and 96.0% diagnostic accuracy under in-domain and cross-domain scenarios, respectively. Accuracy decreases only slightly (by 2.8 percentage points) in the cross-domain setting, which demonstrates that an MLLM trained with our method generalizes well across domains. This indicates that our method can significantly improve the robustness and generalization ability of MLLMs in complex agricultural scenarios.
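The dual-side idea can be illustrated with a minimal, standard-library-only sketch of a LoRA-adapted linear layer, instantiated once for a vision-side projection and once for a language-side projection. This is not the authors' implementation: the class name `LoRALinear`, the layer sizes, the rank `r`, and the scaling `alpha` are all hypothetical, and the abstract does not say which modules receive adapters. The sketch shows only the standard LoRA update, in which a frozen weight W is augmented by a trainable low-rank term (alpha / r) * B A, with B initialized to zero so that training starts from the pretrained behavior.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A, where A is (r x d_in) and
    B is (d_out x r). Only A and B would be updated during fine-tuning.
    """

    def __init__(self, d_in, d_out, r, alpha, rng):
        # Stand-in for a pretrained weight; kept frozen.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection, small random init.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        # Trainable up-projection, zero init so the adapter starts as a no-op.
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


# Hypothetical toy sizes; real encoder and LLM projections are far larger.
rng = random.Random(0)
vision_adapter = LoRALinear(d_in=8, d_out=8, r=2, alpha=4, rng=rng)
language_adapter = LoRALinear(d_in=8, d_out=8, r=2, alpha=4, rng=rng)

# "Dual-side": adapters on BOTH sides are trainable; both base weights stay frozen.
trainable = vision_adapter.trainable_params() + language_adapter.trainable_params()
frozen = vision_adapter.frozen_params() + language_adapter.frozen_params()
print(f"trainable={trainable}, frozen={frozen}")
```

In a real setup, an autograd framework would backpropagate the text-generation loss through the language-side adapters and on into the vision-side adapters, which is what distinguishes this from the frozen-encoder paradigm; the sketch above omits gradients entirely.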
