Weakly Supervised Fine-Grained Discrimination of Wheat Mold Using Local RGB–HSI Fusion
Le Xiao, Shengtong Wang, Lulu Niu

Wheat is a major staple crop, and mold growth during storage poses a severe threat to grain safety and quality stability. Natural mold development in stored wheat is subtle, localized, and highly heterogeneous. Existing unimodal methods and global fusion approaches are generally not sensitive enough to local features, which hinders fine-grained grading of mold severity. To address this limitation, we propose a Mask-Guided Fine-Grained Fusion Network, a weakly supervised framework based on local fusion of RGB and hyperspectral imaging (HSI) data. The framework uses a dynamic parallel A/B experimental design to construct time-matched proxy labels that serve as weak supervision. A standardized preprocessing pipeline consisting of single-kernel extraction, foreground segmentation, and cross-modal registration resolves RGB–HSI spatial misalignment and ensures physical-level spatial consistency of the multimodal features. The model incorporates three components: a Foreground-Aware Spectral Recalibration (FASR) module that suppresses background noise; a Mask-Guided Dilated Cross-modal Local Attention (MDCLA) mechanism that establishes fine-grained local mappings between RGB visual phenotypes and hyperspectral responses; and a sample-level adaptive fusion strategy that dynamically weights features by modal reliability, improving the representation of complex samples across all mold stages. Experiments show that the Mask-Guided Fine-Grained Fusion Network achieves a classification accuracy of 0.9689, a Macro-F1 score of 0.9698, and a Mean Absolute Error (MAE) of 0.0593, significantly outperforming state-of-the-art unimodal deep models and global attention fusion baselines. This work provides a proof-of-principle framework for fine-grained, non-destructive mold risk assessment in stored wheat.
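
The three reported metrics measure different things: accuracy and Macro-F1 treat mold grades as unordered classes, whereas MAE also reflects how far a wrong prediction lies from the true grade. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes that severity grades are encoded as consecutive integers and that MAE is taken over those grade indices; the abstract does not state either point. The number of grades (four) and the example labels are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the true grade."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores, so every grade counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def grade_mae(y_true, y_pred):
    """Mean absolute difference between integer grade indices (assumed ordinal)."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical labels for four severity grades (0 = sound, 3 = most severe).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("macro-F1:", macro_f1(y_true, y_pred, num_classes=4))
print("MAE     :", grade_mae(y_true, y_pred))
```

In this toy example the two misclassifications count equally toward accuracy, but the prediction that is two grades off contributes twice as much to MAE as the one that is one grade off. Under the stated assumptions, a low MAE alongside high accuracy would indicate that the remaining errors fall mostly on adjacent grades.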