SGM-DETR: Semantic-Guided and Feature-Refined Transformer for Pine Wilt Disease Detection in Satellite Imagery
Xixin Chen, Zidi Wu, Zhuangci Wu, Xiaobo Tan, Yongfei Xue, Yuanhan Luo, Peng Wang, Wenjing Huang, Jianhua He, Jie Zhang, Jizheng Yi

Pine wilt disease (PWD) spreads rapidly once it takes hold and often causes large-scale pine mortality. Timely identification of infected trees is therefore critical for forest conservation and effective disease management. However, trees at an early stage of infection are difficult to distinguish in satellite remote sensing images: their visual differences from healthy trees and from complex background features are often subtle, and existing image-processing methods do not fully exploit heterogeneous information. To address this problem, we construct the Naro dataset for satellite-based PWD detection and propose SGM-RTDETR, a model based on the Real-Time Detection Transformer (RT-DETR). The proposed model consists of a Semantic–Visual Fusion Module (SVFM) and a Disease Feature Refinement Module (DFRM). In SVFM, three vegetation indices, the Excess Green index (ExG), the Visible Atmospherically Resistant Index (VARI), and the Green Leaf Index (GLI), are concatenated with the RGB imagery to form a six-channel visual input, which enhances the spectral differences between diseased and non-diseased targets. In addition, textual prior knowledge is introduced into the decoder input through a Stackelberg game-based visual–text fusion strategy, which helps the encoded memory features retain clearer disease-related semantics in complex backgrounds. DFRM then applies channel recalibration, feature refinement, and residual enhancement to the fused memory features to better extract fine-grained disease cues in remote sensing scenes. Experiments on the Naro dataset show that SGM-RTDETR achieves 80.75% mAP@0.5 and 35.43% mAP@0.5:0.95; the latter is 2.74 percentage points higher than that of RT-DETR-L. Overall, the results indicate that the dual-module structure improves the precision and robustness of PWD detection in satellite remote sensing images.
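To make the six-channel input concrete, the following is a minimal sketch of how ExG, VARI, and GLI could be computed from an RGB tile and stacked onto it. It uses the commonly cited definitions of the three indices (ExG = 2G - R - B, VARI = (G - R)/(G + R - B), GLI = (2G - R - B)/(2G + R + B)). The abstract does not state the exact formulation or normalization used in SVFM, so the function names, the scaling to [0, 1], the `eps` guard, and the clipping of VARI are all assumptions made for illustration.

```python
import numpy as np


def vegetation_indices(rgb, eps=1e-6):
    """Compute ExG, VARI and GLI maps from a float RGB image of shape (H, W, 3).

    Assumes channel values are already scaled to [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Excess Green index: emphasises green relative to red and blue.
    exg = 2.0 * g - r - b

    # Visible Atmospherically Resistant Index. The denominator can be zero
    # or tiny, so it is guarded with eps and the result is clipped to
    # [-1, 1] (the clipping range is an assumption, not from the paper).
    den = g + r - b
    den = np.where(np.abs(den) < eps, eps, den)
    vari = np.clip((g - r) / den, -1.0, 1.0)

    # Green Leaf Index: a normalised variant of ExG.
    gli = (2.0 * g - r - b) / (2.0 * g + r + b + eps)

    return exg, vari, gli


def six_channel_input(rgb_uint8):
    """Stack R, G, B, ExG, VARI, GLI into an (H, W, 6) float array."""
    rgb = rgb_uint8.astype(np.float32) / 255.0
    exg, vari, gli = vegetation_indices(rgb)
    indices = np.stack([exg, vari, gli], axis=-1)
    return np.concatenate([rgb, indices], axis=-1)


if __name__ == "__main__":
    # Hypothetical 64x64 tile standing in for a satellite image patch.
    tile = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
    print(six_channel_input(tile).shape)
```

A detector consuming this input would need its first convolution widened from three to six input channels. How SGM-RTDETR handles that step is not described in the abstract.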