CSD-Net: Content–Style Decoupling with Exploratory MLLM-Guided Refinement for Robust Change Detection
Bo Peng, Chenhao Zhang, Mingmin Chi, Wenbing Zhu, Yun Zhang

Remote sensing change detection (RSCD) aims to produce pixel-accurate change maps from bi-temporal images, yet it is fundamentally challenged by radiometric pseudo-changes (season, illumination, and atmosphere) that cause structure–environment entanglement in deep features. We propose CSD-Net, a framework centered on content–style decoupling (CSD): a physics-inspired feature decomposition mechanism that encourages separation between intrinsic geometric content and extrinsic environmental style. In the CSD module, learnable pseudo-change tokens estimate a spatially invariant global style proxy through cross-attention. This proxy is broadcast over the feature map and subtracted from it, which performs feature-level radiometric-bias compensation and yields pseudo-change-robust content features for change prediction. CSD-Net (Base) alone achieves state-of-the-art performance on four benchmarks (LEVIR-CD, LEVIR-CD+, CDD, and WHU) with a favorable accuracy–efficiency trade-off (14.49M parameters and 15.26G FLOPs). We further explore an optional extension, CSD-Net+, which employs an MLLM (Qwen2.5-3B, LoRA-tuned) as a semantic refiner and SAM for instance-level mask refinement, coupled with uncertainty-aware three-way softmax fusion. This exploratory Stage 2 yields modest but consistent IoU improvements of 0.45–2.20% at the cost of significant computational overhead, and it is intended for offline, quality-critical scenarios. We provide a comprehensive account of both the effectiveness and the limitations of the proposed approach, including the marginal benefit–cost ratio of foundation-model integration.
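The style-subtraction idea in the CSD module can be illustrated with a minimal, dependency-free sketch. It assumes single-head cross-attention with no learned query/key/value projections, K token vectors acting as queries over a flattened feature map (N positions by C channels), and a plain average of the K token outputs as the global style proxy. The names `style_proxy` and `decouple` are hypothetical, and the actual CSD module may differ in each of these choices.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a 1-D list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def style_proxy(features: Matrix, tokens: Matrix) -> Vector:
    """Estimate one global style vector via token-to-feature cross-attention.

    features: N spatial positions x C channels (a flattened H*W feature map).
    tokens:   K pseudo-change token vectors x C channels, used as queries
              (assumption: no learned projections, single head).
    """
    c = len(features[0])
    scale = 1.0 / math.sqrt(c)
    summaries = []
    for t in tokens:
        # Scaled dot-product scores of this token against every position.
        scores = [scale * sum(ti * fi for ti, fi in zip(t, f)) for f in features]
        weights = softmax(scores)
        # Attention-weighted sum of features: what this token reads from the map.
        summaries.append(
            [sum(w * f[j] for w, f in zip(weights, features)) for j in range(c)]
        )
    # Assumption: average the K token summaries into a single C-dim proxy.
    k = len(summaries)
    return [sum(s[j] for s in summaries) / k for j in range(c)]


def decouple(features: Matrix, tokens: Matrix) -> Matrix:
    """Broadcast the style proxy to every position and subtract it."""
    s = style_proxy(features, tokens)
    return [[fj - sj for fj, sj in zip(f, s)] for f in features]


if __name__ == "__main__":
    rng = random.Random(0)
    C, N, K = 4, 6, 3
    feats = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(N)]
    toks = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(K)]
    # A spatially uniform "radiometric" shift applied to every position.
    bias = [0.8, -0.5, 1.2, 0.3]
    shifted = [[f + b for f, b in zip(row, bias)] for row in feats]
    out_a, out_b = decouple(feats, toks), decouple(shifted, toks)
    max_diff = max(
        abs(x - y) for ra, rb in zip(out_a, out_b) for x, y in zip(ra, rb)
    )
    # Expected to be at floating-point rounding level.
    print(f"max |content - content_shifted| = {max_diff:.2e}")
```

In this simplified form, adding the same bias vector to every spatial position leaves the output unchanged: the bias shifts each token's attention scores by a constant, so the softmax weights do not change, and it shifts the proxy by exactly the bias, which the subtraction then removes. Learned projections, normalization layers, or spatially varying pseudo-changes could break this exact cancellation, so the sketch is intuition for the compensation mechanism rather than a guarantee about CSD-Net.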