MFD-DF: A PM2.5 Concentration Prediction Method Based on Multimodal Feature Decomposition and Dynamic Fusion
Chen Song, Quanbo Long, Zhaobo Su, Yanchao Jiang, Li Wan, Xiankun Zhang, Tiantian Lv, Wenhu Hao, Zuxuan Shi

Accurate prediction of air pollutant concentrations is crucial for public health and sustainable urban development. Existing methods rely predominantly on single-modal data, which leads to an inadequate representation of the spatiotemporal evolution of pollutants, poor prediction accuracy, and limited generalization. To address these challenges, this paper proposes MFD-DF, a PM2.5 prediction framework that integrates ground-station time series with satellite remote sensing images. For feature extraction, learnable decomposition and deformable convolution are introduced, and a Cross-Modal Slot Attention module explicitly decomposes features to resolve information blurring. A dynamic cross-modal alignment mechanism is then designed alongside a learnable Time-Expansion Network (TEN) to ensure fine-grained interaction between the two modalities. Finally, a local-global attention feature fusion mechanism is proposed to improve how the two modalities are integrated. Experimental results show that in single-step PM2.5 prediction, MFD-DF improves MAE, RMSE, and MAPE by approximately 10–20% compared to state-of-the-art baselines. In multi-step PM2.5 prediction, it alleviates the error accumulation problem in long-sequence forecasting, demonstrating superior robustness and accuracy.
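The three reported metrics can be made concrete with a minimal sketch. It uses the standard textbook definitions of MAE, RMSE, and MAPE. The abstract does not give the exact evaluation protocol, so the function names, the `relative_improvement` helper, and the sample values below are illustrative assumptions and not results or code from the paper.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both series must be non-empty and aligned one-to-one.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target (e.g. ug/m^3)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Undefined for zero targets."""
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Hypothetical helper: percentage reduction of an error metric vs. a baseline."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


if __name__ == "__main__":
    # Made-up PM2.5 readings and predictions, for illustration only.
    observed = [10.0, 20.0, 40.0]
    predicted = [12.0, 18.0, 44.0]
    print(f"MAE  = {mae(observed, predicted):.3f}")
    print(f"RMSE = {rmse(observed, predicted):.3f}")
    print(f"MAPE = {mape(observed, predicted):.2f}%")
```

Under these definitions, an improvement of 10–20% in a metric would mean that `relative_improvement(baseline, proposed)` falls between 10 and 20 for that metric. This reading is an interpretation of the abstract's wording, not a stated formula.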