Air Pollution Prediction Based on Stacked Deep Autoencoder Network Model
Dhuha Saad Ismael, Nurulkamal Masseran, Sakhinah Abu Bakar

Urban air pollution, especially PM2.5, is one of the major health challenges facing the planet today. To provide accurate PM2.5 predictions despite noisy and missing data, we propose a deep learning model. We constructed a Stacked Autoencoder–Convolutional Neural Network–Bidirectional Long Short-Term Memory–Long Short-Term Memory (SAE-CNN-BiLSTM-LSTM) model that (1) uses convolutional layers to extract spatial features from the input data, (2) employs bidirectional LSTM layers to capture long-term temporal dependencies, and (3) uses an autoencoder to learn latent representations of the data, mitigating the effects of missing data. The model was trained on a large dataset of hourly air quality and meteorological measurements collected between 2018 and 2020 in Klang, Malaysia. Its performance on data not used during training was evaluated using a range of metrics. The SAE-CNN-BiLSTM-LSTM model achieved a test RMSE of approximately 11.97 µg/m³ and an R² of approximately 0.85 for PM2.5 concentrations, outperforming the other models tested on the same datasets. Additional metrics (MAE, MAPE, Mean Bias Error, and Index of Agreement) confirmed the model's accuracy and low bias in predicting air pollution levels. Statistical tests, such as the Diebold–Mariano test, confirmed that the model's accuracy gain over the CNN-LSTM models was significant. These findings indicate that the proposed model effectively captures the dynamics of the air pollution data. The proposed architecture yields an accurate and lightweight model for urban air pollution forecasting.
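The evaluation metrics named above can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions, with Willmott's formulation for the Index of Agreement; the abstract does not state which formulations were used, so this is an assumption. The function name `forecast_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def forecast_metrics(observed, predicted):
    """Return common forecast error metrics for two equal-length sequences.

    Assumed definitions (not confirmed by the abstract):
      RMSE = sqrt(mean((P - O)^2))
      MAE  = mean(|P - O|)
      MAPE = 100 * mean(|P - O| / |O|)      (requires O != 0)
      MBE  = mean(P - O)                    (positive means over-prediction)
      R2   = 1 - SS_res / SS_tot
      IoA  = 1 - SS_res / sum((|P - mean(O)| + |O - mean(O)|)^2)   (Willmott)
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ioa_den = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "mbe": sum(errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "ioa": 1.0 - ss_res / ioa_den,
    }


# Hypothetical PM2.5 values in µg/m³, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = forecast_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this sketch a positive Mean Bias Error indicates over-prediction on average, and an Index of Agreement close to 1 indicates close agreement between predictions and observations.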