Hybrid Machine Learning and Data Assimilation for Street-Level NO2 and PM2.5 Prediction in Copenhagen, Denmark (2001–2018)
Jibran Khan, Rune Keller, Claus Nordstrøm

Street-level concentrations of nitrogen dioxide (NO2) and fine particulate matter (PM2.5) pose serious public health risks in European cities, yet accurate multi-year prediction at traffic-dominated sites remains challenging. This study applies XGBoost (XGB) and Random Forest (RF) to predict hourly NO2 and daily PM2.5 at two street monitoring sites in Copenhagen, Denmark, trained on 17 years of observational data and evaluated on two independent years. Two data assimilation (DA) methods, three-dimensional variational assimilation (3D-Var) and the Extended Kalman Filter (EKF), are then applied as post-processing corrections to the ML predictions using co-located observations. In the 2018 test year, XGB achieved NO2 RMSE values of 9.5 and 7.4 µg/m3 at the HCAB and JGTV sites, respectively. Both DA methods improved substantially on the ML baseline, with 3D-Var reducing NO2 RMSE by up to 57% and spike-event RMSE by up to 51%. EKF almost entirely eliminated systematic bias across all configurations. The framework is computationally lightweight and can be applied to any deterministic model prediction at a monitoring station, including outputs from physics- and chemistry-based dispersion models. Overall, the findings demonstrate a practical way to improve street-level air quality prediction, with direct relevance for operational forecasting and public health protection.
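To make the post-processing idea concrete, the sketch below shows what a station-level correction of this kind can look like in its simplest scalar form. It is an illustration under stated assumptions, not the configuration used in the study: the function names (`var3d_scalar`, `kalman_bias_correction`), the variance parameters (`b_var`, `r_var`, `q_var`) and the synthetic data are all hypothetical, the observation operator is taken to be the identity, and the filter is a linear random-walk Kalman filter for an additive bias rather than a full EKF.

```python
import numpy as np


def var3d_scalar(background, observation, b_var, r_var):
    """Scalar 3D-Var analysis with an identity observation operator.

    Minimises J(x) = (x - xb)^2 / B + (y - x)^2 / R, whose closed-form
    minimiser is xb + B / (B + R) * (y - xb).
    """
    gain = b_var / (b_var + r_var)
    return background + gain * (observation - background)


def kalman_bias_correction(predictions, observations, q_var=0.1, r_var=4.0):
    """Track an additive prediction bias with a random-walk Kalman filter.

    Each prediction is corrected using only observations from earlier steps.
    The default variances are illustrative assumptions, not tuned values.
    """
    predictions = np.asarray(predictions, dtype=float)
    observations = np.asarray(observations, dtype=float)
    bias, p_var = 0.0, 1.0  # initial bias estimate and its variance
    corrected = np.empty_like(predictions)
    for t in range(predictions.size):
        p_var += q_var  # forecast step: the bias may drift between time steps
        corrected[t] = predictions[t] + bias
        innovation = observations[t] - corrected[t]
        gain = p_var / (p_var + r_var)
        bias += gain * innovation  # update step, used for the next prediction
        p_var *= 1.0 - gain
    return corrected


def rmse(a, b):
    """Root-mean-square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic illustration only: a "model" that reads 5 µg/m3 too low.
rng = np.random.default_rng(0)
truth = 30.0 + 10.0 * np.sin(np.linspace(0.0, 6.0, 200))
obs = truth + rng.normal(0.0, 2.0, truth.size)
ml_pred = truth - 5.0

corrected = kalman_bias_correction(ml_pred, obs)
print("RMSE before:", rmse(ml_pred, truth))
print("RMSE after: ", rmse(corrected, truth))
print("3D-Var analysis:", var3d_scalar(20.0, 30.0, b_var=9.0, r_var=1.0))
```

In this toy setting the filter learns the constant offset within a few dozen steps, so the corrected series tracks the synthetic truth more closely than the raw prediction. The scalar 3D-Var call shows the other extreme: with a background variance much larger than the observation variance, the analysis moves most of the way from the background towards the observation.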