Interpretable Machine Learning and Spatiotemporal Modeling of Meteorological and Environmental Drivers for Tuberculosis Incidence in China
Zihao Wang, Siyuan Li, Xiaotong Jiang, Kang Hu, Yangzhou Wu

Tuberculosis (TB) remains a major public health burden in China. Although meteorological and environmental factors are known to influence TB transmission, their nonlinear effects and spatiotemporal heterogeneity have not been fully elucidated. Using monthly TB incidence data from 31 provinces in China for 2005–2020, this study investigated these effects by integrating nine meteorological and air pollution variables within a combined machine learning and spatial statistical modeling framework. The Extreme Gradient Boosting (XGBoost) model effectively captured the complex nonlinear relationships between environmental exposure and TB incidence. SHapley Additive exPlanations (SHAP) analysis identified surface pressure (SP), vegetation coverage, and PM2.5 as the key drivers and revealed pronounced nonlinear response patterns and threshold effects. In particular, the positive contribution of PM2.5 to TB incidence increased sharply at medium-to-high concentrations. To examine spatial and temporal non-stationarity, Geographically and Temporally Weighted Regression (GTWR) was then applied, and it showed strong spatiotemporal heterogeneity in driver effects across provinces. PM2.5 was consistently positively associated with TB incidence, and its effect strengthened before 2015 and weakened thereafter, closely tracking the progress of China’s air pollution control. These findings provide new insights into the nonlinear and spatiotemporally heterogeneous effects of meteorological and environmental factors on TB incidence and support the development of more targeted, region-specific TB prevention strategies.
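To make the GTWR step concrete, the following is a minimal sketch of how locally varying coefficients can be estimated with a spatiotemporal kernel. It is an illustration under stated assumptions, not the study's implementation: the abstract does not specify the kernel, bandwidth, or distance scaling, so the Gaussian kernel, the function name `gtwr_coefficients`, and the parameters `bandwidth`, `lam`, and `mu` are all hypothetical choices, and the data are synthetic.

```python
import numpy as np

def gtwr_coefficients(coords, times, X, y, bandwidth, lam=1.0, mu=1.0):
    """Fit one weighted least-squares model per observation.

    Assumed (not taken from the study): a Gaussian kernel on the squared
    spatiotemporal distance d2 = lam * spatial_d2 + mu * temporal_d2,
    where lam and mu balance spatial units against temporal units.
    Returns an (n, p + 1) array: intercept first, then one column per predictor.
    """
    coords = np.asarray(coords, dtype=float)
    times = np.asarray(times, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d2_space = np.sum((coords - coords[i]) ** 2, axis=1)
        d2_time = (times - times[i]) ** 2
        w = np.exp(-(lam * d2_space + mu * d2_time) / bandwidth ** 2)
        XtW = Xd.T * w  # scales each observation's column by its weight
        # Local estimate: solve (X' W X) beta = X' W y for focal point i
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic example: the slope of the single predictor drifts over time.
rng = np.random.default_rng(0)
n = 120
coords = rng.uniform(0.0, 10.0, size=(n, 2))       # hypothetical locations
times = rng.integers(0, 12, size=n).astype(float)  # hypothetical month index
x = rng.normal(size=(n, 1))                        # hypothetical exposure
slope = 1.0 + 0.2 * times
y = 2.0 + slope * x[:, 0] + rng.normal(scale=0.1, size=n)

betas = gtwr_coefficients(coords, times, x, y, bandwidth=3.0)
local_slopes = betas[:, 1]  # one slope estimate per observation
```

Because each observation gets its own coefficient vector, the estimated slopes can be mapped by province or plotted against time, which is the kind of output that reveals a strengthening or weakening effect. In practice the bandwidth and the space-time scale factors would be tuned, for example by cross-validation, rather than fixed as they are here.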