Country-Level Crop Yield Sensitivity to Climate Variability: An Interpretable Machine Learning Framework for Screening and Policy Prioritization
Rajiv Kumar Gill, Aldrin Manon, Navdeep Kumar Chopra, Sanjeev Gill
Climate change threatens global food security through shifts in temperature and precipitation regimes, greater frequency of extreme weather events, and cascading effects on agricultural input markets and institutional capacity. Policymakers require nationally comparable diagnostic tools that are reproducible, transparent, and grounded in open data. This study presents an interpretable machine learning framework for country-level crop yield prediction and climate sensitivity screening, using publicly available FAOSTAT-derived panel data spanning 101 countries and 10 staple crop types over 1990–2013. A gradient-boosted decision tree model (XGBoost 2.1.4) is trained on observations from 1990 to 2008 and evaluated on a strictly held-out temporal window (2009–2013), using annual mean temperature, annual precipitation, total pesticide use (as a rough proxy for agricultural input management intensity), crop type and country identifiers, and temporally lagged yield values as predictive features. The optimized model yields high predictive accuracy on held-out data (R² = 0.982; RMSE = 11,183 hg/ha; MAE = 4,396 hg/ha). Ablation analysis reveals that model performance depends primarily on temporal yield persistence and crop identity, with performance declining to R² = 0.940 when lagged features are omitted, while climate-only variables explain limited variation (R² = 0.119). Notably, an ordinary least squares (OLS) baseline achieves comparable performance (R² = 0.984), suggesting that the dominant predictive signals arise from a stable temporal–cross-sectional structure rather than nonlinear modeling flexibility. SHAP-based feature attribution identifies regime-dependent temperature effects, with larger (more negative) marginal contributions under high-temperature conditions.
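The evaluation protocol described above (lagged yield features, training on years up to 2008, testing on later years, and reporting R², RMSE, and MAE) can be illustrated with a minimal sketch. This is not the authors' pipeline: the field names (`country`, `crop`, `year`, `yield_hg_ha`), the toy data, and the use of a naive persistence predictor (last year's yield) in place of XGBoost or OLS are all assumptions made for illustration.

```python
from math import sqrt

TRAIN_END = 2008  # abstract: train on 1990-2008, test on 2009-2013


def add_lag(rows, lag=1):
    """Attach the yield observed `lag` years earlier in the same (country, crop) series."""
    by_key = {(r["country"], r["crop"], r["year"]): r["yield_hg_ha"] for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r["country"], r["crop"], r["year"] - lag))
        if prev is not None:  # rows without a lagged value are dropped
            out.append({**r, "yield_lag": prev})
    return out


def temporal_split(rows, train_end=TRAIN_END):
    """Split strictly by year so that no test-period row is seen in training."""
    train = [r for r in rows if r["year"] <= train_end]
    test = [r for r in rows if r["year"] > train_end]
    return train, test


def metrics(y_true, y_pred):
    """R^2, RMSE, and MAE computed on the held-out observations."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical toy panel: two countries, one crop, linear yield trend.
rows = [
    {"country": c, "crop": "maize", "year": y, "yield_hg_ha": base + 500 * (y - 2005)}
    for c, base in [("A", 30000), ("B", 60000)]
    for y in range(2005, 2012)
]

lagged = add_lag(rows)
train, test = temporal_split(lagged)

# Persistence baseline (an assumption, not the paper's model): predict last year's yield.
y_true = [r["yield_hg_ha"] for r in test]
y_pred = [r["yield_lag"] for r in test]
scores = metrics(y_true, y_pred)
print(scores)
```

Even on this toy panel, the persistence predictor scores a high R² because most variance lies between series rather than within them, which is consistent with the abstract's observation that lagged yield and crop identity dominate predictive performance.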
Stylized sensitivity perturbations (+1 °C, +2 °C, −20% pesticide inputs) indicate modest mean yield changes (−1.3% to −1.6%) but substantial cross-national heterogeneity. Systematic residual analysis identifies countries exhibiting consistent over- or under-prediction patterns, offering diagnostic signals for further institutional investigation. This framework is designed as a transparent, scalable screening tool for evidence-based prioritization rather than a validated causal or forecasting instrument. It complements localized agronomic expertise and supports SDG 2 (Zero Hunger) and SDG 13 (Climate Action).
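The stylized perturbation analysis can be sketched as follows: each scenario modifies one input, the model is re-evaluated, and the percentage change in predicted yield is averaged per country. The predictor here (`toy_predict`) is a hypothetical stand-in with an arbitrary heat penalty above 20 °C, not the fitted XGBoost model, and the field names and units (`temp_c`, `pesticide_t`, `yield_lag`) are likewise assumed for illustration.

```python
def toy_predict(row):
    """Hypothetical stand-in for a fitted model; NOT the paper's XGBoost model."""
    heat_penalty = 400.0 * max(0.0, row["temp_c"] - 20.0)  # arbitrary threshold
    return 0.9 * row["yield_lag"] + 0.05 * row["pesticide_t"] - heat_penalty + 3000.0


def perturb(row, temp_delta=0.0, pesticide_scale=1.0):
    """Return a copy of the row with shifted temperature and scaled pesticide use."""
    return {
        **row,
        "temp_c": row["temp_c"] + temp_delta,
        "pesticide_t": row["pesticide_t"] * pesticide_scale,
    }


def pct_change_by_country(rows, predict, **scenario):
    """Mean percentage change in predicted yield per country under one scenario."""
    changes = {}
    for r in rows:
        base = predict(r)
        new = predict(perturb(r, **scenario))
        changes.setdefault(r["country"], []).append(100.0 * (new - base) / base)
    return {c: sum(v) / len(v) for c, v in changes.items()}


# Hypothetical observations for a cool-climate and a hot-climate country.
rows = [
    {"country": "Cool", "temp_c": 12.0, "pesticide_t": 10000.0, "yield_lag": 50000.0},
    {"country": "Hot", "temp_c": 27.0, "pesticide_t": 10000.0, "yield_lag": 50000.0},
]

# The three scenarios named in the abstract.
scenarios = {
    "+1C": {"temp_delta": 1.0},
    "+2C": {"temp_delta": 2.0},
    "-20% pesticide": {"pesticide_scale": 0.8},
}

results = {
    name: pct_change_by_country(rows, toy_predict, **kwargs)
    for name, kwargs in scenarios.items()
}
for name, per_country in results.items():
    print(name, per_country)
```

Reporting the change per country rather than only the global mean is what exposes the cross-national heterogeneity: in this toy setup the cool country is unaffected by warming while the hot country loses yield, even though the pooled average change is small.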