Multisource Satellite Data-Driven Machine Learning Approach for Rice Yield Prediction
Sudheer Kumar Tiwari, Vinay Kumar Srivastava, Sonam Agrawal
Estimating rice yield at the village level is essential because the village is the Insurance Unit (IU) for rice in many regions of India; timely and accurate yield information at this scale supports prompt, transparent claim settlements for farmers and informs local agricultural planning. To this end, a multi-source satellite data-based machine learning approach was used to estimate village-level rice yield in Kakinada, Andhra Pradesh, India, from optical and SAR data, climatic data, and land surface model-derived parameters. The predictor dataset comprised seasonal cumulative rainfall, seasonal Normalized Difference Vegetation Index (NDVI)-Max and NDVI-Mean, seasonal Land Surface Water Index (LSWI)-Max and LSWI-Mean, season-total Fraction of Absorbed Photosynthetically Active Radiation (fAPAR), season-total Root Zone Soil Moisture (RZSM), and season-total Sentinel-1 VH-polarization backscatter. Together, these variables represent crop greenness, moisture status, photosynthetic activity, soil water availability, canopy structure, and seasonal water supply. Model development and validation used village-level rice yield data for 2017 to 2023, collected through Crop Cutting Experiments (CCE) at the maturity stage of the Kharif season. Four machine learning models were evaluated: Random Forest (RF), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB). The models were trained on the multi-source satellite and yield data for 2017–2021, independently tested on 2022 data, and then applied to predict rice yield in 2023. Leave-One-Year-Out (LOYO) cross-validation was also conducted on the 2017–2022 data to assess temporal robustness and generalization across years. Among the evaluated models, Random Forest exhibited the best overall performance.
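The Leave-One-Year-Out design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `leave_one_year_out` and the sample list `sample_years` are hypothetical, and model fitting is omitted. It only shows how each year in 2017–2022 is held out in turn while the remaining years form the training set.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year.

    `years` holds one entry per record (e.g. one village-season);
    the returned indices refer to positions in that list.
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)

    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        # Every record from every other year goes into the training set.
        train_idx = [i for y in sorted(by_year) if y != held_out
                     for i in by_year[y]]
        yield held_out, train_idx, test_idx


# Hypothetical records: the year attached to each village-level sample.
sample_years = [2017, 2017, 2018, 2019, 2020, 2021, 2022, 2022]
folds = list(leave_one_year_out(sample_years))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because whole years are held out, no record from the test year can leak into training, which is what makes the scheme a check on generalization across seasons rather than across villages within a season.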
For the independent test year 2022, RF achieved an R² of 0.465, an RMSE of 415.34 kg ha⁻¹, an MAE of 322.22 kg ha⁻¹, and a MAPE of 10.36%. For the prediction year 2023, accuracy improved: RF achieved an R² of 0.838, an RMSE of 325.75 kg ha⁻¹, an MAE of 262.21 kg ha⁻¹, and a MAPE of 7.68%. LOYO cross-validation further confirmed the robustness of RF, which achieved the highest mean R² (0.702), with a mean RMSE of 384.73 kg ha⁻¹. The results indicate that multi-source satellite data combined with machine learning can be a reliable and operationally useful tool for predicting village-level rice yield in support of crop insurance claim settlement.
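For readers unfamiliar with the four reported metrics, the sketch below shows how they are commonly computed from observed and predicted yields. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition was used. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and MAPE (%) for paired yield values.

    Assumes R2 = 1 - SS_res / SS_tot; observed values must be non-zero
    (for MAPE) and must not all be identical (for R2).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),           # same unit as the yield
        "mae": sum(abs(e) for e in errors) / n,  # same unit as the yield
        "mape": 100.0 * sum(abs(e) / abs(o)
                            for e, o in zip(errors, observed)) / n,
    }


# Hypothetical yields in kg per hectare, for illustration only.
observed = [4000.0, 5000.0, 6000.0]
predicted = [4200.0, 4900.0, 5700.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE share the unit of the yield itself, while MAPE is scale-free, which is why the abstract can report the first two in kg ha⁻¹ and the last as a percentage.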