Advancing County-Level Potato Cultivation Area Extraction: A Novel Approach Utilizing Multi-Source Remote Sensing Imagery and the Shapley Additive Explanations–Sequential Forward Selection–Random Forest Model
Qiao Li, Xueliang Fu, Honghui Li, Hao Zhou

Potato is a vital food and cash crop, and its precise identification and area estimation are necessary for effective planting planning, market regulation, and yield forecasting. However, extracting large-scale crop areas using satellite remote sensing is fraught with challenges, such as low spatial resolution, cloud interference, and revisit cycle limitations, which impede the creation of high-quality time–series datasets. In this study, we developed a high-resolution vegetation index time–series by calculating coordination coefficients and integrating reflectance data from the Landsat-8, Landsat-9, and Sentinel-2 satellites. The vegetation index time–series was then reconstructed into high-quality data using linear interpolation and Savitzky–Golay (S-G) filtering. We employed the harmonic analysis of NDVI time–series (HANTS) method to extract features from the time–series and evaluated the classification accuracy across five feature sets: vegetation index time–series features, band means, vegetation index means, texture features, and color space features. The Random Forest (RF) model, utilizing the full feature set, emerged as the most accurate, achieving a precision rate of 0.97 and a kappa value of 0.94. We further refined the feature subset using the SHAP-SFS feature selection method, leading to the SHAP-SFS-RF classification approach for differentiating potato from non-potato crops. This approach enhanced accuracy by approximately 0.1 and kappa value by around 0.2 compared to the RF model, with the extracted areas closely aligning with statistical yearbook data. Our study achieved accurate extraction of potato planting areas at the county level, offering novel insights and methodologies for related research fields.
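
The time–series reconstruction step (linear interpolation followed by S-G filtering) can be illustrated with a minimal sketch. The abstract does not state the window length or polynomial order used, so the 5-point quadratic window below is an assumption, and the function names `fill_gaps_linear` and `smooth_sg5` are hypothetical helpers introduced only for illustration. Cloud-masked observations are represented as `None`.

```python
from typing import List, Optional

# Standard Savitzky-Golay smoothing weights for a 5-point window and a
# quadratic (or cubic) polynomial. The window size is an assumption here;
# the abstract does not report the parameters actually used.
SG5_WEIGHTS = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def fill_gaps_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between valid neighbours.

    Gaps at the start or end of the series are filled with the nearest
    valid observation, since there is nothing to interpolate towards.
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series has no valid observations")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            t = (i - left) / (right - left)
            filled.append(values[left] + t * (values[right] - values[left]))
    return filled


def smooth_sg5(series: List[float]) -> List[float]:
    """Apply the 5-point S-G filter; the two samples at each end are kept."""
    out = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2 : i + 3]
        out[i] = sum(w * x for w, x in zip(SG5_WEIGHTS, window))
    return out


if __name__ == "__main__":
    # Illustrative NDVI values only, not data from the study.
    ndvi = [0.21, None, 0.35, 0.52, None, None, 0.71, 0.68, 0.55, 0.30]
    print(smooth_sg5(fill_gaps_linear(ndvi)))
```

Filling gaps before smoothing matters because the S-G filter assumes evenly spaced, complete samples. In practice a library routine such as SciPy's `savgol_filter` would normally replace the hand-written filter.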