Kernel-Independent Component Analysis for Near-Infrared Spectroscopic Prediction of Tannin Content in Sorghum Grains
Wen-Peng Luo, Yue He, Yu Wei, Zheng-Guang Chen, Bing Li
In near-infrared (NIR) quantitative analysis, spectral features are often related through complex nonlinear mixing. Principal component analysis (PCA) relies solely on the covariance structure and on linear assumptions, so it cannot effectively handle such nonlinear signals. To address these limitations, this study uses kernel independent component analysis (KICA) for nonlinear feature extraction from NIR spectra and combines it with a regression model to rapidly detect tannin content in sorghum grains. KICA maps the spectral data into a high-dimensional feature space via the kernel trick, where nonlinearly mixed source signals can be separated effectively. Across repeated random splits, the prediction model built on KICA-extracted features and support vector regression (SVR) consistently delivered the highest test-set prediction accuracy and the smallest gap between training and test R² among all evaluated models. It outperformed PCA-based feature extraction methods and standalone SVR in both predictive accuracy and generalization capability, and it performed competitively with ICA-based methods. KICA also yielded a lower reconstruction error for the original spectra, indicating that it retains the nonlinear informative content of the spectral data more completely. Calculating the mean absolute coefficient of each independent component showed that the component with the highest contribution was strongly correlated with the wavelength range near the characteristic absorption peaks of tannin, which enhances the chemical interpretability of the extracted features. On a publicly available corn NIR dataset, the proposed method also outperformed the benchmark methods, supporting its generalization across different sample types and quality attributes.
This study confirms the feasibility of introducing nonlinear blind source separation via KICA into NIR quantitative analysis, offering a promising approach for spectral feature extraction in the rapid quality assessment of agricultural products with complex matrices.
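The abstract does not specify the exact KICA formulation. One common approximation, assumed here rather than taken from the study, whitens the spectra with RBF kernel PCA and then runs FastICA on the kernel principal components, so that ICA operates in the kernel feature space. The resulting independent components are then fed to an SVR. The minimal sketch below uses scikit-learn and synthetic spectra (hypothetical data, not the sorghum or corn sets). The helper names `build_kica_svr` and `make_synthetic_spectra`, the number of components, and the SVR hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA, KernelPCA
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_kica_svr(n_components=10, gamma=None, C=10.0, epsilon=0.01):
    """Hypothetical KICA + SVR pipeline.

    Assumption: KICA is approximated as RBF kernel PCA (whitening in the
    kernel feature space) followed by FastICA. This is not necessarily
    the formulation used in the study.
    """
    return Pipeline([
        ("center", StandardScaler(with_std=False)),  # mean-center each wavelength
        ("kpca", KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)),
        ("ica", FastICA(n_components=n_components, random_state=0, max_iter=2000)),
        ("scale", StandardScaler()),                 # put ICs on a common scale for SVR
        ("svr", SVR(kernel="rbf", C=C, epsilon=epsilon)),
    ])


def make_synthetic_spectra(n_samples=120, n_wavelengths=200, seed=0):
    """Hypothetical demo data: two Gaussian absorption bands whose heights
    depend linearly and nonlinearly on the target, plus white noise."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(0.0, 1.0, n_wavelengths)
    y = rng.uniform(0.0, 1.0, n_samples)  # stand-in for a concentration
    band_a = np.exp(-((wl - 0.3) ** 2) / 0.002)
    band_b = np.exp(-((wl - 0.7) ** 2) / 0.004)
    X = (np.outer(y, band_a)
         + np.outer(np.sin(3.0 * y) ** 2, band_b)
         + 0.01 * rng.standard_normal((n_samples, n_wavelengths)))
    return X, y


X, y = make_synthetic_spectra()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = build_kica_svr().fit(X_tr, y_tr)
y_pred = model.predict(X_te)
test_r2 = r2_score(y_te, y_pred)

# Independent components of the training spectra (all steps before scaling/SVR).
features = model[:-2].transform(X_tr)
print(features.shape, round(test_r2, 3))
```

In practice, `n_components`, the RBF `gamma`, and the SVR `C` and `epsilon` would be tuned by cross-validation on the training split only. Re-running the split with different `random_state` values mirrors the repeated random splits used to compare the models.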