DOI: 10.3390/metrology6030044 ISSN: 2673-8244

An Uncertainty-Aware Kernel-Based Method for Regression: The Generalized Least Squares Support Vector Machine

Alberto Bottacin, Francesca R. Pennecchi

A robust evaluation of predictive uncertainty is essential for deploying machine learning models in high-risk sectors. While techniques such as Gaussian Processes and Bayesian Neural Networks have been used to address model uncertainty, the measurement uncertainty associated with input data, particularly heteroscedasticity and autocorrelation, is often overlooked. This work introduces the Generalized Least Squares Support Vector Machine (GLS-SVM), a kernel-based regression model designed to integrate the full variance–covariance matrix of the response variable into the training process. A GUM-consistent methodology was developed for evaluating prediction uncertainty, including a correction for model bias. The model's performance was validated against standard Least Squares Support Vector Machines (LS-SVMs) and Gaussian Processes (GPs) through two case studies: a simulated regression problem with correlated data and the calibration of a mass flow controller. Performance was quantified using a comparability index (Cindex), defined as the absolute error of the prediction weighted by its expanded uncertainty. In the simulated case study, the GLS-SVM achieved a Cindex consistently below 0.65, indicating that its predictions are statistically consistent with the ground truth. In contrast, the competing models significantly exceeded unity, with peak values near 8, indicating a failure to provide physically consistent estimates. For the calibration of the mass flow controller, the GP models produced uncertainties one to three orders of magnitude smaller than the measurement uncertainties, whereas the GLS-SVM yielded uncertainties that were more physically consistent with the underlying measurement process. Overall, the proposed approach offers a versatile, metrologically informed framework for data-driven regression tasks where measurement covariance information is available and rigorous uncertainty quantification is required.
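
As an illustration of the comparability index described in the abstract, the following minimal sketch assumes that "weighted by its expanded uncertainty" means the absolute prediction error divided by the expanded uncertainty U of the prediction. The exact definition used in the paper may differ (for instance, it may also account for the uncertainty of the reference value). The function name `comparability_index` and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def comparability_index(y_pred, y_ref, expanded_uncertainty):
    """Assumed form of the index: |y_pred - y_ref| / U, evaluated elementwise.

    This is an illustrative reading of the abstract, not the authors' code.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    U = np.asarray(expanded_uncertainty, dtype=float)
    if np.any(U <= 0):
        raise ValueError("expanded uncertainty must be strictly positive")
    return np.abs(y_pred - y_ref) / U


# Hypothetical values, chosen only to show both outcomes.
y_ref = np.array([1.00, 2.00, 3.00])   # reference (ground-truth) values
y_pred = np.array([1.05, 1.90, 3.30])  # model predictions
U = np.array([0.10, 0.20, 0.20])       # expanded uncertainties of the predictions

c = comparability_index(y_pred, y_ref, U)

# Under this reading, an index at or below 1 means the prediction error
# is covered by the stated expanded uncertainty.
consistent = c <= 1.0
print(c, consistent)
```

Under this assumed definition, values at or below unity correspond to predictions whose error lies within the stated expanded uncertainty, which matches the way the abstract interprets values below 0.65 as consistent and values near 8 as inconsistent.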
