DOI: 10.18466/cbayarfbe.1926951 ISSN: 1305-130X

A Compact and Explainable Machine Learning Pipeline for Low-Concentration Gas Sensor Array Classification

Bora Canbula
This study investigates whether compact and physically interpretable time-domain descriptors can support accurate low-concentration gas classification without relying on full multichannel waveforms. Using the Gas Sensor Array Low-Concentration dataset, the raw multichannel sensor signals of each sample were reduced to 120 descriptors that summarize baseline behavior, variability, transient dynamics, and response magnitude. Four classical learning pipelines were evaluated with repeated stratified 5-fold cross-validation, complemented by principal component analysis, descriptor ranking, sensor correlation analysis, and sensor subset experiments. The best-performing model, a support vector machine with a radial basis function (RBF) kernel trained on the compact descriptor set, achieved a mean accuracy of 0.9456 and a macro-averaged F1 score of 0.9437 across 100 test folds (20 repetitions of 5-fold cross-validation) while using 75 times fewer inputs than the raw waveform representation. Classification performance also improved consistently with concentration: accuracy rose from 0.9067 at 50 ppb to 0.9333 at 100 ppb and reached 0.9967 at 200 ppb. Response mean and baseline mean emerged as the most informative descriptor families, while VOCS-P, 2M012, and MQ-137 were the most discriminative sensors. Overall, the results show that compact, interpretable descriptors provide an efficient, reproducible, and practically useful benchmark for low-concentration gas classification in resource-constrained electronic nose systems.
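The descriptor-based representation described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the author's feature set: the six per-channel descriptors, the helper names (`channel_descriptors`, `sample_descriptors`, `macro_f1`), and the assumption that a fixed-length pre-exposure baseline precedes the gas response are all hypothetical choices made for illustration, and the sketch does not reproduce the paper's 120 descriptors. The `macro_f1` helper shows how a macro-averaged F1 score weights every gas class equally, regardless of how many samples each class has.

```python
import statistics
from typing import Dict, Hashable, List, Sequence


def channel_descriptors(signal: Sequence[float], baseline_len: int) -> Dict[str, float]:
    """Summarize one sensor channel with a few time-domain descriptors.

    Hypothetical assumption: the first `baseline_len` samples are the
    pre-exposure baseline and the remaining samples are the gas response.
    """
    if not 2 <= baseline_len <= len(signal) - 2:
        raise ValueError("need at least two baseline and two response samples")
    baseline = signal[:baseline_len]
    response = signal[baseline_len:]
    base_mean = statistics.fmean(baseline)
    # First differences of the response approximate transient (rise) dynamics.
    diffs = [b - a for a, b in zip(response, response[1:])]
    return {
        "baseline_mean": base_mean,
        "baseline_std": statistics.stdev(baseline),        # baseline variability
        "response_mean": statistics.fmean(response),
        "response_std": statistics.stdev(response),        # response variability
        "response_magnitude": max(response) - base_mean,   # peak shift from baseline
        "max_slope": max(diffs),                           # steepest rise per sample
    }


def sample_descriptors(channels: Dict[str, Sequence[float]], baseline_len: int) -> Dict[str, float]:
    """Flatten per-channel descriptors into one feature vector keyed 'sensor:descriptor'."""
    features: Dict[str, float] = {}
    for sensor, signal in channels.items():
        for name, value in channel_descriptors(signal, baseline_len).items():
            features[f"{sensor}:{name}"] = value
    return features


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    labels = set(y_true) | set(y_pred)
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); the guard avoids dividing by zero.
        scores.append(2 * tp / denom if denom else 0.0)
    return statistics.fmean(scores)


if __name__ == "__main__":
    # Synthetic toy signals for illustration only (not dataset values).
    toy = {"s1": [1.0, 1.0, 1.0, 2.0, 4.0, 5.0], "s2": [0.5, 0.6, 0.5, 0.9, 1.4, 1.6]}
    features = sample_descriptors(toy, baseline_len=3)
    print(len(features))  # 12: six descriptors for each of two channels
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Prefixing each feature with its sensor name keeps every input traceable to one sensor and one descriptor family, which is the kind of bookkeeping that can make descriptor ranking and sensor subset experiments straightforward.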
