Metabolomic Classification of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome via Explainable Ensemble Learning and Pareto-Guided Feature Selection
Fatma Hilal Yagin, Yavuz Korkmaz, Cemil Colak, Sarah A. Alzakari, Amal K. Alkhalifa, Fahaid Al-Hashem, Mohammadreza Aghaei
Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating multisystem illness characterised by post-exertional malaise, non-restorative sleep, and cognitive impairment, yet no objective diagnostic biomarkers have been established. Untargeted plasma metabolomics provides a broad view of the biochemical disturbances underlying ME/CFS; however, the high dimensionality of omics datasets and the limited interpretability of conventional classifiers hinder translation into clinical practice. This study evaluates three ensemble classifiers, namely Explainable Boosting Machine (EBM), XGBoost, and LightGBM, for binary ME/CFS classification using plasma metabolomic and lipidomic profiles from 197 participants (106 ME/CFS; 91 healthy controls; 888 features). Feature dimensionality was reduced using a Pareto-Guided Recursive Neural Network (PRNN) pipeline. Model performance was assessed via 50-repeat stratified hold-out validation. EBM achieved the highest accuracy (0.909; 95% CI: 0.868–0.949) and area under the receiver operating characteristic curve (AUC: 0.940; 95% CI: 0.909–0.983), with XGBoost and LightGBM performing comparably. Interpretability analyses revealed that pairwise metabolite interaction terms, particularly proline & indole-3-lactate, tyrosine & N-acetylornithine, and maleic acid & arachidic acid, contributed the greatest discriminative signal. An ablation analysis comparing the full interaction-augmented EBM (AUC = 0.940) with a main-effects-only EBM (AUC = 0.882) confirmed that pairwise metabolite co-variation contributes discriminative value beyond individual metabolite levels, implicating amino acid catabolism, tryptophan–kynurenine pathway dysregulation, mitochondrial energy impairment, and lipid remodelling as central pathophysiological features.
Global and instance-level explanations jointly demonstrated population-level metabolic signatures alongside individual heterogeneity, highlighting the added clinical value of explainable artificial intelligence (XAI) in metabolomics. These findings support EBM-based metabolomic profiling as an internally validated approach for ME/CFS classification, subject to external validation, calibration assessment, and prospective testing.
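The repeated stratified hold-out protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the train/test ratio or how the confidence intervals were computed, so the 80/20 split and the percentile interval over repeats are assumptions, the synthetic Gaussian data are hypothetical, and a nearest-centroid rule stands in for EBM, XGBoost, and LightGBM to keep the example dependency-free. All function names are illustrative.

```python
import random
import statistics


def make_synthetic(n_cases=106, n_controls=91, n_features=5, shift=1.0, seed=0):
    """Hypothetical toy data: Gaussian features, cases shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_cases), (0, n_controls)):
        for _ in range(n):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def stratified_holdout(labels, test_fraction, rng):
    """Split indices so each class keeps roughly the same test proportion."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def fit_nearest_centroid(X, y, train_idx):
    """Stand-in classifier (assumption): predict the class with the closest mean."""
    centroids = {}
    for cls in set(y):
        rows = [X[i] for i in train_idx if y[i] == cls]
        centroids[cls] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def repeated_holdout_accuracy(X, y, n_repeats=50, test_fraction=0.2, seed=0):
    """Refit on a fresh stratified split each repeat; return test accuracies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_holdout(y, test_fraction, rng)
        predict = fit_nearest_centroid(X, y, train_idx)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def percentile_interval(scores):
    """2.5th and 97.5th percentiles of the per-repeat scores (assumed CI method)."""
    cuts = statistics.quantiles(scores, n=40)  # cut points at 2.5% steps
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    X, y = make_synthetic()
    scores = repeated_holdout_accuracy(X, y)
    low, high = percentile_interval(scores)
    print(f"accuracy {statistics.fmean(scores):.3f} (95% interval {low:.3f}-{high:.3f})")
```

Refitting the model on every split, rather than reusing one fitted model, is what lets the spread of the 50 scores reflect sensitivity to the particular partition. In a real analysis, feature selection would also need to be repeated inside each training split to avoid leakage into the held-out set.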