Machine learning algorithms to accelerate etiological diagnosis of congenital disorders of adrenal steroidogenesis
Busra Gurpinar Tosun, Atam Noyan Ercetin, Serap Turan, Abdullah Bereket, Kazim Yalcin Arga, Tulay Guran
Abstract
Background
Early and accurate etiological diagnosis of congenital disorders of adrenal steroidogenesis (CDAS) is critical, because timely, targeted management can prevent life-threatening complications and improve long-term outcomes.
Objective
To develop and validate a machine learning-assisted decision tree model for classifying CDAS using plasma steroid hormone profiles quantified by liquid chromatography–tandem mass spectrometry (LC–MS/MS).
Methods
A development cohort of 1027 participants (325 genetically confirmed CDAS patients representing 8 subtypes and 702 controls) was used for model construction. The Light Gradient Boosting Machine algorithm identified key discriminatory steroid hormones, which were integrated into an optimized decision-tree classifier. Internal performance was assessed through 5-fold cross-validation. Model performance was further evaluated in a validation cohort comprising 507 independent LC–MS/MS steroid profiles. Additional analyses included Shapley additive explanations (SHAP), confusion matrix visualization, principal component analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP).
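The workflow described above (boosted-tree feature ranking, then a shallow decision tree, assessed by 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn is available, uses scikit-learn's `GradientBoostingClassifier` as a stand-in for the Light Gradient Boosting Machine, and runs on synthetic data rather than steroid profiles. The number of retained features (4), the tree depth (4), and all variable names are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for steroid profiles: 600 samples, 12 "analytes", 3 classes.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Step 1: rank features with a gradient-boosted model (stand-in for LightGBM)
# and keep the 4 with the highest importance. threshold=-inf disables the
# importance cutoff so that exactly max_features features are retained.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    max_features=4, threshold=-np.inf,
)

# Step 2: fit a shallow, readable decision tree on the selected features.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
model = make_pipeline(selector, tree)

# Step 3: 5-fold stratified cross-validation. Feature selection sits inside
# the pipeline, so it is refit on each training fold and does not see the
# held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean 5-fold accuracy: {scores.mean():.3f}")
```

Placing the selection step inside the pipeline is a design choice of this sketch: it keeps the cross-validated estimate free of information from the held-out folds. Whether the published model was built this way is not stated in the abstract.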
Results
In the development cohort, the model achieved a mean overall accuracy of 97.1%, sensitivity of 99.5%, and specificity of 93.7%, with a macro-averaged area under the curve (AUC) of 0.97. Subtype-level accuracy exceeded 98% for most major CDAS subtypes. In the validation cohort, overall accuracy was 98.9%, sensitivity was 93.6%, and specificity was 99.8%. Feature importance analysis and SHAP identified 11-deoxycortisol, 17-hydroxyprogesterone, 21-deoxycortisol, and corticosterone as the strongest discriminators. Principal component analysis and UMAP revealed distinct clustering of CDAS subtypes, confirming the biological coherence of model predictions.
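For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from the four cells of a binary (CDAS versus control) confusion matrix. The counts are invented for illustration and are not taken from the study. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of true cases that the model flags (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Fraction of controls that the model clears (true negative rate).
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only: 100 cases (90 detected), 200 controls (190 cleared).
metrics = binary_metrics(tp=90, fn=10, tn=190, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because controls often outnumber cases, overall accuracy can remain high even when sensitivity falls, which is why the three figures are reported together.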
Conclusion
Machine learning–assisted steroid profiling provides an accurate and highly interpretable diagnostic approach for CDAS, with potential for integration into pediatric endocrine diagnostics and decision-support systems.