DOI: 10.1111/andr.70297 ISSN: 2047-2919

Machine Learning‐Based Prediction of Sperm Retrieval Outcomes in Patients With Klinefelter Syndrome: A Multicenter Study With External Validation

Murat Gül, Ali Şahin, Cevahir Özer, Mesut Altan, Tansu Güdelci, Kadir Can Sahin, Mehmet Hamza Gültekin, Muhammed Arif İbis, Mesut Berkan Duran, Burak Sivaslıoğlu, Eray Hasırcı, Mert Kılıç, Ali Can Albaz, Gökhan Çeker, Gianmaria Salvio

ABSTRACT

Background

Klinefelter syndrome is a common genetic cause of male infertility, and testicular sperm extraction (TESE) enables sperm retrieval in a subset of affected patients. However, predicting TESE success remains challenging because the clinical and endocrinological presentation of Klinefelter syndrome is heterogeneous.

Objectives

We aimed to develop and externally validate machine learning (ML) models to predict sperm retrieval outcomes in infertile patients with Klinefelter syndrome using routinely available clinical, hormonal, and testicular parameters.

Methods

This multicenter retrospective study included 470 infertile patients with Klinefelter syndrome who underwent TESE between January 2021 and November 2025. Data from 12 centers (n = 307) constituted the internal dataset and were randomly split into training (80%) and test (20%) sets, while data from one independent center (n = 163) were used for external validation. Five supervised ML algorithms (Random Forest, Decision Tree, AdaBoost, Gradient Boosting, and Extreme Gradient Boosting) were developed using repeated stratified five‐fold cross‐validation with five repeats. Model performance was evaluated using accuracy, F1 score, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), and Youden's J index. Model interpretability was assessed using Shapley additive explanations (SHAP).
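As an illustration of the resampling scheme named above, the following is a minimal sketch of repeated stratified five‐fold cross‐validation with five repeats, written with the Python standard library only. It is not the authors' code: the function name `repeated_stratified_kfold`, the seed, and the toy labels are assumptions made for the example, and the labels are not study data.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated stratified k-fold CV.

    Hypothetical helper for illustration. Within each repeat, the indices
    of every class are shuffled and dealt round-robin into the folds, so
    each fold keeps roughly the same class proportions as the full set.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)  # a fresh shuffle per repeat gives new folds
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)

        # Each fold serves once as the validation set within a repeat.
        for k in range(n_splits):
            val = sorted(folds[k])
            train = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train, val


# Toy labels (assumed): 1 = sperm retrieved, 0 = not retrieved.
labels = [1] * 20 + [0] * 30
splits = list(repeated_stratified_kfold(labels))
print(len(splits), "train/validation splits")
```

With five folds and five repeats, each model is fitted and scored 25 times, and the cross‐validation metrics are averaged over those 25 validation folds.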

Results

In internal cross‐validation, all ensemble‐based models demonstrated high discriminative performance (mean AUROC > 0.95). During external validation, performance declined across models; however, the Random Forest model achieved the highest accuracy (0.83), AUROC (0.95), and Youden's J index (0.73), suggesting the best generalizability among the models evaluated. SHAP analysis identified follicle‐stimulating hormone (FSH) as the most influential predictor, followed by total testosterone, luteinizing hormone (LH), and bilateral testicular volume. Higher FSH and LH levels were associated with reduced sperm retrieval probability, whereas higher testosterone levels and larger testicular volumes were associated with an increased likelihood of successful retrieval.
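For readers less familiar with the threshold‐based metrics reported above, the sketch below shows how accuracy, sensitivity, specificity, F1 score, and Youden's J index (sensitivity + specificity − 1) follow from a binary confusion matrix. The function name `binary_metrics` and the example labels are assumptions for illustration only; they are not study data and do not reproduce the reported values.

```python
def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from binary labels (1 = positive).

    Hypothetical helper for illustration. Assumes both classes occur in
    y_true and at least one positive prediction, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "youden_j": sensitivity + specificity - 1,
    }


# Made-up example: 1 = sperm retrieved, 0 = not retrieved.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Youden's J ranges from −1 to 1; a value of 0 corresponds to a classifier no better than chance, and 1 to perfect separation at the chosen threshold.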

Conclusions

Machine learning models can accurately and interpretably predict sperm retrieval outcomes in patients with Klinefelter syndrome. Among the evaluated algorithms, Random Forest demonstrated the most robust and clinically balanced performance during external validation. Integration of ML‐based prediction tools may support individualized counseling and decision‐making in the management of infertility in Klinefelter syndrome.
