DOI: 10.25259/jcis_283_2025 ISSN: 2156-5597

Construction of an ultrasound feature-based diagnostic model for predicting triple-negative breast cancer using 108 machine learning algorithm combinations

Xin Zhang, Xiaomin Zhang, Shan Zhu, Ningxia Ao, Yali Cao, Guangfei Yang

Objectives:

Triple-negative breast cancer (TNBC) is an aggressive subtype of breast cancer that lacks expression of the estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (HER2). This study aimed to identify ultrasound features associated with TNBC and to construct a robust diagnostic model using machine learning techniques.

Material and Methods:

A total of 433 breast cancer (BC) patients (47 TNBC and 386 non-TNBC) with complete clinical and ultrasound data were retrospectively analyzed. Univariate and multivariate logistic regression analyses identified independent ultrasound predictors of TNBC. Subsequently, 108 combinations of feature selection methods and machine learning classifiers were systematically evaluated to identify the optimal diagnostic model. Model performance was assessed using receiver operating characteristic curves, decision curve analysis (DCA), and calibration plots in both training and validation cohorts. Ethical approval was obtained from the institutional review board, and the requirement for individual written informed consent was waived due to the retrospective nature of the study.
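The combination search described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the two selectors, the two classifiers, and the toy binary data are hypothetical stand-ins chosen so the example runs with the Python standard library alone. Each selector is paired with each classifier, and every pair is scored by the area under the ROC curve (AUC) on a training and a validation split.

```python
from itertools import product

def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical feature selectors: each returns the column indices to keep.
def keep_all(X, y):
    return list(range(len(X[0])))

def top_two_by_mean_gap(X, y):
    """Keep the two columns whose class means differ the most."""
    def gap(j):
        pos = [row[j] for row, t in zip(X, y) if t == 1]
        neg = [row[j] for row, t in zip(X, y) if t == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:2]

# Hypothetical classifiers: each "fit" returns a function that scores one row.
def fit_sum_score(X, y):
    return lambda row: sum(row)

def fit_centroid(X, y):
    """Nearest-centroid scorer: higher means closer to the positive class."""
    def centroid(label):
        rows = [r for r, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pos, c_neg = centroid(1), centroid(0)
    def score(row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
        return d_neg - d_pos
    return score

def project(X, cols):
    """Restrict every row to the selected columns."""
    return [[row[j] for j in cols] for row in X]

SELECTORS = {"all": keep_all, "top2": top_two_by_mean_gap}
CLASSIFIERS = {"sum": fit_sum_score, "centroid": fit_centroid}

# Toy data (invented, not study data): four binary features, 1 = TNBC.
X_train = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0], [0, 0, 1, 0],
           [0, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]
X_val = [[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0], [1, 0, 1, 1], [0, 0, 0, 1]]
y_val = [1, 1, 0, 0, 0]

results = {}
for (s_name, select), (c_name, fit) in product(SELECTORS.items(),
                                               CLASSIFIERS.items()):
    cols = select(X_train, y_train)            # selection uses training data only
    score = fit(project(X_train, cols), y_train)
    train_auc = auc(y_train, [score(r) for r in project(X_train, cols)])
    val_auc = auc(y_val, [score(r) for r in project(X_val, cols)])
    results[(s_name, c_name)] = (train_auc, val_auc)

# Rank combinations by validation AUC, as a stand-in for "optimal model".
best = max(results, key=lambda k: results[k][1])
for combo, (tr, va) in results.items():
    print(combo, f"train AUC={tr:.3f}", f"validation AUC={va:.3f}")
print("best by validation AUC:", best)
```

Reporting the training and validation AUC side by side for every pair makes it easy to spot combinations that fit the training cohort far better than the validation cohort.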

Results:

Multivariate analysis identified posterior echo, margins, calcification, and aspect ratio as independent predictors of TNBC (p < 0.05). Among all model combinations, the Stepwise Generalized Linear Model combined with Random Forest achieved the best performance, with an area under the curve (AUC) of 1.000 in the training set and 0.913 in the validation set. Subgroup analysis revealed better model discrimination in patients aged ≤50 years (AUC = 0.911) than in those aged >50 years (AUC = 0.706). DCA and calibration plots indicated high clinical utility and excellent calibration performance.

Conclusion:

Specific ultrasound features can effectively distinguish TNBC from non-TNBC. A machine learning-based diagnostic model that integrates multiple algorithmic combinations demonstrates strong generalizability and may serve as a valuable clinical tool, particularly for younger patients.
