DOI: 10.7717/peerj.21484 ISSN: 2167-8359

A retrospective study of differential prognostic factors in early-onset versus late-onset colorectal cancer: a comprehensive clinical and machine learning analysis

Lifang Huang, Runtao Zhong, Kangkang Li, Jinfeng Chen, Yunxian He, Yuanning Ye, Shuxian Chen, Qili Wei, Huizhen Mai, Yali Zhang, Zhiqing Wang

Background

The incidence of early-onset colorectal cancer (EO-CRC; age <50 years) has been increasing worldwide. This single-center retrospective study aimed to compare the clinical characteristics of EO-CRC and late-onset CRC (LO-CRC; age ≥50 years) and to identify age-specific prognostic factors for overall survival (OS).

Methods

A total of 1,148 CRC patients were retrospectively analyzed and categorized into EO-CRC (n = 247) and LO-CRC (n = 901) groups. Clinical characteristics were compared using the Mann–Whitney U test and the chi-square test. Prognostic factors associated with OS were identified using least absolute shrinkage and selection operator (LASSO) Cox regression followed by multivariate Cox modeling. Model performance was evaluated using the C-index, calibration curves, and time-dependent receiver operating characteristic (ROC) analysis. Variable importance was further validated using a random survival forest (RSF) model.
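To make the C-index used for model evaluation concrete, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only and not the authors' implementation: the abstract does not state which software was used, the function name `harrell_c_index` and the toy inputs are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell's C-index (hypothetical helper, not from the study).

    times       : observed follow-up times
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk matched with shorter survival
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four patients, the third is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.2]
c_value = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the reported C-index values should be read.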

Results

EO-CRC patients showed higher proportions of family history, concurrent polyps, and programmed cell death ligand 1 (PD-L1) expression >10%, whereas LO-CRC patients exhibited higher rates of hypertension, diabetes, and elevated carcinoembryonic antigen (CEA) levels. Although OS did not differ significantly between groups (P = 0.460), their prognostic determinants varied markedly. In EO-CRC, the major predictors were distant metastasis; family history; tumor, node, and metastasis (TNM) stage; PMS1 homolog 2, mismatch repair system component (PMS2); MutS homolog 6 (MSH6); tumor size; concurrent polyps; and Ki-67. In LO-CRC, the significant contributors were age, BRAF V600E mutation, elevated carbohydrate antigen 19-9 (CA19-9), Ki-67, low hemoglobin, vascular invasion, MutL homolog 1 (MLH1), and pathological type. The C-index values for the EO-CRC and LO-CRC models were 0.829 (SE = 0.023) and 0.751 (SE = 0.018), respectively, and all time-dependent ROC curves showed area under the curve (AUC) values above 0.70, indicating good predictive performance. RSF analyses further confirmed distant metastasis and family history as the strongest predictors for EO-CRC, and age and BRAF V600E as the strongest predictors for LO-CRC.

Conclusion

This study suggests that EO-CRC and LO-CRC have fundamentally different prognostic determinants. In EO-CRC, prognosis is driven mainly by genetic susceptibility and tumor invasiveness, indicating that these patients may benefit from early genetic counseling, MMR/MSI testing, and immune checkpoint inhibitor therapy. In LO-CRC, prognosis depends more on age, acquired molecular changes, and chronic systemic factors, supporting the inclusion of metabolic and geriatric assessments in routine tumor care.
