An integrative multi-omics machine learning framework for precision metastasis prediction and clinical staging in non-small cell lung cancer.
Jinbin Wang, Ling Yao, Keqin Gao, Zhen Lv, Qianya Wei, Xiping Xing, Ling Jin, Jianjun Wu, Dongjing Ma
Background:
Traditional TNM staging inadequately captures the biological aggressiveness of NSCLC. While cell cycle dysregulation is a cancer hallmark, its role in driving invasiveness remains under-characterized. We developed a Lasso-Logistic machine learning (ML) framework to integrate cell cycle transcriptomics for enhanced metastasis and staging prediction.
Methods:
We integrated multi-omics data from TCGA, GEO (n=3), and CPTAC, along with five scRNA-seq datasets. A 14-gene signature was identified through Lasso-Logistic regression and used to calculate a CCRS. scRNA-seq pseudotime trajectory inference was used to support the biological interpretability of the findings. The model was validated in vitro using four cell lines, ex vivo through RT-qPCR on cDNA microarrays with 15 paired tissues, and in an independent clinical cohort.
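As an illustration of the Lasso-Logistic step described above, the following is a minimal sketch, not the authors' pipeline. It fits an L1-penalized logistic regression by proximal gradient descent on synthetic expression data, keeps the genes with non-zero weights as the signature, and scores each sample as the weighted sum of its signature-gene expression. The data, the penalty strength `lam`, the solver choice, and all function names are assumptions made for this example only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=200):
    # Proximal gradient descent on mean logistic loss + lam * ||w||_1.
    # The intercept b is not penalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def risk_score(w, x):
    # Score = weighted sum of expression values; zero-weight genes drop out.
    return sum(wj * xj for wj, xj in zip(w, x))


def auc(scores, labels):
    # Probability that a random positive outscores a random negative.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (hypothetical): 8 "genes", only the first two drive the label.
rng = random.Random(0)
n_samples, n_genes = 150, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
y = [int(rng.random() < sigmoid(2.0 * x[0] + 2.0 * x[1])) for x in X]

w, b = fit_lasso_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]  # signature gene indices
scores = [risk_score(w, x) for x in X]
auc_value = auc(scores, y)
print("selected genes:", selected)
print("in-sample AUC:", round(auc_value, 3))
```

The AUC printed here is computed on the training data and is therefore optimistic; a real analysis would choose `lam` by cross-validation and report performance on held-out cohorts.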
Results:
The ML framework identified a 14-gene signature (notably CCNB1, CDK1, CCNA2) with superior discriminative power. In the discovery meta-cohort, the model achieved an AUC of 0.879 for metastasis prediction and maintained a C-index of 0.740 in the TCGA cohort. scRNA-seq analysis confirmed that the CCRS genes were significantly upregulated along the EMT axis (
Performance metrics of the multi-omics machine learning framework.