The molecular similarity landscape of preclinical cancer models to patient tumors
Zixuan Xie, Jia Xue, Binchen Mao, Hengyuan Liu, Wubin Qian, Jingjing Wang, Xiaobo Chen, Sheng Guo

Selecting appropriate preclinical models is fundamental to translational oncology, yet a large-scale, multi-omic, quantitative comparison of their similarity to primary human tumors is lacking. To address this, we integrated transcriptomic, proteomic, and genomic profiles from over 10,000 primary tumors from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), alongside 4,000 preclinical models. Using a robust computational framework, we revealed a clear hierarchy of transcriptomic and proteomic similarity to patient tumors: patient-derived xenografts (PDXs) > patient-derived organoids (PDOs) = PDX-derived organoids (PDXOs) > cell lines. We also quantified high molecular conservation (Pearson correlation coefficient = 0.96) across paired in vitro-to-in vivo platform transitions (organoid to PDX). Furthermore, genomic analysis demonstrated that whole-exome sequencing (WES) outperforms RNA sequencing (RNA-Seq) in detecting DNA variants, and identified a clonal complexity hierarchy (cell lines > PDXOs > PDXs > PDOs) that reflects the impact of passaging history on intra-tumor heterogeneity. Ultimately, this study delivers a comprehensive quantitative benchmark that establishes a population-level hierarchy of molecular similarity between preclinical models and primary tumors and provides a data-driven reference for selecting models that balance biological representativeness with experimental practicality.
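
As an illustration of the similarity metric named above, the following is a minimal sketch of a Pearson correlation coefficient computed between two paired expression profiles. It is not the study's pipeline, which the abstract does not describe; the function name `pearson` and the toy values are hypothetical and chosen only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log-scale expression values for five genes in a paired
# organoid and PDX model (illustrative numbers, not data from the study).
organoid = [2.1, 5.4, 0.3, 7.8, 3.3]
pdx = [2.4, 5.1, 0.5, 7.5, 3.9]

r = pearson(organoid, pdx)
print(f"paired organoid-to-PDX similarity: r = {r:.3f}")
```

In practice such a coefficient would be computed over thousands of genes or proteins per pair and then summarized across pairs; how the study aggregates these values is not stated here.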