SCAPeSCLC: An Integrated Spatial Transcriptomic and Bayesian Pathway Enrichment Dataset for Survival Modeling in Extensive-Stage Small Cell Lung Cancer
Milad Shirvaliloo
Small cell lung cancer (SCLC) is an aggressive neuroendocrine malignancy with limited publicly available spatial transcriptomic resources, particularly for extensive-stage disease (ES-SCLC), which remains absent from major initiatives such as The Cancer Genome Atlas (TCGA). To improve the accessibility, interoperability, and downstream analytical utility of existing spatial transcriptomic data, SCAPeSCLC was developed as a harmonized dataset derived from two publicly available Gene Expression Omnibus (GEO) series, GSE261345 and GSE261348, both generated with the NanoString GeoMx Digital Spatial Profiler platform. The resource integrates normalized expression measurements from 296 tumor regions of interest (ROIs) across 58 ES-SCLC patients treated with first-line chemoimmunotherapy. After log2 transformation and standardization, the normalized expression matrices were reformatted into survival-ready, column-based datasets at both the ROI and patient levels. Clinical metadata were curated and harmonized, and progression-free survival (PFS), disease-specific survival (DSS), overall survival (OS), time-on-treatment (ToT), follow-up intervals, and censoring indicators were reconstructed from the original clinical records. Biological pathway (BP) activity scores were generated using Cancer Transcriptome Atlas (CTA) annotations encompassing 106 BPs. To account for variable ROI sampling across patients, Bayesian hierarchical modeling was applied to estimate patient-level pathway activity, yielding posterior estimates with corresponding credible intervals. The resulting resource comprises harmonized expression matrices, pathway enrichment profiles, Bayesian posterior estimates, survival-ready clinical annotations, and standardized Cox proportional hazards modeling outputs, and is accompanied by a dedicated GitHub repository.
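The expression preprocessing summarized in the abstract (log2 transformation, standardization, and reshaping into column-based ROI- and patient-level tables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build SCAPeSCLC. It assumes a genes-by-ROIs matrix of non-negative normalized counts, a pseudocount of 1 before the log2 transform, per-gene z-scoring across ROIs, and patient-level values obtained by averaging each patient's ROIs. The function names (`log2_standardize`, `to_column_based`), the clinical column names (`OS_months`, `OS_event`), and all toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def log2_standardize(expr: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Log2-transform a genes x ROIs matrix, then z-score each gene across ROIs.

    Assumption: the pseudocount and per-gene z-scoring are illustrative choices.
    """
    logged = np.log2(expr + pseudocount)
    mean = logged.mean(axis=1)
    # Constant genes get NaN instead of a division by zero.
    sd = logged.std(axis=1, ddof=1).replace(0, np.nan)
    return logged.sub(mean, axis=0).div(sd, axis=0)


def to_column_based(z: pd.DataFrame, roi_to_patient: pd.Series) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return one row per ROI (genes as columns) and one row per patient.

    Assumption: patient-level values are the mean of that patient's ROIs.
    """
    roi_level = z.T
    patient_level = roi_level.groupby(roi_to_patient).mean()
    return roi_level, patient_level


# Toy normalized counts (hypothetical): 2 genes measured in 3 ROIs from 2 patients.
expr = pd.DataFrame(
    {"roi1": [10.0, 100.0], "roi2": [30.0, 300.0], "roi3": [62.0, 50.0]},
    index=["GENE_A", "GENE_B"],
)
roi_to_patient = pd.Series({"roi1": "P01", "roi2": "P01", "roi3": "P02"})

z = log2_standardize(expr)
roi_df, patient_df = to_column_based(z, roi_to_patient)

# Hypothetical clinical columns: OS time in months and event indicator (1 = event, 0 = censored).
clinical = pd.DataFrame({"OS_months": [14.2, 8.5], "OS_event": [1, 0]}, index=["P01", "P02"])
survival_ready = clinical.join(patient_df)
print(survival_ready.round(3))
```

Placing time and censoring columns next to per-patient features in a single table is what makes such a dataset directly usable by standard survival-model fitting routines.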
SCAPeSCLC is intended to facilitate confirmatory analyses, integrative statistical modeling, methodological benchmarking, and reproducible exploration of spatial transcriptomic determinants of survival in ES-SCLC.
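The patient-level Bayesian step described above can be illustrated with a simplified conjugate normal-normal hierarchical model. In this model, ROI scores y_ij ~ N(θ_i, σ²) are nested within patient-level pathway activities θ_i ~ N(μ, τ²). Each θ_i then has a normal posterior, and patients represented by few ROIs are shrunk more strongly toward μ. This partial pooling is how a hierarchical model accounts for uneven ROI sampling.

This is a sketch under explicit assumptions, not the model used for SCAPeSCLC. The hyperparameters μ, σ², and τ² are plugged in from method-of-moments estimates (an empirical-Bayes shortcut), whereas a fully Bayesian fit would place priors on them and sample the joint posterior, for example with MCMC. The column names `patient` and `score`, the function names, and the toy scores are hypothetical.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def estimate_hyperparameters(scores: pd.DataFrame) -> tuple[float, float, float]:
    """Method-of-moments plug-in values for mu, sigma2 (within-patient), tau2 (between-patient)."""
    g = scores.groupby("patient")["score"]
    means, counts = g.mean(), g.count()
    # Pooled within-patient variance from residuals around each patient's ROI mean.
    resid = scores["score"] - g.transform("mean")
    sigma2 = float((resid ** 2).sum() / (len(scores) - len(means)))
    mu = float(means.mean())
    # Between-patient variance: spread of patient means minus expected sampling noise, floored above 0.
    tau2 = float(max(means.var(ddof=1) - (sigma2 / counts).mean(), 1e-6))
    return mu, sigma2, tau2


def partial_pool(scores: pd.DataFrame, mu: float, sigma2: float, tau2: float,
                 level: float = 0.95) -> pd.DataFrame:
    """Conjugate normal posterior for each patient's latent pathway activity, given hyperparameters."""
    g = scores.groupby("patient")["score"].agg(["mean", "count"])
    # Posterior precision = data precision (n_i / sigma2) + prior precision (1 / tau2).
    post_var = 1.0 / (g["count"] / sigma2 + 1.0 / tau2)
    # Posterior mean = precision-weighted average of the ROI mean and the population mean mu.
    post_mean = post_var * (g["count"] * g["mean"] / sigma2 + mu / tau2)
    half = NormalDist().inv_cdf(0.5 + level / 2) * np.sqrt(post_var)
    return pd.DataFrame({
        "n_roi": g["count"],
        "roi_mean": g["mean"],
        "post_mean": post_mean,
        "ci_low": post_mean - half,
        "ci_high": post_mean + half,
    })


# Toy ROI-level scores for one pathway (hypothetical values, unequal ROIs per patient).
scores = pd.DataFrame({
    "patient": ["P01"] * 5 + ["P02"] * 3 + ["P03"] + ["P04"] * 2,
    "score": [0.8, 1.2, 1.0, 0.9, 1.1, -0.5, -0.3, -0.4, 2.0, 0.1, -0.1],
})
mu, sigma2, tau2 = estimate_hyperparameters(scores)
result = partial_pool(scores, mu, sigma2, tau2)
print(result.round(3))
```

In this toy example, patient P03 is represented by a single ROI. Its posterior mean is pulled slightly from its raw score toward μ, and it receives the widest credible interval, reflecting the greater uncertainty that comes from sparse sampling.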