Interpretable XAI Pipeline for Colorectal Cancer Survival Prognosis on a Calibrated Synthetic Multimodal Cohort: A Methodological Framework Demonstration with Four Named Artifacts, a Provenance–Stress Negative-Control Audit and a TCGA-COADREAD Extern
Iacovos Ioannou, G. S. Pradeep Ghantasala, Thrilok Kolla, Pellakuri Vidyullatha, Vasos Vassiliou

An interpretable artificial intelligence pipeline is presented for prognostic survival modeling of colorectal cancer (CRC) on a calibrated synthetic multimodal cohort, together with a provenance–stress negative-control audit and an external clinical check on TCGA-COADREAD. The study is positioned as a methodological framework demonstration rather than as direct clinical evidence. Four reusable artifacts are introduced: the Explainable Conformal Width Decomposition (ECWD), the Causal-ECWD over a CRC directed acyclic graph (DAG), the DAG-Robustness Sensitivity Index (DAG-RSI) and the Provenance–Stress Protocol with its three Provenance–Stress Index variants. Two applications to CRC prognosis are evaluated: conformalized survival prediction and causal SHAP under an assumed DAG. On the principal cohort (N_eff = 11,198; event prevalence 0.982), Logistic Regression attains the highest AUC (0.886 ± 0.017), followed by Stacking (0.883 ± 0.019) and Gradient Boosting (0.872 ± 0.019). The conformal survival module achieves 0.907 empirical coverage at the nominal 90% level, with a mean interval width of 0.394 years. Venn-Abers calibration reduces the expected calibration error (ECE) of the Reference Random Forest from 0.0241 to 0.0062. Under the assumed DAG, causal SHAP exposes amplification, deflation and stability regimes. On the Kaggle cohort, discrimination is near chance (best AUC 0.502), which supports its use as a provenance–stress negative control; on TCGA-COADREAD, the external check yields AUCs of 0.747 at three years and 0.753 at five years. The pipeline is offered as a reproducible framework for uncertainty-aware, interpretable CRC prognosis, pending prospective external validation.
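The coverage (0.907 at nominal 90%) and mean interval width (0.394 years) reported for the conformal survival module are two standard split-conformal diagnostics. As a minimal sketch of how such numbers are defined, the Python below implements generic split conformal regression with absolute-residual nonconformity scores, assuming uncensored survival times in years. The paper's module may use a different, censoring-aware score, and the function names here are hypothetical illustrations rather than the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # Rank of the order statistic that yields >= 1 - alpha marginal coverage
    # under exchangeability of calibration and test points.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def split_conformal_intervals(
    cal_pred: Sequence[float],
    cal_true: Sequence[float],
    test_pred: Sequence[float],
    alpha: float = 0.10,
) -> List[Interval]:
    """Symmetric intervals pred +/- q built from absolute calibration residuals."""
    residuals = [abs(y - p) for p, y in zip(cal_pred, cal_true)]
    q = conformal_quantile(residuals, alpha)
    return [(p - q, p + q) for p in test_pred]


def coverage_and_width(
    intervals: Sequence[Interval], y_true: Sequence[float]
) -> Tuple[float, float]:
    """Empirical coverage and mean interval width (same units as y, e.g. years)."""
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return covered / len(y_true), mean_width


# Toy usage with made-up numbers (not the paper's data).
cal_pred = [0.0] * 19
cal_true = [i / 10 for i in range(1, 20)]  # residuals 0.1, 0.2, ..., 1.9
intervals = split_conformal_intervals(cal_pred, cal_true, [5.0], alpha=0.10)
print(intervals, coverage_and_width(intervals, [5.0]))
```

With a constant residual quantile, every interval has the same width 2q, so the mean width directly reflects how large the calibration residuals are at the chosen alpha.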
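ECE values such as 0.0241 and 0.0062 depend on how predictions are binned. The sketch below computes a binary ECE, assuming 10 equal-width bins over the predicted positive-class probability; the binning used in the paper is not stated here, and the function name is a hypothetical illustration.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE with equal-width bins over predicted probabilities in [0, 1]."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sum_p[b] += p
        sum_y[b] += y
        counts[b] += 1
    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            # |observed event rate - mean predicted probability|, weighted by bin size
            gap = abs(sum_y[b] / counts[b] - sum_p[b] / counts[b])
            ece += counts[b] / n * gap
    return ece


print(expected_calibration_error([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1]))
```

Because empty bins are skipped and each bin is weighted by its share of samples, sparsely populated probability ranges contribute little to the total.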
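Venn-Abers calibration wraps an existing scorer, for instance a Random Forest's predicted probability. A minimal pure-Python sketch of the standard inductive Venn-Abers construction follows: the test score is appended to the calibration set once with label 0 and once with label 1, an isotonic (pool-adjacent-violators) fit is computed each time, and the two fitted values p0 and p1 are merged as p1 / (1 - p0 + p1). This is an assumed illustration of the technique, not the authors' implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def isotonic_fit(xs: Sequence[float], ys: Sequence[float]) -> Dict[float, float]:
    """Non-decreasing least-squares fit via pool-adjacent-violators (PAVA).

    Returns a map from each distinct x to its fitted value; tied x values are pooled first.
    """
    pooled: Dict[float, List[float]] = {}
    for x, y in zip(xs, ys):
        acc = pooled.setdefault(x, [0.0, 0.0])  # [sum of y, count]
        acc[0] += y
        acc[1] += 1
    blocks: List[Tuple[float, float, List[float]]] = []  # (sum_y, weight, xs in block)
    for x in sorted(pooled):
        s, w = pooled[x]
        blocks.append((s, w, [x]))
        # Merge backwards while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication, since weights are positive).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, w2, x2 = blocks.pop()
            s1, w1, x1 = blocks.pop()
            blocks.append((s1 + s2, w1 + w2, x1 + x2))
    return {x: s / w for s, w, members in blocks for x in members}


def venn_abers(
    cal_scores: Sequence[float], cal_labels: Sequence[int], test_score: float
) -> Tuple[float, float, float]:
    """Inductive Venn-Abers: returns (p0, p1, merged probability) for one test score."""
    xs = list(cal_scores) + [test_score]
    # Append the test point with each hypothetical label and refit the isotonic map.
    p0 = isotonic_fit(xs, list(cal_labels) + [0])[test_score]
    p1 = isotonic_fit(xs, list(cal_labels) + [1])[test_score]
    # Collapse the interval [p0, p1] to one probability with the usual merge rule.
    return p0, p1, p1 / (1.0 - p0 + p1)


# Toy usage: hypothetical classifier scores on a calibration split.
cal_scores = [0.1, 0.4, 0.6, 0.9]
cal_labels = [0, 0, 1, 1]
print(venn_abers(cal_scores, cal_labels, 0.95))
```

The gap between p0 and p1 shrinks as the calibration set grows, which is why the merged probability becomes stable on large cohorts.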