Encoding Versus Linear Use of Patient Characteristics in Chest X-Ray Foundation Models on MIMIC-CXR
Yeonsu Kim, Yangwon Kim, Yoojin Nam, Namjoon Kim, Pa Hong
Background: Chest X-ray (CXR) foundation models encode patient demographics (sex, age, race) that linear probes can recover from images alone, but whether these encoded attributes drive finding prediction has not been tested at scale.
Methods: On MIMIC-CXR (230,697 images, 60,518 patients), we measured attribute dependence, defined as the AUROC drop after residualizing an attribute from a frozen embedding, across 24 patient attributes (four demographics and 20 ICD-coded comorbidities), 10 thoracic findings, and 6 overlap-free foundation models (24 × 10 × 6 = 1,440 attribute-finding-model triplets). Three additional CXR-pretrained models (RAD-DINO, CheXzero, CheSS) were included in the encoding and fairness analyses. Dependence was regressed on attribute-finding odds ratios (ORs), encoding strength, and model-level factors.
Results: Encoding and dependence dissociated. Sex (encoding AUROC 0.942) showed a dependence of <0.001; race (0.83) showed 0.0015 (rank 14/24); heart failure (0.774) showed the largest dependence (0.018). |log(OR)| explained 50.6% of the variance in dependence (β=0.029, p<10⁻¹⁵); model-level factors added no detectable contribution (ΔR²=0.000, n=6). Residualizing the top three high-OR attributes reduced AUROC by 0.026 without narrowing sex or age subgroup gaps (minimum detectable effect size (MDES) = 0.0019). Across 9 models, four-category race subgroup gaps (mean 0.069) were 30–75× larger than race residualization drops (mean 0.0015); CheXzero showed the same decoupling.
Conclusions: Encoding, residualization-sensitive dependence, and subgroup bias are three separable quantities within the same model. Pre-deployment audits on inpatient-skewed cohorts can prioritize attributes by local OR; jointly residualizing race and its cardiac correlates does not narrow the race subgroup gap, which instead tracks group-wise finding base rates.
Cross-institutional transfer remains open: no public CXR cohort currently links comorbidity electronic health records for external validation of the OR-dependence relationship.
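
The attribute-dependence measure described in Methods (AUROC drop after residualizing an attribute from a frozen embedding) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the residualization or the probe, so the sketch assumes linear least-squares residualization and a ridge least-squares probe, runs on synthetic data, and uses hypothetical function names (`auroc`, `residualize`, `fit_linear_probe`, `attribute_dependence`).

```python
import numpy as np


def auroc(y_true, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores get average ranks."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    ranks = (cum - (counts - 1) / 2.0)[inv]
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def residualize(emb_tr, attr_tr, emb_te, attr_te):
    """Subtract the part of each embedding dimension that is linearly
    predictable from the attribute (coefficients fit on training data only)."""
    a_tr = np.column_stack([np.ones(len(attr_tr)), attr_tr])
    a_te = np.column_stack([np.ones(len(attr_te)), attr_te])
    coef, *_ = np.linalg.lstsq(a_tr, emb_tr, rcond=None)
    return emb_tr - a_tr @ coef, emb_te - a_te @ coef


def fit_linear_probe(x, y, ridge=1e-3):
    """Ridge least-squares probe (an assumed stand-in for the study's probe)."""
    xb = np.column_stack([np.ones(len(x)), x])
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ np.asarray(y, dtype=float))


def probe_scores(w, x):
    return np.column_stack([np.ones(len(x)), x]) @ w


def attribute_dependence(emb_tr, attr_tr, y_tr, emb_te, attr_te, y_te):
    """Held-out AUROC before residualization minus AUROC after it."""
    base = auroc(y_te, probe_scores(fit_linear_probe(emb_tr, y_tr), emb_te))
    r_tr, r_te = residualize(emb_tr, attr_tr, emb_te, attr_te)
    resid = auroc(y_te, probe_scores(fit_linear_probe(r_tr, y_tr), r_te))
    return base - resid


# Synthetic illustration: attribute `a` is associated with the finding,
# attribute `b` is not; both are linearly encoded in the embedding.
rng = np.random.default_rng(0)
n, n_train = 4000, 3000
a = rng.integers(0, 2, n).astype(float)
b = rng.integers(0, 2, n).astype(float)
z = rng.normal(size=n)  # finding-relevant signal unrelated to a or b
p = 1.0 / (1.0 + np.exp(-(2.0 * a + z - 1.0)))
y = rng.random(n) < p
emb = np.column_stack([a, b, z]) @ rng.normal(size=(3, 16))
emb = emb + 0.1 * rng.normal(size=(n, 16))

tr, te = slice(0, n_train), slice(n_train, n)
dep_a = attribute_dependence(emb[tr], a[tr], y[tr], emb[te], a[te], y[te])
dep_b = attribute_dependence(emb[tr], b[tr], y[tr], emb[te], b[te], y[te])
print(f"dependence on a (associated with finding): {dep_a:.4f}")
print(f"dependence on b (encoded but unassociated): {dep_b:.4f}")
```

In this toy setting both attributes are equally recoverable from the embedding, yet only the one associated with the finding produces a noticeable AUROC drop, which mirrors the encoding-versus-dependence distinction the abstract draws. The numbers it prints say nothing about the reported results.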