DOI: 10.1017/rsm.2026.10103 ISSN: 1759-2879
Mixed-effects two-stage residual inclusion methods for individual patient data meta-analysis: A methodological framework for causal inference in survival analysis
Heather Hufstedler, Alexander M. Danzer, Valentijn Marnix Theodoor de Jong, Till Bärnighausen
Abstract
Individual patient data meta-analyses (IPDMAs) provide powerful tools for synthesizing evidence across studies, yet methods for addressing unmeasured confounding in observational IPDMAs with survival outcomes are rarely implemented. Instrumental variable (IV) approaches offer causal inference capabilities but face practical challenges in hierarchical data structures, particularly the lack of standard diagnostics for instrument strength in nonlinear mixed-effects models. We adapt and evaluate a frequentist mixed-effects two-stage residual inclusion (2SRI) framework for survival IPDMAs, extending traditional IV methods to accommodate study-level and temporal clustering while handling time-to-event outcomes through Cox proportional hazards models. Because classical
$F$-statistics are unavailable for logistic mixed-effects first-stage models, we propose the Wald $\chi^2$
statistic as a practical instrument-strength diagnostic and empirically characterize its relationship to estimator performance. Through a comprehensive simulation study with 48 scenarios—varying unmeasured confounding (weak to very strong), instrument–treatment association strength (0.3–1.0), and cross-study IV allocation patterns—we evaluated 2SRI against naive mixed-effects Cox models using bias, coverage, variance, and mean squared error. The design was anchored to realistic IPDMA structure (10 studies,
$N \approx 4{,}357$) from pooled Ebola data, with 1,000 replications per scenario. Results show that under weak confounding, naive models dominate on all metrics. With moderate-to-strong confounding and realized Wald
$\chi^2$
exceeding 150–200, mixed-effects 2SRI substantially reduces bias and achieves near-nominal coverage, though with inflated variance. We provide empirical guideposts linking realized first-stage strength to expected performance, enabling analysts to judge when 2SRI will outperform conventional approaches in hierarchical survival IPDMAs. All simulations assume a common treatment effect across studies. Performance under heterogeneous effects remains to be established.
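
To make the proposed diagnostic concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a single-level (fixed-effects) logistic first stage fitted by Newton-Raphson on simulated data, so the study-level random effects of the mixed-effects framework are omitted. The variable names (`z`, `u`, `a`), the simulated coefficients, and the helper `fit_logistic` are all hypothetical. The sketch computes the Wald $\chi^2$ for the instrument coefficient as the squared ratio of the estimate to its standard error, together with the first-stage residual that 2SRI carries into the second stage.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a fixed-effects logistic regression by Newton-Raphson.

    Hypothetical helper: returns the coefficient estimates and their
    model-based covariance matrix (inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information at the final estimate for the covariance.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, cov


# Simulated data (assumed values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 4000
z = rng.binomial(1, 0.5, n).astype(float)        # binary instrument
u = rng.normal(size=n)                           # unmeasured confounder
lin = -0.5 + 1.0 * z + 0.8 * u                   # true treatment model
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin))).astype(float)  # treatment

# First stage: regress treatment on the instrument (u is not observed).
X = np.column_stack([np.ones(n), z])
beta, cov = fit_logistic(X, a)

# Wald chi-squared for the instrument coefficient (1 degree of freedom).
wald_chi2 = beta[1] ** 2 / cov[1, 1]

# Raw first-stage residual: observed treatment minus fitted probability.
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
residual = a - p_hat

print(f"instrument coefficient: {beta[1]:.3f}")
print(f"Wald chi-squared: {wald_chi2:.1f}")
```

In a full 2SRI analysis the residual would enter the second-stage Cox model as an additional covariate alongside treatment. That step and the random effects are left out of this sketch.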