Correlation Is Not Prediction: Reassessing Predictive MRI Evidence in Guidelines for Persons With Relapsing-Remitting Multiple Sclerosis
Dulat Minas, Stefan Buchka, Joachim Havla, Ulrich Mansmann
Background
Effective treatment monitoring and treatment decisions in relapsing-remitting multiple sclerosis (RRMS) require accurate and individualized prediction of future disease courses. Guidelines from the Magnetic Resonance Imaging in Multiple Sclerosis (MAGNIMS) group and the Canadian Multiple Sclerosis Working Group (CMSWG) frequently cite MRI outcomes as predictive, but the methodological quality of this evidence is uncertain.
Objectives
This study aims to critically assess the methodological standards underlying predictive claims about MRI outcomes in four major MS guidelines.
Design
We conducted a content review of citations in the MAGNIMS 2015 and 2021 and the CMSWG 2013 and 2020 guideline publications.
Methods
Each source was evaluated for whether it reported quantitative predictive evidence: predictive values with confidence intervals, Kaplan–Meier–based risk estimates, or externally validated models that provide accurate risk estimates (good calibration) and correctly separate high- from low-risk patients (good discrimination). We also checked whether measures such as correlations, odds ratios, hazard ratios, Prentice criteria, or likelihood ratio tests were used.
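To illustrate the difference between an association measure and a predictive value with a confidence interval, the following minimal Python sketch computes both from a single 2x2 table. The counts are invented for illustration and are not taken from any cited study; the function name `wilson_interval` and the choice of the Wilson score interval are assumptions of this sketch, not the method used in the review.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical counts (invented): MRI marker present/absent vs. later event yes/no.
tp, fp = 30, 70   # marker present: event, no event
fn, tn = 10, 90   # marker absent:  event, no event

# Association measure: the odds ratio describes group-level association only.
odds_ratio = (tp * tn) / (fp * fn)

# Predictive values: the probability of the outcome given the marker status.
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
ppv_ci = wilson_interval(tp, tp + fp)
npv_ci = wilson_interval(tn, tn + fn)

print(f"Odds ratio: {odds_ratio:.2f}")
print(f"PPV: {ppv:.2f} (95% CI {ppv_ci[0]:.2f} to {ppv_ci[1]:.2f})")
print(f"NPV: {npv:.2f} (95% CI {npv_ci[0]:.2f} to {npv_ci[1]:.2f})")
```

In this invented table the odds ratio is close to 4, yet only 30% of marker-positive patients go on to have the event, which is the kind of gap between association and individual prediction that the evaluation criteria are meant to expose.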
Results
Across all four guidelines, most predictive statements relied on secondary citations and association-based measures. Odds ratios, hazard ratios, correlations, or Prentice criteria were commonly reported. Some studies reported predictive values, but confidence intervals were frequently not provided. Only isolated examples of properly validated prediction models were cited, and only one had undergone full external validation. Advanced methods, such as the likelihood reduction factor, were absent.
Conclusion
Current guideline statements on MRI prediction in RRMS often rely on associations rather than validated individualized predictions. They do not quantify individual risk or provide evidence for accuracy, calibration, discrimination, or robustness (reliability of predictions across different patients and settings). To ensure trustworthy and actionable evidence, future guidelines should require prospective risk estimates with confidence intervals, externally validated models with calibration and discrimination, predefined thresholds for predictive usefulness, and evaluation of clinical utility (e.g., decision curve analysis).
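As a rough illustration of the quantities named above, the following Python sketch computes a discrimination measure (the area under the ROC curve), a simple calibration summary (the observed-to-expected event ratio), and the net benefit used in decision curve analysis. The predicted risks and outcomes are invented, the function names are hypothetical, and the observed-to-expected ratio is only one of several possible calibration summaries; this is a sketch under those assumptions, not a validated evaluation pipeline.

```python
def auc(risks, outcomes):
    """Probability that a random event case has a higher predicted risk
    than a random non-event case (ties count as 0.5)."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def observed_to_expected(risks, outcomes):
    """Observed events divided by the sum of predicted risks (1.0 is ideal)."""
    return sum(outcomes) / sum(risks)

def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Hypothetical validation data (invented): predicted risks and observed events.
risks    = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0,   0,   1,   0,   0,   1,   1,   1]

print(f"AUC (discrimination): {auc(risks, outcomes):.3f}")
print(f"Observed/expected (calibration): {observed_to_expected(risks, outcomes):.3f}")
print(f"Net benefit at threshold 0.5: {net_benefit(risks, outcomes, 0.5):.3f}")
```

In an external validation these quantities would be computed on patients who were not used to build the model, and each would be reported with a confidence interval.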