Analysis of the Accuracy and Inter-Reader Precision of Scar Quantification Techniques in Aortic Stenosis: A Comparative Cardiovascular Magnetic Resonance Imaging Study
Megan Rian Rajah, Pieter-Paul Strauss Robbertse, Vishesh Sood, Tonya Marianne Esterhuizen, Anton Frans Doubell, Philip George Herbst
Background: An important determinant of mortality in aortic stenosis (AS) is the presence and quantity of myocardial scar. Scar quantification using late gadolinium enhancement (LGE) on cardiovascular magnetic resonance (CMR) imaging may be a useful risk-stratification tool for at-risk patients who do not meet current criteria for valve intervention. The incorporation of this tool into clinical practice is currently limited by a lack of consensus on the best LGE quantification technique to use.
Methods: Fifteen patients with severe AS underwent LGE imaging on CMR. A reference estimate of the LGE mass was made using a semi-automatic quantitative visual method. An intensity slider provided by the reporting software was used to mark areas of enhanced signal in each short-axis slice that correlated with the reader’s visual assessment of LGE, which was based on predetermined imaging criteria. This visual slider method (VslM) of determining LGE mass was then used as the reference for establishing the accuracy of various semi- and fully automated methods for identifying and quantifying LGE burden. These included the signal threshold versus reference mean (STRM) method at thresholds of two, three, and five standard deviations (2SD, 3SD, and 5SD, respectively), the full width at half maximum (FWHM) method, and the Otsu auto threshold (OAT) method. An intraclass correlation analysis was performed to establish and compare the inter-reader reliability of each method.
Results: Three readers demonstrated 100% agreement on the presence of LGE in 12/15 (80%) of study cases.
Accuracy, assessed using the Wilcoxon rank sum test, Spearman correlation and Bland–Altman analysis, was best for the 5SD method when remote myocardium reference regions of interest were used only in slices with visually detected LGE (Wilcoxon rank sum p-values ranged from 0.3 to 0.5 for the three readers; bias on Bland–Altman was <0.5 g for all three readers). The FWHM method ranked next, although wide minimum-maximum ranges were observed. Inter-reader reliability was best for the 2SD STRM method (ICC = 0.9, p < 0.001), but the accuracy of this method was clinically unacceptable. Inter-reader reliability was statistically acceptable for the VslM (ICC = 0.7, p < 0.001). The FWHM method yielded the best balance between accuracy and reliability but may be limited by the heterogeneity of scars observed from patient to patient.
Conclusions: The FWHM method appeared to offer a reasonable balance between accuracy and precision. However, it was not always the best fit, e.g., in patients with small or non-bright scars. There may be no additional benefit to using the semi- and fully automated methods in the context of AS, and visual estimation, when performed in the manner described in this study (i.e., the VslM), may be clinically sufficient.
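The threshold rules compared in this study (STRM at 2SD, 3SD and 5SD, FWHM, and Otsu) can be illustrated with a short sketch. This is not the reporting software's implementation: the function names, the toy pixel intensities, the voxel volume and the myocardial density of 1.05 g/mL are all assumptions made for the example, and the sample standard deviation is used where a vendor tool may use a different estimator.

```python
from statistics import mean, stdev


def strm_threshold(remote, n_sd):
    """STRM rule: mean of remote (normal) myocardium plus n_sd sample SDs."""
    return mean(remote) + n_sd * stdev(remote)


def fwhm_threshold(scar_roi):
    """FWHM rule: half of the maximum signal in a reader-drawn scar region."""
    return 0.5 * max(scar_roi)


def otsu_threshold(pixels):
    """Otsu rule: the cut-off that maximises between-class variance."""
    values = sorted(set(pixels))
    best_t, best_var = values[0], -1.0
    for t in values[:-1]:  # the last value would leave the upper class empty
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        w_low, w_high = len(low) / len(pixels), len(high) / len(pixels)
        between = w_low * w_high * (mean(low) - mean(high)) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def lge_mass_g(pixels, threshold, voxel_ml, density_g_per_ml=1.05):
    """Mass of pixels above threshold; the density value is an assumed convention."""
    n_enhanced = sum(1 for p in pixels if p > threshold)
    return n_enhanced * voxel_ml * density_g_per_ml


# Toy intensities (hypothetical, not study data).
remote = [100, 104, 96, 102, 98]          # remote myocardium reference ROI
scar_roi = [300, 320, 400]                # reader-drawn bright scar ROI
myocardium = remote + [110, 112] + scar_roi
voxel_ml = 0.02                           # assumed voxel volume in mL

thresholds = {
    "2SD": strm_threshold(remote, 2),
    "3SD": strm_threshold(remote, 3),
    "5SD": strm_threshold(remote, 5),
    "FWHM": fwhm_threshold(scar_roi),
    "Otsu": otsu_threshold(myocardium),
}
masses = {name: lge_mass_g(myocardium, t, voxel_ml) for name, t in thresholds.items()}

for name, t in thresholds.items():
    print(f"{name}: threshold={t:.1f}, LGE mass={masses[name]:.3f} g")
```

On these toy values the 2SD and 3SD rules each mark five pixels as enhanced, whereas the 5SD, FWHM and Otsu rules each mark three, which shows how a lower threshold can inflate the estimated LGE mass.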