Underreporting of Interrater Agreement in Diagnostic Accuracy Studies of Immunohistochemistry in Dermatopathology: A Systematic Review
Margaryta Stoieva, Garth Fraga
Abstract:
Immunohistochemistry (IHC) is widely used in dermatopathology to support diagnostic classification, yet the reproducibility of IHC interpretation is rarely evaluated in studies of diagnostic performance. Because most IHC tests require visual interpretation by pathologists, variability between readers may influence reported test accuracy. We conducted a systematic review of diagnostic accuracy studies of IHC diagnostic biomarkers in dermatopathology published between 2015 and 2025 to determine how frequently interobserver agreement is reported. Searches of PubMed, Embase, and Web of Science identified 1461 abstracts, of which 84 studies met inclusion criteria. Although most studies reported sensitivity and specificity, 73 of 84 (87%) did not report any measure of interobserver agreement. When agreement was reported, it was most commonly assessed using the Cohen kappa statistic, although qualitative descriptions of concordance were also used. To integrate diagnostic accuracy with reproducibility, we explored a composite metric, the kappa-balanced score, defined as the geometric mean of balanced accuracy and Cohen kappa. The kappa-balanced score was calculable in three studies and ranged from 0.62 to 0.93. These findings indicate that interobserver agreement is substantially underreported in dermatopathology IHC studies, highlighting the need for routine reporting of reproducibility metrics in diagnostic accuracy research.
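
As an illustration of the composite metric defined above, the following minimal Python sketch computes balanced accuracy from sensitivity and specificity, Cohen kappa from a two-reader agreement table, and the kappa-balanced score as the geometric mean of the two. The function names and the example counts are hypothetical and are not taken from the reviewed studies. The sketch also assumes that a negative kappa, for which a geometric mean is undefined, should be rejected rather than scored; the abstract does not say how such cases are handled.

```python
import math


def balanced_accuracy(sensitivity, specificity):
    """Arithmetic mean of sensitivity and specificity (both in [0, 1])."""
    return (sensitivity + specificity) / 2.0


def cohen_kappa(table):
    """Cohen kappa from a square agreement table.

    table[i][j] is the number of cases that reader A placed in category i
    and reader B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("agreement table is empty")
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_balanced_score(sensitivity, specificity, kappa):
    """Geometric mean of balanced accuracy and Cohen kappa.

    Assumption: a negative kappa is rejected, because the geometric mean
    of a negative value is not defined.
    """
    if kappa < 0:
        raise ValueError("kappa must be non-negative for a geometric mean")
    return math.sqrt(balanced_accuracy(sensitivity, specificity) * kappa)


if __name__ == "__main__":
    # Hypothetical two-reader table (positive/negative stain interpretation).
    readers = [[40, 5],
               [5, 50]]
    kappa = cohen_kappa(readers)
    # Hypothetical sensitivity and specificity for the same marker.
    score = kappa_balanced_score(0.90, 0.80, kappa)
    print(f"kappa = {kappa:.3f}, kappa-balanced score = {score:.3f}")
```

Because the score is a geometric mean, it is pulled toward the lower of its two components, so a marker with high balanced accuracy but poor reader agreement scores lower than the arithmetic mean of the two would suggest.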