False positive rate of modern ICMs according to guideline-defined annotation rules
N Varma, A Lazarus, A Menet, P Defaye, P Mabo, E Crespin
Abstract
Introduction
Implantable Cardiac Monitors (ICMs) are essential for long-term arrhythmia detection, but their clinical utility is often hampered by a high volume of detected episodes, leading to a significant review burden. The clinical classification of these device-detected episodes is not well characterized across different manufacturers using a standardized methodology.
Purpose
This study aims to quantify the proportion of non-actionable episodes from modern ICMs by applying an ESC and ACC/AHA/ACCP/HRS guideline-based annotation framework to a large, real-world dataset.
Methods
A retrospective analysis was conducted on 2,659 episodes from 1,710 patients (age 67.5 ± 15.0 years, male 27%, US 62.2%) implanted with ICMs from the four major manufacturers: Medtronic (43.6%), Biotronik (22.4%), Abbott (18.0%), and Boston Scientific (15.9%). Devices with proprietary AI algorithms were analyzed as a distinct cohort. To ensure a diverse dataset, at most one episode per device was selected for each distinct event type. All episodes were adjudicated by an independent expert committee using a pre-specified annotation schema (Figure). Based on this schema, episodes were classified as: (1) clinically actionable, (2) non-actionable (defined as 'Normal Rhythm', 'PVC', or 'PAC or short atrial arrhythmias'), or (3) indeterminate (defined as 'Uninterpretable' or 'Uncertain SVT'). Episode classifications were independent of the original device diagnosis.
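The selection rule and three-way classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual pipeline: the field names (`device_id`, `event_type`, `label`) are hypothetical, "event type" is taken to mean the device-assigned episode type, the first episode seen for each device and event type is the one retained (the abstract does not say how the retained episode was chosen), and any adjudicated label outside the two listed groups is treated as clinically actionable.

```python
# Label groups as listed in the Methods; everything else is assumed actionable.
NON_ACTIONABLE = {"Normal Rhythm", "PVC", "PAC or short atrial arrhythmias"}
INDETERMINATE = {"Uninterpretable", "Uncertain SVT"}


def classify(adjudicated_label):
    """Map an adjudicated rhythm label to one of the three study categories."""
    if adjudicated_label in NON_ACTIONABLE:
        return "non-actionable"
    if adjudicated_label in INDETERMINATE:
        return "indeterminate"
    return "clinically actionable"


def select_episodes(episodes):
    """Keep at most one episode per (device, event type) pair.

    Assumption: the first episode encountered for a pair is retained.
    """
    kept = {}
    for episode in episodes:
        key = (episode["device_id"], episode["event_type"])
        kept.setdefault(key, episode)  # later duplicates of the pair are ignored
    return list(kept.values())


# Hypothetical records, for illustration only.
example = [
    {"device_id": "A", "event_type": "Pause", "label": "Normal Rhythm"},
    {"device_id": "A", "event_type": "Pause", "label": "Uninterpretable"},
    {"device_id": "B", "event_type": "Pause", "label": "PVC"},
]
for episode in select_episodes(example):
    print(episode["device_id"], episode["event_type"], classify(episode["label"]))
```

Note that the classification depends only on the adjudicated label, never on the device's own diagnosis, which mirrors the independence stated in the Methods.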
Results
Adjudication revealed that a substantial proportion of device-detected episodes were non-actionable or indeterminate. In ICMs without proprietary AI algorithms (81.0%), 45.4% of all episodes were adjudicated as non-actionable and 20.1% as indeterminate. In the AI-equipped models (19.0%), despite improved classification, 32.9% of episodes were classified as non-actionable and 30.6% as indeterminate. Device-labeled 'Pause' episodes were a primary driver: 46.8% of them were adjudicated as false positives caused by undersensing. These findings demonstrate that a high burden of non-actionable or difficult-to-interpret episodes persists, irrespective of current in-device pre-filtering algorithms.
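The clinically actionable share is not reported directly, but it can be derived as a remainder if one assumes that the three categories are mutually exclusive and exhaustive. The short calculation below makes that arithmetic explicit. The function name and cohort labels are illustrative, and the derived values are not figures stated by the authors.

```python
def actionable_share(non_actionable_pct, indeterminate_pct):
    """Percentage left over after the two reported categories.

    Assumes the three study categories partition all episodes.
    """
    return round(100.0 - non_actionable_pct - indeterminate_pct, 1)


# (non-actionable %, indeterminate %) as reported in the Results.
cohorts = {
    "without proprietary AI": (45.4, 20.1),
    "AI-equipped": (32.9, 30.6),
}

for name, (non_actionable, indeterminate) in cohorts.items():
    print(f"{name}: {actionable_share(non_actionable, indeterminate)}% actionable")
```

Under that assumption, the implied actionable share is about 34.5% without proprietary AI and about 36.5% in AI-equipped models. The non-actionable fraction falls, but much of the difference moves into the indeterminate category.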
Conclusion
When evaluated against a standardized, guideline-based methodology, modern ICMs generate a high burden of non-actionable and indeterminate episodes. This issue persists even in devices equipped with manufacturer-proprietary AI filters. Our findings highlight a critical unmet need for more advanced, universally applicable analysis tools to improve the signal-to-noise ratio in remote monitoring, reduce clinician workload, and focus attention on clinically significant events.