DOI: 10.1097/aln.0000000000004971 ISSN: 0003-3022

A Comparison of 5 Algorithmic Methods and Machine Learning Pattern Recognition for Artifact Detection in Electronic Records of 5 Different Vital Signs: A Retrospective Analysis

Mathias Maleczek, Daniel Laxar, Lorenz Kapral, Melanie Kuhrn, Yannic Abulez, Christoph Dibiasi, Oliver Kimberger
  • Anesthesiology and Pain Medicine


Background

Research using physiological data from electronic health records is common, and such data invariably include artifacts. Traditionally, these artifacts have been handled using simple filter techniques. The authors hypothesized that different artifact detection algorithms, including machine learning, may be necessary to provide optimal performance for various vital signs and clinical contexts.

Materials and Methods

In a retrospective single-center study, intraoperative operating room (OR) and intensive care unit (ICU) electronic health record datasets including heart rate, oxygen saturation, blood pressure, temperature, and capnometry were analyzed. All records were screened for artifacts by at least two human experts. Classical artifact detection methods (cutoff, multiples of standard deviation [z-value], interquartile range, and local outlier factor) and a supervised learning model implementing long short-term memory neural networks were tested for each vital sign against the human expert reference dataset. For each artifact detection algorithm, sensitivity and specificity were calculated.
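As a rough illustration of three of the classical methods named above (cutoff, z-value, and interquartile range) and of how sensitivity and specificity can be scored against an expert reference, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the cutoff limits (30 to 220), the multipliers `k`, and the toy heart-rate series are all assumptions chosen for illustration, and the local outlier factor and neural network methods are omitted.

```python
import numpy as np

def cutoff_flags(x, low, high):
    """Flag values outside a fixed plausible range (limits are assumed)."""
    x = np.asarray(x, dtype=float)
    return (x < low) | (x > high)

def zscore_flags(x, k=3.0):
    """Flag values more than k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd == 0:
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - x.mean()) > k * sd

def iqr_flags(x, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def sensitivity_specificity(flags, reference):
    """Score detector flags against expert labels (True = artifact)."""
    flags = np.asarray(flags, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(flags & reference)    # artifacts correctly flagged
    fn = np.sum(~flags & reference)   # artifacts missed
    tn = np.sum(~flags & ~reference)  # valid points correctly passed
    fp = np.sum(flags & ~reference)   # valid points wrongly flagged
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return float(sens), float(spec)

# Toy heart-rate series (hypothetical data); the values 0 and 250 stand in
# for points that an expert would annotate as artifacts.
heart_rate = [72, 74, 73, 0, 75, 71, 250, 74, 73, 72]
reference = [v in (0, 250) for v in heart_rate]

for name, flags in [
    ("cutoff", cutoff_flags(heart_rate, low=30, high=220)),
    ("z-value", zscore_flags(heart_rate, k=2.0)),
    ("IQR", iqr_flags(heart_rate)),
]:
    print(name, sensitivity_specificity(flags, reference))
```

On this toy series the z-value method misses one of the two artifacts, because the extreme values inflate the standard deviation they are judged against. This is one plausible reason a simple statistical filter can combine high specificity with low sensitivity.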


Results

A total of 106 (53 operating room and 53 ICU) patients were randomly selected, resulting in 392,808 data points. Human experts annotated 5,167 (1.3%) data points as artifacts. The artifact detection algorithms demonstrated large variations in performance. The specificity was above 90% for all detection methods and all vital signs. The neural network showed significantly higher sensitivities than the classic methods for heart rate in the ICU (33.6%, 95% CI: 33.1–44.6), systolic invasive blood pressure in both the OR (62.2%, 95% CI: 57.5–71.9) and the ICU (60.7%, 95% CI: 57.3–71.8), and temperature in the OR (76.1%, 95% CI: 63.6–89.7). The confidence intervals for specificity overlapped for all methods. Generally, sensitivity was low, with only the z-value for oxygen saturation in the operating room reaching 88.9%. All other sensitivities were less than 80%.


Conclusions

No single artifact detection method consistently performed well across different vital signs and clinical settings. Neural networks may be a promising artifact detection method for specific vital signs.
