DOI: 10.3390/s26123920 ISSN: 1424-8220

Dynamic Time Warping for System-Level Fault Detection in IoT Devices: An Episode- and Layer-Based, Label-Free Approach

Ryan Aalund, Vincent P. Paglioni

IoT devices operate as integrated systems spanning hardware, firmware/software, and communication layers. In operational settings, many faults and performance degradations are emergent: they arise from cross-layer interactions, workload changes, and telemetry artifacts rather than from a single physics-of-failure mechanism. These realities make traditional supervised fault classification difficult because labeled fault data are rarely available during deployment and the fault surface is unknown a priori. This paper presents a practitioner-oriented, label-free fault detection and diagnosis (FDD) pattern based on Dynamic Time Warping (DTW) for rapid implementation in production IoT telemetry. The method represents a device as a sequence of overlapping episodes and organizes telemetry into interpretable layers (hardware sensors, communication health proxies, and software/firmware-derived KPIs). A reference library of regular episodes is built from an assumed-healthy training window; new episodes are scored using constrained DTW distances against this library, while per-layer and per-channel contributions are retained for attribution. We show that production performance depends strongly on operational parameterization, including episode length, DTW constraints, robust threshold learning, and temporal validation. Within a verified-healthy evaluation window, the tuned configuration achieves an AUROC of 0.97 on the temporally structured faults that DTW is suited to (bias, drift, and interaction faults; spikes are detected at an AUROC of 0.93) and detects 100% of injected faults with a mean delay under 25 min. We further show that constant-value (stuck-at) and missing-data (dropout) faults fall outside DTW’s shape-matching scope (AUROC about 0.66) and are better served by complementary variance- and missingness-based detectors; this limitation is inherent to shape matching rather than a result of parameter choice.
This work contributes a system-level methodological framework for deploying DTW as an IoT fault-detection-and-diagnosis capability: an episode-and-layer architecture aligned with hardware, communication, and software/firmware ownership; a label-free reference library requiring only assumed-healthy data; per-layer and per-channel attribution for cross-domain triage; and a reproducible operational tuning procedure. Together, these deliver a fast-to-deploy, scalable, and accurate first-line detector for label-scarce IoT systems.
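
The scoring pattern described above (overlapping episodes, a healthy reference library, constrained DTW, per-channel attribution, and a robust threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Sakoe-Chiba band as the DTW constraint, the per-channel nearest-reference aggregation, and the median-plus-MAD threshold are all assumptions chosen for illustration.

```python
import numpy as np


def make_episodes(x, length, stride):
    """Slice a (time, channels) array into overlapping episodes.

    `length` and `stride` are hypothetical tuning parameters.
    """
    starts = range(0, x.shape[0] - length + 1, stride)
    return np.stack([x[s:s + length] for s in starts])


def dtw_distance(a, b, window):
    """DTW distance between two 1-D sequences.

    Warping is constrained to a Sakoe-Chiba band of half-width `window`
    (an assumed choice of constraint; the paper may use a different one).
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach (n, m)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def score_episode(episode, library, window):
    """Score one episode against a library of healthy episodes.

    Assumption: each channel is matched independently to its nearest
    reference, and the total score is the sum of channel distances.
    Returns (total_score, per_channel_contributions).
    """
    n_channels = episode.shape[1]
    per_channel = np.array([
        min(dtw_distance(episode[:, c], ref[:, c], window) for ref in library)
        for c in range(n_channels)
    ])
    return per_channel.sum(), per_channel


def robust_threshold(healthy_scores, k=5.0):
    """Median + k * scaled MAD of held-out healthy scores (assumed rule)."""
    healthy_scores = np.asarray(healthy_scores, dtype=float)
    med = np.median(healthy_scores)
    mad = np.median(np.abs(healthy_scores - med))
    return med + k * 1.4826 * mad
```

In use, one would build the library from the first part of an assumed-healthy window, learn the threshold from scores of later healthy episodes (respecting temporal order), and flag any new episode whose total score exceeds it. The per-channel vector indicates which channel, and hence which layer, contributed most to the alarm.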
