DOI: 10.3390/s26134099 ISSN: 1424-8220

Intrusion Detection Datasets for IIoT and ICS: A Taxonomic Review with a Decision-Aid Scoring Rubric

Ayman Termanini, Hadj Bourdoucen, Dawood Al-Abri, Ahmed Al Maashri

Dataset quality strongly affects how well a machine learning (ML) model performs in an intrusion detection system (IDS) for cyber-physical industrial control systems (CPS/ICS) and the Industrial Internet of Things (IIoT). Existing surveys compare datasets qualitatively or along a limited set of dimensions; this review instead introduces quantitative documentation scoring and decision-aid scoring across 23 ICS/OT/IIoT datasets. The datasets are analyzed along seven measurable axes, and their attacks are mapped to MITRE ATT&CK for ICS tactics. Quantitatively, 14 of the 23 datasets (60.9%) are built on physical testbeds, and 22 of the 23 map to MITRE ATT&CK for ICS, together spanning 11 of its 12 tactics. We introduce a documentation-completeness checklist (0–7) and a decision-aid rubric (0–15) covering realism, attack diversity, class imbalance, documentation, and reproducibility. Protocol coverage is skewed toward Modbus (13 of 23 datasets, 57%), while other protocols, such as Profinet and OPC UA, are underrepresented relative to their industrial deployment. The available datasets also show structural gaps in capturing multi-stage adversary behavior. In practice, dataset selection should pair a realism-anchored dataset with a high-reproducibility one and should account for protocol diversity and the representation of advanced persistent threats (APTs).
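The abstract does not say how the 0–15 rubric is divided among its five criteria. The sketch below assumes, purely for illustration, an equal 0–3 scale per criterion (5 × 3 = 15), with higher always meaning more favorable (so a high class-imbalance score means *less* imbalance). It shows how such scores might be totaled and how the recommended pairing, a realism-anchored dataset plus a high-reproducibility one, could be selected. The class `DatasetScore`, the function `pick_complementary_pair`, and all dataset names and scores are hypothetical placeholders, not the review's actual method or results.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple

# Hypothetical assumption: five criteria, each scored 0-3, giving a 0-15 total.
# Higher is always better (e.g. class_imbalance = 3 means "well balanced").
CRITERIA = ("realism", "attack_diversity", "class_imbalance",
            "documentation", "reproducibility")
MAX_PER_CRITERION = 3


@dataclass
class DatasetScore:
    name: str
    scores: Mapping[str, int]  # criterion -> integer score in [0, MAX_PER_CRITERION]

    def total(self) -> int:
        """Validate each criterion score, then return the 0-15 rubric total."""
        for criterion in CRITERIA:
            value = self.scores[criterion]  # KeyError if a criterion is missing
            if not 0 <= value <= MAX_PER_CRITERION:
                raise ValueError(f"{self.name}: {criterion}={value} out of range")
        return sum(self.scores[c] for c in CRITERIA)


def pick_complementary_pair(
    candidates: List[DatasetScore],
) -> Tuple[DatasetScore, DatasetScore]:
    """Pick the most realistic dataset, then the most reproducible *other* one.

    Ties on either criterion are broken by the higher rubric total.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate datasets")
    realism_pick = max(candidates, key=lambda d: (d.scores["realism"], d.total()))
    others = [d for d in candidates if d is not realism_pick]
    repro_pick = max(others, key=lambda d: (d.scores["reproducibility"], d.total()))
    return realism_pick, repro_pick


if __name__ == "__main__":
    # Hypothetical example scores, not taken from the review.
    candidates = [
        DatasetScore("dataset_A", {"realism": 3, "attack_diversity": 2,
                                   "class_imbalance": 1, "documentation": 2,
                                   "reproducibility": 1}),
        DatasetScore("dataset_B", {"realism": 1, "attack_diversity": 2,
                                   "class_imbalance": 2, "documentation": 3,
                                   "reproducibility": 3}),
        DatasetScore("dataset_C", {"realism": 2, "attack_diversity": 3,
                                   "class_imbalance": 2, "documentation": 2,
                                   "reproducibility": 2}),
    ]
    for d in candidates:
        print(d.name, d.total())
    first, second = pick_complementary_pair(candidates)
    print("pair:", first.name, "+", second.name)
```

With these made-up scores, `dataset_A` (total 9) is chosen for realism and `dataset_B` (total 11) for reproducibility. The pairing is kept separate from the total on purpose: ranking by total alone would favor `dataset_B` and `dataset_C`, which tie at 11, and could miss the most realistic dataset entirely.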
