DOI: 10.3390/sym18071136 ISSN: 2073-8994

A Hybrid Machine Learning Method for Secure Assessment of NAND Flash Health and SSD Data Recovery Feasibility

Leila Rzayeva, Aliya Zhetpisbayeva, Murat Zhakenov, Altynbay Abdykassym

NAND flash-based solid-state drives (SSDs) are increasingly common in computers, but they complicate forensic data recovery. Unlike hard disk drives (HDDs), SSDs rely on controller logic, flash translation layers, error correction, wear leveling, TRIM, garbage collection, and encryption, all of which affect whether data can be recovered after it has been written or erased. In this paper, we propose a machine learning-based method for assessing the health of NAND SSDs and the feasibility of recovering their data. The approach combines analysis of SMART and NVMe telemetry, subsystem-level interpretation of NAND and controller health, and anomaly detection with the Isolation Forest algorithm. The task is formulated as a one-class learning problem that accounts for asymmetry: telemetry from healthy SSDs defines the reference state, while NAND degradation, controller instability, TRIM effects, and encryption-related limitations act as asymmetric deviations from that state. The method uses telemetry attributes such as temperature, wear level, spare blocks, media and data integrity errors, error logs, unsafe shutdowns, and uptime. The study shows that the potential for data recovery depends on the health of the NAND flash memory and controller, TRIM, encryption, and other anomalies, and not necessarily on any single SMART metric. The proposed approach provides an explainable, recovery-focused assessment and categorizes SSD cases as recoverable, partially recoverable, or non-recoverable. The model was trained on a healthy-SSD dataset of 56,482 SATA SSD records and 82,665 NVMe SSD records, for a total of 139,147 healthy drive samples. Additionally, 20,000 synthetic training samples were generated for each SSD type to support controlled model training. The proposed platform was evaluated on 30 SSD recovery scenarios covering recovery, partial recovery, and no-recovery cases. The results demonstrate that the proposed method can distinguish between healthy, warning, and abnormal SSD states and provide recovery recommendations based on NAND health, controller stability, TRIM status, and encryption limitations.
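
The one-class formulation described above can be illustrated with a minimal sketch of the isolation forest idea: random trees are fitted to telemetry from healthy drives only, and a record that is isolated after few random splits receives a higher anomaly score. This is a simplified pure-Python illustration, not the implementation used in the study. The feature names, the synthetic value ranges, and the helper names (`c_factor`, `build_tree`, `fit_forest`, `anomaly_score`) are assumptions made for the example, and the generated records do not come from the paper's dataset.

```python
import math
import random

# Hypothetical telemetry layout, chosen for illustration only.
FEATURES = ["temperature_c", "wear_pct", "spare_pct", "media_errors", "unsafe_shutdowns"]
EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random non-constant feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    candidates = [j for j in range(len(rows[0]))
                  if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not candidates:
        return ("leaf", len(rows))
    feature = rng.choice(candidates)
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Depth at which row lands in a leaf, adjusted for the points left in that leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    branch = left if row[feature] < split else right
    return path_length(branch, row, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Fit trees on random subsamples of healthy-drive records (assumes >= 2 rows)."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(forest, row):
    """Score in (0, 1); values closer to 1 mean the record is easier to isolate."""
    trees, size = forest
    mean_path = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))


# Synthetic "healthy" telemetry; the distributions are invented for the example.
rng = random.Random(42)
healthy_rows = [
    [rng.gauss(40, 4), rng.gauss(10, 3), rng.uniform(95, 100),
     rng.choice([0, 0, 0, 1]), rng.randint(0, 10)]
    for _ in range(500)
]
forest = fit_forest(healthy_rows)

typical = [40.0, 10.0, 99.0, 0, 3]       # resembles the healthy reference state
degraded = [72.0, 95.0, 8.0, 340, 120]   # hot, worn, few spares, many errors

typical_score = anomaly_score(forest, typical)
degraded_score = anomaly_score(forest, degraded)
print(f"typical:  {typical_score:.3f}")
print(f"degraded: {degraded_score:.3f}")
```

A score of this kind only indicates how far a drive deviates from the healthy reference. Mapping it to a recoverable, partially recoverable, or non-recoverable outcome would still require the TRIM, encryption, and controller context that the abstract describes.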
