A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
Shuang Gao, Yuntao Zou, Li Feng

Industrial Internet of Things (IIoT) deployments in thermal power plants face significant energy efficiency challenges due to harsh operating conditions and device resource constraints. This paper presents the gradient memory double-deep Q-network (GM-DDQN), a lightweight reinforcement learning approach for energy optimization on resource-constrained IIoT devices. At its core, GM-DDQN introduces the gradient memory mechanism, a novel memory-efficient alternative to experience replay. Combined with a simplified neural network architecture and efficient parameter quantization, this mechanism reduces memory requirements by 99% and computation time by 85–90% compared to standard methods. Experimental evaluations across three realistic simulated thermal power plant scenarios demonstrate that GM-DDQN improves energy efficiency by 42% over fixed policies and 27% over threshold-based approaches, extending battery lifetime from 8–9 months to 14–15 months while maintaining a 96–97% packet success rate (PSR). The method enables sophisticated reinforcement learning directly on IIoT edge devices without requiring cloud connectivity, reducing maintenance costs and improving monitoring reliability in industrial environments.
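To make the central idea concrete, the sketch below shows one way a gradient-memory update could replace experience replay in a double-deep Q-network. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name GradientMemoryDDQN, the linear Q-approximator, and the exponential-averaging coefficient beta are all hypothetical choices for the example. The point it demonstrates is that memory then scales with the parameter count, O(|θ|), rather than with a buffer of stored transitions.

```python
# Illustrative sketch only: the paper's exact gradient-memory mechanism is
# not given in this excerpt. Here, a momentum-style running average of
# gradients stands in for the replay buffer.
import numpy as np

class GradientMemoryDDQN:
    def __init__(self, n_states, n_actions, lr=0.01, gamma=0.99, beta=0.9):
        rng = np.random.default_rng(0)
        self.W = rng.normal(0.0, 0.1, (n_actions, n_states))  # online linear Q
        self.W_target = self.W.copy()                         # target network
        self.g_mem = np.zeros_like(self.W)                    # gradient memory
        self.lr, self.gamma, self.beta = lr, gamma, beta

    def q(self, W, s):
        # Q-values for all actions given state vector s.
        return W @ s

    def update(self, s, a, r, s_next, done):
        # Double DQN target: the online net selects the next action,
        # the target net evaluates it.
        a_star = int(np.argmax(self.q(self.W, s_next)))
        y = r + (0.0 if done else self.gamma * self.q(self.W_target, s_next)[a_star])
        td_error = self.q(self.W, s)[a] - y
        # Gradient of 0.5 * td_error^2 w.r.t. the chosen action's weights.
        grad = np.zeros_like(self.W)
        grad[a] = td_error * s
        # Gradient memory: an exponential average of per-step gradients
        # replaces sampling minibatches from a stored replay buffer.
        self.g_mem = self.beta * self.g_mem + (1.0 - self.beta) * grad
        self.W -= self.lr * self.g_mem

    def sync_target(self):
        # Periodic hard copy of online weights into the target network.
        self.W_target = self.W.copy()
```

In use, a device would call update() once per observed transition and sync_target() at a fixed interval; no transition buffer is ever allocated, which is the source of the memory savings the abstract cites.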