Deep Reinforcement Learning-Based Adaptive Protocol Optimization for Heterogeneous IoT Networks in 5G-Enabled Smart Cities
Saddam K. Alwane, Shereen S. Jumaa, Muna H. Saleh, Aymen D. Salman, Ayad Q. Al-Dujaili, Amjad J. Humaidi

The rapid proliferation of Internet of Things (IoT) devices within 5G-enabled smart city environments has introduced unprecedented challenges in communication protocol management across heterogeneous network architectures. With connected IoT devices projected to reach 21.1 billion by the end of 2025 and approximately 39 billion by 2030, existing static protocol selection mechanisms are unable to accommodate the dynamic Quality of Service (QoS) requirements of different smart city applications, such as enhanced Mobile Broadband (eMBB), Ultra-Reliable Low-Latency Communication (URLLC), and massive Machine-Type Communication (mMTC). This paper presents APO-DRL (Adaptive Protocol Optimization using Deep Reinforcement Learning), a framework that utilizes a Dueling Double Deep Q-Network (D3QN) combined with a Prioritized Experience Replay mechanism for intelligent, real-time communication protocol selection and parameter optimization in heterogeneous IoT networks. The proposed framework formulates the protocol optimization problem as a Markov Decision Process (MDP), wherein the DRL agent dynamically selects the optimal communication protocol (NB-IoT, LTE-M, LTE Cat-1, or 5G NR) and adaptively tunes transmission parameters based on real-time network conditions. Experimental evaluation in a 3GPP TR 38.901 Urban Macro simulation environment with N = 30 devices demonstrates that APO-DRL achieves a 138.9% improvement in average throughput compared to Static Allocation (60.00 vs. 25.12 Mbps), while simultaneously achieving the highest QoS satisfaction (83.38%) across all methods, albeit with higher energy consumption and packet loss than Static Allocation. Relative to D3QN+PER, APO-DRL exhibits substantially lower cross-seed throughput variance (±0.88 vs. ±11.03 Mbps), confirming that QA-PER produces a more stable and reproducible learned policy.
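To make the D3QN terminology concrete, the following is a minimal sketch of the two standard ingredients the name refers to: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), and the Double DQN target, in which the online network chooses the next action and the target network evaluates it. It is an illustration under stated assumptions, not the APO-DRL implementation: the function names, the default discount factor, and all numeric inputs are hypothetical, the neural networks are replaced by plain lists of values, and prioritized replay and parameter tuning are omitted. Only the four-protocol action set is taken from the text above.

```python
from statistics import fmean

# Action set named in the abstract; the index order is an assumption.
PROTOCOLS = ("NB-IoT", "LTE-M", "LTE Cat-1", "5G NR")


def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = fmean(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online estimates select the next action,
    the target estimates supply its value."""
    if done:
        return reward
    best_next = argmax(next_q_online)
    return reward + gamma * next_q_target[best_next]


def select_protocol(q_values):
    """Greedy protocol choice from one Q-value per protocol."""
    return PROTOCOLS[argmax(q_values)]


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    q = dueling_q_values(1.0, [0.0, 1.0, 2.0, 1.0])
    print(select_protocol(q))
    print(double_dqn_target(1.0, [0.1, 0.9, 0.3, 0.2], [5.0, 2.0, 7.0, 1.0], gamma=0.5))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, and decoupling action selection from action evaluation is the usual remedy for the overestimation bias of plain Q-learning targets.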