DOI: 10.3390/telecom7040080 ISSN: 2673-4001

Deadline-Aware Scheduler-Weight Adaptation for 5G NR V2X Networks Using Probabilistic Prediction and Reinforcement Learning

Gerasimos Papanikolaou-Ntais, Dionysios N. Sotiropoulos, Athanasios Kanavos, Alexandros Kaloxylos

5G New Radio Vehicle-to-Everything (NR V2X) networks must support heterogeneous traffic with strict and diverse latency requirements. Conventional proportional-fair (PF) scheduling does not explicitly account for packet deadlines, which can lead to deadline violations for critical vehicular services under congestion. This paper studies deadline-aware MAC scheduler-weight adaptation for 5G NR V2X using probabilistic prediction and reinforcement learning. We implement a closed-loop ns-3/5G-LENA framework in which network telemetry is exchanged with a Python control agent through ns3-ai shared memory. Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), and Bayesian Logistic Regression (BLR) classifiers are used to predict imminent deadline violations. Their outputs are either mapped directly to scheduler weights or provided as additional state information to a Proximal Policy Optimization (PPO) agent. We evaluate ten scheduling strategies: PF, a non-learning Slack-Based Deadline-Aware Scheduler (SB-DAS), three classifier-only controllers, three classifier-assisted PPO variants, PPO-only, and PPO-only with safety shielding. Experiments are conducted across three vehicle densities, with three random seeds per density, using the Deadline-Constrained Packet Reception Ratio (DC-PRR) as the main metric. The PF baseline achieves 61.55% mean DC-PRR, degrading from 75.2% at 30 vehicles to 44.1% at 60 vehicles. In contrast, all adaptive strategies exceed 95% mean DC-PRR, 34–38 percentage points above PF's mean, and outperform PF in every paired density/seed comparison. The main result is therefore the robust gap between PF and deadline-aware adaptation. Differences among the adaptive controllers are much smaller and fall within the observed seed-to-seed variability. In particular, SB-DAS, which uses no classifier, neural network, or training, achieves DC-PRR statistically indistinguishable from that of the learned and probabilistic controllers. This indicates that, in the evaluated scenarios, most of the gain comes from deadline awareness itself rather than from learning. We also find that adding classifier-derived violation probabilities to the PPO state does not consistently improve performance over PPO using raw telemetry alone. To support reproducibility and deployment assessment, the paper includes detailed parameter tables, reward-coefficient settings and sensitivity analysis, scheduler-weight sensitivity analysis, and per-controller inference-latency and complexity measurements.
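DC-PRR is the main metric, but the abstract does not spell out how it is computed. The following is a minimal Python sketch, assuming DC-PRR is the fraction of transmitted packets that are received no later than their latency deadline; the paper's exact definition (for example, per receiver or per distance range, as in 3GPP V2X evaluations) may differ, and the `PacketRecord` type and its fields are hypothetical. The helper `gain_pp` makes explicit that the reported gains are differences in percentage points, not relative percentages.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PacketRecord:
    """One transmitted packet (hypothetical record format)."""
    gen_time_ms: float           # generation time at the transmitter
    deadline_ms: float           # latency budget relative to generation
    rx_time_ms: Optional[float]  # reception time, or None if never received


def dc_prr(records: Iterable[PacketRecord]) -> float:
    """Fraction of packets received no later than their deadline."""
    total = 0
    on_time = 0
    for r in records:
        total += 1
        if r.rx_time_ms is not None and r.rx_time_ms - r.gen_time_ms <= r.deadline_ms:
            on_time += 1
    return on_time / total if total else 0.0


def gain_pp(adaptive: float, baseline: float) -> float:
    """Difference in percentage points between two ratios in [0, 1]."""
    return 100.0 * (adaptive - baseline)


records = [
    PacketRecord(0.0, 20.0, 12.0),   # on time
    PacketRecord(0.0, 20.0, 25.0),   # received, but late
    PacketRecord(5.0, 100.0, 60.0),  # on time
    PacketRecord(10.0, 50.0, None),  # lost
]
print(dc_prr(records))  # → 0.5
```

Under this definition a late packet counts as a failure just like a lost one. For scale, a controller at 95.55% mean DC-PRR would sit 34 percentage points above PF's 61.55%, which is a relative improvement of about 55%; the two figures should not be conflated.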
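SB-DAS and the classifier-only controllers are described here only at a high level: one uses packet slack, the others map a predicted violation probability to a scheduler weight. The sketch below illustrates one plausible way such a weight could scale a PF metric. The linear mappings, the cap `w_max`, and all names (`Flow`, `slack_weight`, `prob_weight`, `pick_flow`) are hypothetical assumptions, not the paper's actual rules or the 5G-LENA scheduler API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Flow:
    """Per-flow scheduler state (hypothetical fields)."""
    name: str
    inst_rate: float       # achievable rate in this slot
    avg_throughput: float  # EWMA of past throughput, assumed > 0
    hol_age_ms: float      # age of the head-of-line packet
    deadline_ms: float     # latency budget of that packet


def pf_metric(f: Flow) -> float:
    """Proportional-fair metric: instantaneous rate / average throughput."""
    return f.inst_rate / f.avg_throughput


def slack_ms(f: Flow) -> float:
    """Time left before the head-of-line packet misses its deadline."""
    return f.deadline_ms - f.hol_age_ms


def slack_weight(slack: float, deadline: float, w_max: float = 10.0) -> float:
    """Hypothetical slack-based weight: 1 with the full budget left,
    rising linearly to w_max as slack reaches zero (or goes negative)."""
    remaining = min(max(slack / deadline, 0.0), 1.0)  # clip to [0, 1]
    return 1.0 + (w_max - 1.0) * (1.0 - remaining)


def prob_weight(p_violation: float, w_max: float = 10.0) -> float:
    """Hypothetical classifier-only mapping from P(violation) to a weight."""
    p = min(max(p_violation, 0.0), 1.0)
    return 1.0 + (w_max - 1.0) * p


def pick_flow(flows: Sequence[Flow], weights: Sequence[float]) -> Flow:
    """Serve the flow with the highest weighted PF score."""
    scores = [w * pf_metric(f) for f, w in zip(flows, weights)]
    best = max(range(len(flows)), key=lambda i: scores[i])
    return flows[best]


flows = [
    Flow("cam", inst_rate=1000.0, avg_throughput=500.0, hol_age_ms=5.0, deadline_ms=100.0),
    Flow("urgent", inst_rate=500.0, avg_throughput=500.0, hol_age_ms=8.0, deadline_ms=10.0),
]
print(pick_flow(flows, [1.0] * len(flows)).name)  # plain PF → cam
weights = [slack_weight(slack_ms(f), f.deadline_ms) for f in flows]
print(pick_flow(flows, weights).name)             # slack-weighted → urgent
```

In this toy case, plain PF serves the flow with the better rate-to-throughput ratio, while the slack-weighted score serves the flow whose head-of-line packet is close to its deadline. That shift is the behavior deadline awareness adds, and it requires no training, which is consistent with the abstract's finding that SB-DAS performs on par with the learned controllers.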
