Proximal Policy Optimization in 5G, B5G, and 6G Communication Systems: A Systematic Review
Vijaya Kittu Manda, Bhukya Madhu, Theodore Tarnanidis

Fifth-generation (5G), Beyond 5G (B5G), and sixth-generation (6G) wireless networks, along with the Internet of Things (IoT), form the core communication infrastructure of smart cities. Their growing deployment creates high-dimensional optimization and resource management challenges. Consequently, researchers have increasingly explored Artificial Intelligence (AI) models for network optimization. Proximal Policy Optimization (PPO), a reinforcement learning algorithm, is one such approach. This Systematic Literature Review (SLR) follows the PRISMA 2020 protocol and reviews 76 studies published between 2023 and 2026 to synthesize recent PPO-based approaches to optimizing communication systems. It examines key PPO variants across major communication domains, outlines the primary obstacles to real-world deployment, and provides a cross-domain classification. The review finds that PPO supports continuous action spaces with good training stability, and that its stable policy learning makes it suitable for next-generation communication systems. However, sim-to-real transfer, reward design, and multi-agent scalability remain key challenges. Future directions emphasize robust, deployable PPO frameworks for 6G, IoT, and internet architecture.