Multi-UAV Cooperative Hunting in Obstructed Environments via a Multi-Agent Proximal Policy Optimization with Curriculum Learning

DOI: 10.3390/s26123907 ISSN: 1424-8220

Longjie Zheng, Junlin Zhou, Haijun Peng, Bai Li, Xinwei Wang

As unmanned aerial vehicle (UAV) missions in obstacle-rich environments grow more complex, cooperative hunting of maneuvering ground targets by UAV swarms has become an important problem in multi-agent autonomous decision-making. This paper studies a simulated three-UAV hunting scenario in a two-dimensional obstructed environment, where the UAVs must search for, approach, encircle, and continuously track a target while avoiding static obstacles under local observation. To address this problem, the paper proposes a curriculum learning (CL)-based Multi-Agent Proximal Policy Optimization algorithm, termed CL-MAPPO. A three-stage progressive training curriculum is designed to counter the low exploration efficiency, slow environmental adaptation, and difficult policy convergence that multi-agent deep reinforcement learning faces in hunting tasks, gradually building up the UAVs’ cooperative hunting capability. Curriculum I uses fixed obstacles and a stationary target to train the UAVs’ basic obstacle avoidance and target search abilities. Curriculum II introduces randomly generated obstacles and target positions to improve the UAVs’ adaptability to varying environments. Curriculum III adds a dynamic target, prompting the UAVs to learn effective hunting strategies against maneuvering targets. The simulation study comprises an ablation experiment against MAPPO without curriculum learning and comparative experiments against MADDPG and MADQN, evaluated with reward convergence curves and trajectory visualizations.
The results show that, for the same number of training episodes in the ablation experiment, CL-MAPPO reaches a higher and more stable reward level than vanilla MAPPO, indicating improved learning efficiency without increased model complexity. In the comparative experiment, CL-MAPPO achieves a higher cooperative hunting success rate than MADDPG and MADQN. These simulation results support the effectiveness of CL-MAPPO and its advantage over the compared methods in multi-agent cooperative hunting tasks.
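
The three-stage curriculum described in the abstract could be expressed as an episode-configuration schedule along the following lines. This is only an illustrative sketch, not the authors' implementation: the abstract does not state how stage transitions are triggered, so the fixed episode budget per stage (`EPISODES_PER_STAGE`), the arena size, obstacle count, obstacle coordinates, and target speed are all hypothetical placeholders, as are the names `Stage`, `stage_for_episode`, and `sample_episode_config`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    random_layout: bool  # resample obstacles and target start every episode
    moving_target: bool  # target maneuvers during the episode


# Stage flags follow the abstract: fixed scene, randomized scene, dynamic target.
CURRICULUM = (
    Stage("I", random_layout=False, moving_target=False),
    Stage("II", random_layout=True, moving_target=False),
    Stage("III", random_layout=True, moving_target=True),
)

# Hypothetical values chosen for illustration; none come from the paper.
ARENA = 10.0
N_OBSTACLES = 4
FIXED_OBSTACLES = ((2.0, 2.0), (5.0, 7.0), (7.0, 3.0), (3.0, 8.0))
FIXED_TARGET = (8.0, 8.0)
TARGET_SPEED = 0.5
EPISODES_PER_STAGE = 1000  # assumed promotion rule: fixed budget per stage


def stage_for_episode(episode: int) -> Stage:
    """Map a global episode index to its curriculum stage."""
    index = min(episode // EPISODES_PER_STAGE, len(CURRICULUM) - 1)
    return CURRICULUM[index]


def sample_episode_config(stage: Stage, rng: random.Random) -> dict:
    """Build the scene settings for one training episode in the given stage."""
    if stage.random_layout:
        obstacles = tuple(
            (rng.uniform(0.0, ARENA), rng.uniform(0.0, ARENA))
            for _ in range(N_OBSTACLES)
        )
        target_start = (rng.uniform(0.0, ARENA), rng.uniform(0.0, ARENA))
    else:
        obstacles = FIXED_OBSTACLES
        target_start = FIXED_TARGET
    return {
        "stage": stage.name,
        "obstacles": obstacles,
        "target_start": target_start,
        "target_speed": TARGET_SPEED if stage.moving_target else 0.0,
    }
```

In a training loop, the policy being trained would presumably be carried from one stage to the next while only the sampled scene configuration changes; a performance-based promotion rule (for example, a success-rate threshold) would be an equally plausible alternative to the fixed budget assumed here.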
