Target-Aware Safety-Residual Reinforcement Learning for Cooperative Multi-UAV Pursuit in Complex Environments

doi:10.3390/machines14070733

ISSN: 2075-1702

Shun Li, Bo Yu, Dongying Liu, Dayu Gao, Peizheng He, Gongbo Chen, Lin Xu

Multi-UAV cooperative persistent tracking in complex obstacle environments requires agents to approach dynamic targets while ensuring obstacle avoidance and flight safety; however, standard multi-agent reinforcement learning (MARL) methods typically rely on a single policy to handle both objectives implicitly, making it difficult to balance task performance and risk control. To address this issue, this paper proposes a Target-Aware Safety-Residual Pursuit Reinforcement Learning (TASRP) framework for constrained three-dimensional environments. A continuous-control 3D tracking environment is constructed in IsaacLab, where two multirotor UAVs cooperatively track a dynamic target under random, target-blocking, and gate-like obstacle layouts, boundary constraints, and inter-agent collision risks, with each UAV producing a four-dimensional action composed of normalized thrust and body-frame torques. TASRP adopts a dual-head residual policy in which a pursuit branch generates nominal actions and a safety branch predicts corrective residuals, together with a risk-aware gating mechanism, a target-guided teacher for obstacle detouring, and a dual-critic safety-constrained optimization scheme. Under clean observations, TASRP achieves task success rates of 75–79%, obstacle crash rates of 13–15%, and boundary crash rates of 1–2% across three representative scenarios. Under noisy observations, TASRP achieves a task success rate of 72.1%, an obstacle crash rate of 20.3%, and a boundary crash rate of 2.8%, outperforming MAPPO (61.2%, 61.2%, and 5.6%, respectively) and HAPPO (58.1%, 73.5%, and 4.1%, respectively). These results indicate that explicitly decoupling target-oriented control from safety correction enables a more effective and robust performance–safety trade-off under both clean and moderately noisy observations.
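The abstract describes the dual-head residual policy only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that the final four-dimensional action (normalized thrust plus three body-frame torques) is the nominal pursuit action plus a gated safety residual, clipped to an assumed range of [-1, 1]. The function names `risk_gate` and `compose_action`, the sigmoid gate on minimum obstacle distance, and the parameters `safe_dist` and `sharpness` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def risk_gate(min_obstacle_dist: float,
              safe_dist: float = 2.0,
              sharpness: float = 4.0) -> float:
    """Hypothetical risk-aware gate in (0, 1).

    Assumption: risk is driven by the distance to the nearest obstacle.
    The gate approaches 1 when the obstacle is much closer than safe_dist
    and approaches 0 when it is much farther away.
    """
    return 1.0 / (1.0 + math.exp(sharpness * (min_obstacle_dist - safe_dist)))


def compose_action(nominal: Sequence[float],
                   residual: Sequence[float],
                   gate: float,
                   low: float = -1.0,
                   high: float = 1.0) -> List[float]:
    """Combine pursuit and safety heads: clip(nominal + gate * residual).

    Assumption: actions are normalized to [low, high]; the abstract states
    only that thrust is normalized, not the exact bounds.
    """
    if len(nominal) != len(residual):
        raise ValueError("nominal and residual must have the same length")
    return [min(high, max(low, a + gate * r))
            for a, r in zip(nominal, residual)]


# Example: [thrust, torque_x, torque_y, torque_z] for one UAV.
pursuit_action = [0.5, 0.0, 0.0, 0.0]      # from the pursuit branch
safety_residual = [0.2, 0.4, -0.4, 0.0]    # from the safety branch
g = risk_gate(min_obstacle_dist=0.5)       # obstacle is close, gate is high
action = compose_action(pursuit_action, safety_residual, g)
```

Under these assumptions, the safety branch has almost no influence in open space, where the gate is near zero, and progressively overrides the pursuit branch as an obstacle approaches. This is one plausible way to realize the decoupling of target-oriented control from safety correction that the abstract reports.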
