Deep Reinforcement Learning for Collision-Aware Leader–Follower Formation Control
Sławomir Romaniuk, Jakub Budnik

This work tackles leader–follower navigation in scenarios where rapid, unpredictable leader maneuvers can trigger unsafe proximity or collisions. We apply Proximal Policy Optimization (PPO) to learn adaptive follower behavior capable of tracking the leader’s direction while considering a minimum-distance safety objective. A task-specific reward function, capturing formation constraints, collision avoidance, control-effort regularization, and responsiveness to sudden directional changes, enables effective policy learning. The results show that the PPO-based follower provides a favorable safety–tracking trade-off compared with the considered A2C and PID baselines under the simulated test conditions, although collision-free operation is not guaranteed in all episodes.
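
As an illustration of how the four reward components named above might be combined, the following is a minimal sketch, not the reward used in this work. The function name `follower_reward`, the term shapes (absolute spacing error, hinge penalty inside a safety margin, quadratic control cost, heading error scaled by the leader's turn rate), and every weight and threshold are assumptions made only for this example.

```python
def follower_reward(distance, heading_error, action, leader_turn_rate,
                    desired_distance=1.0, min_safe_distance=0.4,
                    w_form=1.0, w_coll=5.0, w_ctrl=0.05, w_resp=0.5):
    """Hypothetical per-step reward for a follower agent.

    All parameter names, weights, and term shapes are illustrative
    assumptions; they are not taken from the paper.
    """
    # Formation term: penalize deviation from the desired spacing.
    r_form = -w_form * abs(distance - desired_distance)

    # Collision-avoidance term: hinge penalty that is active only when
    # the follower is closer than the minimum safe distance.
    r_coll = -w_coll * max(0.0, min_safe_distance - distance)

    # Control-effort regularization: quadratic cost on the action vector.
    r_ctrl = -w_ctrl * sum(a * a for a in action)

    # Responsiveness term: heading error is penalized more strongly
    # while the leader is turning quickly.
    r_resp = -w_resp * (1.0 + abs(leader_turn_rate)) * abs(heading_error)

    return r_form + r_coll + r_ctrl + r_resp


# Example call: follower slightly too close while the leader turns sharply.
step_reward = follower_reward(distance=0.3, heading_error=0.2,
                              action=(0.5, -0.1), leader_turn_rate=1.5)
```

Under these assumptions the reward is at its maximum of zero when the follower holds the desired spacing with no heading error and no control input, and it decreases as any of the four penalties grows. In a PPO training loop, a function of this kind would be evaluated once per environment step to produce the scalar reward.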