Residual Learning for Riemannian Motion Policies in Multi-Drone Obstacle Avoidance and Communication-Aware Navigation
Liping Shi, Esben Haubro Skov, Asbjørn Lybker Christensen, Rune Hylsberg Jacobsen

Autonomous multi-drone navigation in unknown environments requires robust solutions for obstacle avoidance, formation cohesion, and inter-drone communication maintenance. We address these challenges using Riemannian Motion Policies (RMPs), a second-order dynamic framework for motion generation, within the RMPflow architecture. We design handcrafted and learned obstacle avoidance policies, including circle-based and LiDAR-based approaches, and introduce residual reinforcement learning as a novel mechanism for refining suboptimal handcrafted policies. To maintain reliable drone-to-drone communication, we propose a radio link quality policy that dynamically adapts the maximum permitted inter-drone distance based on signal strength, contrasting with a fixed worst-case baseline. These policies are composed within RMPflow into an efficient reactive motion planner and evaluated through 2D simulations across multiple configurations. Results show that residual learning substantially improves a suboptimal handcrafted policy, reducing collisions by 35.4% in drone formations and up to 74.7% for single drones. Notably, a policy trained without any initial obstacle avoidance capability failed to obtain such improvement, underscoring the importance of informed policy initialisation. The adaptive radio link policy achieved 20% to 25% shorter navigation time through randomly generated environments compared to a fixed maximum distance constraint, demonstrating the practical value of radio signal-aware formation control.
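To make the composition idea concrete, the following is a minimal sketch of how several RMPs, each given as an acceleration with a Riemannian metric, might be blended by a metric-weighted average, and how a learned residual could be added on top of a handcrafted obstacle policy. It is an illustration under simplifying assumptions, not the implementation evaluated in the paper: all policies are assumed to live directly in the 2D configuration space (so no task-map Jacobians appear), and the function names, gains, metric shapes, and the `residual_fn` callback are hypothetical.

```python
import numpy as np


def combine_rmps(policies):
    """Blend (acceleration, metric) pairs defined in one shared space.

    Returns pinv(sum_i M_i) @ sum_i (M_i @ a_i), i.e. a metric-weighted average.
    """
    metric_sum = sum(M for _, M in policies)
    force_sum = sum(M @ a for a, M in policies)
    return np.linalg.pinv(metric_sum) @ force_sum


def goal_rmp(pos, vel, goal, gain=1.0, damping=2.0):
    """Hypothetical goal attractor: damped spring with an identity metric."""
    accel = gain * (goal - pos) - damping * vel
    return accel, np.eye(2)


def handcrafted_obstacle_rmp(pos, obstacle, radius, gain=1.0):
    """Hypothetical circle-based repulsion (assumed form, not the paper's).

    Both the push and the metric weight grow as the drone nears the circle.
    """
    offset = pos - obstacle
    norm = max(np.linalg.norm(offset), 1e-9)
    dist = max(norm - radius, 1e-3)  # distance to the obstacle surface
    accel = gain / dist**2 * (offset / norm)
    metric = np.eye(2) / dist**2
    return accel, metric


def residual_obstacle_rmp(pos, vel, obstacle, radius, residual_fn):
    """Handcrafted acceleration plus a learned correction.

    residual_fn stands in for a trained policy network (assumption).
    """
    accel, metric = handcrafted_obstacle_rmp(pos, obstacle, radius)
    return accel + residual_fn(pos, vel, obstacle), metric


# Usage: one control step for a single drone with a placeholder residual.
pos, vel = np.array([2.0, 0.0]), np.zeros(2)
goal, obstacle = np.array([-5.0, 0.0]), np.zeros(2)
zero_residual = lambda p, v, o: np.zeros(2)
command = combine_rmps([
    goal_rmp(pos, vel, goal),
    residual_obstacle_rmp(pos, vel, obstacle, 1.0, zero_residual),
])
```

With a residual that always returns zero, the combined policy reduces to the handcrafted one, which is one way to read the informed initialisation discussed in the abstract: learning starts from the behaviour of the existing policy and only has to correct it.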