Decentralized Shared Actor–Critic Learning for Collision-Aware Small-Team Multi-Robot Coverage
Abzal E. Kyzyrkanov, Didar Yedilkhan, Saltanat Amirgaliyeva, Sergazy Narynov

This study presents a decentralized shared actor–critic framework for cooperative multi-robot coverage in continuous two-dimensional simulation. The method combines permutation-invariant local observations, continuous differential-drive control, and reward shaping based on stepwise Hungarian assignment distances, collision penalties, and time efficiency. Homogeneous teams of four, five, and six agents are evaluated in an obstacle-free environment using five independent training seeds. In the final training window, the full reward configuration achieved full-team success rates of 98.2 ± 2.9% for four agents, 85.1 ± 18.0% for five agents, and 96.3 ± 2.0% for six agents, with mean landmark coverage above 96% in all cases. The lower mean in the five-agent setting was associated with higher seed-level variability dominated by one low-success seed. Reward ablations without assignment shaping or collision penalties remained viable, and seed-level tests did not show a statistically significant final-window advantage of the full reward configuration. The full configuration reached the 80% rolling-success threshold earlier in median terms, with the clearest seed-level support in the four-agent setting. Within-environment comparison showed higher full-team success than MADDPG and MAPPO under the matched training horizon and final-window protocol. Deterministic arena-size transfer from 15×15 to 30×30 showed decreasing full-team success as arena size increased, while partial landmark coverage remained higher than strict full-team completion. The results support the method for small homogeneous teams in the tested obstacle-free simulation, while larger teams, external obstacles, aerial-robot dynamics, formal safety guarantees, and hardware deployment remain future work.
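
To make the reward terms named above concrete, the following is a minimal sketch of a per-step shaped reward that combines an optimal agent-to-landmark assignment distance, a collision penalty, and a constant time penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weights `w_assign`, `w_collision`, and `w_time`, the agent `radius`, and the exact way the terms are combined are all hypothetical. For the small teams considered here, the sketch finds the optimal assignment by brute-force search over permutations, which yields the same minimum cost as the Hungarian algorithm but does not scale to large teams.

```python
import itertools
import math


def min_assignment_cost(agents, landmarks):
    """Return (total_distance, assignment) for the optimal one-to-one matching.

    Brute-force search over permutations: same optimum as the Hungarian
    algorithm, but only practical for small teams (here, up to about 6 agents).
    Assumes len(agents) == len(landmarks).
    """
    n = len(agents)
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        # perm[i] is the landmark index assigned to agent i.
        cost = sum(math.dist(agents[i], landmarks[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


def count_collisions(agents, radius):
    """Count agent pairs whose centers are closer than two radii."""
    return sum(
        1
        for a, b in itertools.combinations(agents, 2)
        if math.dist(a, b) < 2 * radius
    )


def shaped_reward(agents, landmarks, radius=0.5,
                  w_assign=1.0, w_collision=5.0, w_time=0.01):
    """Hypothetical shaped reward; weights and form are assumptions."""
    cost, _ = min_assignment_cost(agents, landmarks)
    mean_distance = cost / len(agents)
    collisions = count_collisions(agents, radius)
    # Each term is a penalty: farther from landmarks, more collisions,
    # and every elapsed step all lower the reward.
    return -w_assign * mean_distance - w_collision * collisions - w_time


if __name__ == "__main__":
    agents = [(0.0, 0.0), (3.0, 0.0)]
    landmarks = [(3.0, 1.0), (0.0, 1.0)]
    print(min_assignment_cost(agents, landmarks))
    print(shaped_reward(agents, landmarks))
```

In this example each agent is matched to the landmark directly above it, so the assignment term depends only on how far the team is from its best joint target allocation rather than on a fixed agent-to-landmark pairing.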