DOI: 10.3390/automation7040100 ISSN: 2673-4052

An A*-Distance-Guided Exploration Strategy for Multi-AGV Path Planning

Ying Zhou, Yixin Feng, Peiyan Mao, Pengfei Wang

A common limitation of existing multi-AGV cooperative systems is their reliance on the obstacle-agnostic Manhattan distance as the basis for reward signals. As a result, agents receive misleading feedback, waste effort on futile exploration, and ultimately train poorly. To address this, we introduce an A*-distance guidance mechanism for multi-agent reinforcement learning (MARL) path planning, built on the precise path distance computed via the A* algorithm (the A*-distance). Within the QMIX framework, we incorporate an A*-distance-based guiding function into the action selection mechanism. This function evaluates candidate actions by quantifying their immediate effect on the A*-distance, rewarding actions that bring the agent closer to the goal and penalizing those that lead it farther away. The guidance biases exploration towards actions that genuinely shorten the obstacle-aware path to the goal, suppresses ineffective exploration, and accelerates policy convergence. Experiments in four warehouse environments (simple obstacles, complex obstacles, large-scale, and congested) show that, compared with standard QMIX, the proposed method achieves higher global average reward and faster convergence. The advantage grows as environment scale and obstacle density increase. In the large-scale and congested environments, standard QMIX and the other MARL baselines fail to solve the task, whereas the proposed method still succeeds. It is the only learning-based method to solve these hardest tasks while keeping path length close to that of dedicated search-based solvers. Ablation experiments further show that the A*-distance-guided action selection is the primary contributor to these gains, while the A*-distance reward plays a supporting role.
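The guiding idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the guiding function or how it is combined with the QMIX action values, so everything below is an assumption made for illustration: the grid encoding (1 marks an obstacle), the five-action set, the function names `astar_distance`, `guidance`, and `select_action`, the +1/-1/0 bonus, and the weight `beta`.

```python
import heapq

# Up, down, left, right, stay. This action set is an assumption, not taken from the paper.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def manhattan(a, b):
    """Obstacle-agnostic distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar_distance(grid, start, goal):
    """Length of the shortest 4-connected path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best[cell]:
            continue  # stale queue entry
        for dr, dc in ACTIONS[:4]:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            if g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + manhattan(nxt, goal), g + 1, nxt))
    return None


def guidance(grid, pos, goal):
    """Hypothetical guiding function: +1 if a move shortens the A*-distance,
    -1 if it lengthens it or is blocked, 0 if the distance is unchanged or undefined."""
    here = astar_distance(grid, pos, goal)
    bonus = []
    for dr, dc in ACTIONS:
        nxt = (pos[0] + dr, pos[1] + dc)
        inside = 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
        if not inside or grid[nxt[0]][nxt[1]] == 1:
            bonus.append(-1.0)
            continue
        there = astar_distance(grid, nxt, goal)
        if here is None or there is None or there == here:
            bonus.append(0.0)
        else:
            bonus.append(1.0 if there < here else -1.0)
    return bonus


def select_action(q_values, bonus, beta=0.5):
    """Greedy choice over Q-values biased by the guidance bonus (beta is an assumed weight)."""
    scores = [q + beta * b for q, b in zip(q_values, bonus)]
    return max(range(len(scores)), key=scores.__getitem__)


# A wall separates the agent at (2, 1) from its goal at (0, 2).
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
pos, goal = (2, 1), (0, 2)
q_values = [0.2, 0.0, 0.1, 0.3, 0.0]  # made-up per-agent action values
bonus = guidance(grid, pos, goal)
print(manhattan(pos, goal), astar_distance(grid, pos, goal))
print(bonus, select_action(q_values, bonus))
```

In this toy grid the Manhattan distance from the agent to the goal is 3, but the shortest obstacle-free path has length 5. Moving right lowers the Manhattan distance yet lengthens the real path, while moving left does the opposite. With the assumed bonus, the biased selection picks the left move even though the unbiased Q-values favor moving right, which is the kind of misleading feedback the abstract attributes to Manhattan-based signals.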
