Boosting Embodied Visual Localisation Through Multi‐Granular Semantics and Multi‐Robot Consensus
Wenshuai Wang, Shiyu Liu, Peifeng Jiang, Hong Liu, Runwei Ding
ABSTRACT
Achieving high accuracy and synergy remains extremely difficult for multi‐robot embodied visual localisation, which suffers from persistent real‐world challenges such as viewpoint ambiguity, appearance variation and dynamic occlusion. Conventional optimisation‐based methods often lead to incorrect feature matching without domain adaptation, whereas existing deep learning models are limited in both precision and generalisability for pose regression. Although semantic information can offer a cognitive‐level scene representation, its potential has not been fully exploited due to inconsistent labelling and scaling. To address these issues, this paper proposes a novel semantic and synergic consensus framework (termed SSC‐MR) that enhances image retrieval and pose estimation by leveraging multi‐granular semantics and multi‐robot consensus. Specifically, the spatial topology is attentively generated as a global scene graph for fast image retrieval across heterogeneous robots in a holistic workspace, whereas a local semantic‐guided pose estimation strategy refines matched keypoints under semantic consistency constraints. To further improve the efficiency of multi‐robot collaboration, an intra‐robot minimum redundancy consensus module selects informative keyframes and compresses feature representations to reduce perceptual redundancy, and an inter‐robot maximum correlation consensus module further establishes high‐confidence state associations among interacting robots for knowledge sharing. Extensive experiments on five public datasets demonstrate that the proposed SSC‐MR obtains superior effectiveness and robustness across diverse indoor and outdoor scenes, providing a reliable and scalable solution for embodied visual localisation in practical robotic applications. Notably, it achieves a 62.1% improvement in localisation accuracy with nearly 22× faster speed compared to a typical distributed architecture for 10 collaborative robots. Code is available at: