Hi‐RAG: A Hierarchical Retrieval‐Augmented Generation Framework for Scalable and Generalisable Tool Selection in Large Language Model Agents
Wei Tian, Yuhao Zhou

ABSTRACT
As tool repositories for Large Language Model (LLM) agents grow from dozens to hundreds of endpoints, flat retrieval paradigms that treat the repository as an unstructured list suffer from context overload, cross‐domain semantic collision and degraded selection accuracy. We propose Hi‐RAG, a Hierarchical Retrieval‐Augmented Generation framework that exploits the Type → Service → Tool structure of the Model Context Protocol (MCP) via a principled coarse‐to‐fine pipeline. Stage 1 applies a Tool‐as‐Proxy hybrid retrieval strategy (BM25 sparse retrieval combined with dense bi‐encoder search, fused via Weighted Reciprocal Rank Fusion) to identify candidate services efficiently. Stage 2 performs type‐aware re‐ranking over a local heterogeneous graph, integrating a domain‐level gating mechanism with contextualised tool attention for precise service scoring. We further introduce MCPBench, the first benchmark for hierarchical tool selection, comprising 201 tools across 40 real enterprise services, in which 23% of queries require multi‐service reasoning. Experiments across five LLMs show that Hi‐RAG improves Top‐1 accuracy by up to 7.5% (single‐service) and 10.0% (multi‐service) over Flat‐RAG, while reducing token consumption by up to 89% relative to full‐context injection. Hi‐RAG also outperforms a strong late‐interaction baseline (ColBERT‐RAG). Zero‐shot evaluation on ToolLLM (16,464 tools) confirms scalability, and formal analyses establish context growth with respect to repository size.
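To make the Stage 1 fusion step concrete, the following is a minimal Python sketch of Weighted Reciprocal Rank Fusion over two tool rankings, followed by a Tool‐as‐Proxy roll‐up from tools to their parent services. It is an illustration under stated assumptions, not the authors' implementation: the function names, the smoothing constant k = 60, the equal retriever weights, and the choice to score a service by its best‐ranked tool are all hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists with Weighted Reciprocal Rank Fusion.

    rankings: {retriever_name: [item, ...]} ordered best first.
    weights:  {retriever_name: float} per-retriever weight (assumed values).
    k:        smoothing constant; 60 is a common default, assumed here.
    Returns {item: fused_score}.
    """
    scores = defaultdict(float)
    for name, ranked in rankings.items():
        w = weights[name]
        for rank, item in enumerate(ranked, start=1):
            # Each retriever contributes w / (k + rank) for every item it ranks.
            scores[item] += w / (k + rank)
    return dict(scores)


def services_from_tools(tool_scores, tool_to_service, top_n):
    """Tool-as-Proxy roll-up (assumed rule: a service takes its best tool's score)."""
    best = {}
    for tool, score in tool_scores.items():
        service = tool_to_service[tool]
        best[service] = max(best.get(service, 0.0), score)
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(best, key=lambda s: (-best[s], s))[:top_n]


# Hypothetical toy data: tool names and services are placeholders.
rankings = {
    "bm25": ["create_invoice", "send_email", "list_invoices"],
    "dense": ["send_email", "list_invoices", "create_invoice"],
}
weights = {"bm25": 0.5, "dense": 0.5}
tool_to_service = {
    "create_invoice": "billing",
    "list_invoices": "billing",
    "send_email": "mail",
}

fused = weighted_rrf(rankings, weights)
candidates = services_from_tools(fused, tool_to_service, top_n=2)
print(candidates)
```

Only the services returned by the roll‐up would be passed on to the Stage 2 re‐ranker in this sketch; how the paper aggregates tool evidence per service may differ.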