Schema Injection Strategies for LLM-Based Text-to-Cypher Query Translation: An Empirical Study

doi:10.1142/s0219649226500541

DOI: 10.1142/s0219649226500541 ISSN: 0219-6492

Schema Injection Strategies for LLM-Based Text-to-Cypher Query Translation: An Empirical Study

Shady Hegazy, Muhammad Ammar, Christoph Elsner, Jan Bosch, Helena Holmström Olsson

Software ecosystems generate complex data traces which can be represented through graph models and stored in graph databases. While large language models (LLMs) offer promising no-code interfaces for querying such databases by translating natural language to database queries, their effectiveness heavily depends on how database schema information is provided. This paper presents a comprehensive evaluation of different schema injection techniques for LLM-driven graph database query generation. We systematically evaluate four key dimensions: schema source, prompt placement, query categories, and results relevancy. Our automated evaluation pipeline records execution outcomes across 50 test queries spanning five query categories. In addition, a human evaluation is conducted for the returned data from each query execution. Results show that LLM-generation of Cypher queries is sensitive to schema source and injection point. In addition, results show that providing the dynamically generated schema through the APOC.meta.graph method yields more relevant and executable queries on average compared to static schema summaries, while injecting schema information in the system prompt outperforms injection in the user prompt. The evaluation provides actionable insights for optimising LLM-driven graph database interfaces and highlights the need for advanced context and software engineering to enable reliable LLM-based solutions.

Outline

Schema Injection Strategies for LLM-Based Text-to-Cypher Query Translation: An Empirical Study

More from our Archive