DOI: 10.3390/knowledge6030013 ISSN: 2673-9585

Augmenting Large Language Models with External Data Sources: A Systematic Review of Methodologies, Performance Metrics, and Information Fidelity

Soham Mukherjee, John Le, Chau Nguyen

Large Language Models (LLMs) have emerged as transformative tools across various domains, exhibiting remarkable capabilities in natural language processing and generation. However, their reliance on static pre-training data limits their ability to access up-to-date and domain-specific information. Existing research often treats augmentation strategies in isolation, and few efforts have systematically compared them through the lens of information integrity. This review focuses on Retrieval-Augmented Generation (RAG) and fine-tuning, which it identifies as the two dominant paradigms for integrating external knowledge: RAG for retrieval-based context injection and fine-tuning for parametric knowledge adaptation. While existing surveys predominantly focus on performance metrics such as accuracy or latency, this paper addresses the critical gap of data fidelity, defined here as the preservation of truthfulness, integrity, and fairness during augmentation. We systematically synthesize empirical findings from diverse methodologies to determine how each approach mitigates hallucinations and bias. By comparing the trade-offs between these two paradigms, this survey offers a structured taxonomy, a unified evaluation framework, and actionable insights to guide future research and the practical deployment of robust, high-fidelity LLMs.
