Adaptive Query Performance Prediction for Retrieval-Augmented Generation: Bridging Retrieval Quality and Generation Relevance


DOI: 10.1145/3827605 ISSN: 1046-8188


Aparajita Sinha, Kunal Chakma

Query Performance Prediction (QPP) estimates retrieval system effectiveness without relevance judgments, a challenge that intensifies in Retrieval-Augmented Generation (RAG) pipelines where retrieval quality directly shapes downstream answer quality. We introduce RAG-QPP, a retrieval-centric framework that predicts query difficulty from a twelve-dimensional post-retrieval feature set combining semantic similarity, lexical, and score-distribution signals, extending beyond classical QPP measures (Clarity, WIG, and NQC). Unlike generator-dependent approaches that require model-internal signals such as perplexity or token-level uncertainty, RAG-QPP operates exclusively on post-retrieval features, remaining applicable to any black-box generator without modification. Random Forest, XGBoost, and LightGBM are evaluated across four retrieval paradigms (sparse, dense, hybrid, and late-interaction) on the MS MARCO Passage, MS MARCO Document, Natural Questions, and Robust04 datasets. Prediction accuracy is assessed via Pearson \(r\), Spearman \(\rho\), and Kendall \(\tau\), with MRR@10 as the primary target and nDCG@10 and Average Precision (AP) as supplementary metrics. RAG-QPP achieves moderate but statistically significant correlations with ground-truth MRR@10 (Pearson \(r = 0.6587\) in-domain; \(r = 0.5332\) out-of-domain) without dataset-specific tuning. Predicted scores align with BART-large and LLaMA-3-8B generation quality across ROUGE-L, BERTScore-F1, and token-level F1 (\(r \approx 0.22\)–\(0.48\)). QPP-guided adaptive retrieval yields consistent generation improvements, with the largest gains for low-effectiveness queries (\(\Delta\)ROUGE-L \(= +0.057\)), exceeding a learning-free adaptive baseline and confirming that the benefit derives from the learned retrieval-quality signal. Ablation confirms the dominance of semantic similarity features over traditional lexical QPP signals under neural retrieval. Random Forest leads in-domain while LightGBM generalises best under distribution shift. RAG-QPP provides an interpretable, modular, and architecture-agnostic diagnostic layer for adaptive RAG systems. The implementation and associated resources are publicly available at https://github.com/APARAJITA1997/RAG_QPP_JH_2026.
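To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one classical score-distribution signal (NQC, the standard deviation of the top-k retrieval scores normalised by a corpus-level score), a Pearson correlation between predicted and actual per-query effectiveness, and a simple threshold rule for QPP-guided adaptive retrieval. The function names, the threshold, and the retrieval depths are hypothetical and chosen only for illustration; the abstract does not specify how the adaptive policy is parameterised.

```python
import math
from statistics import mean, pstdev


def nqc(top_k_scores, corpus_score):
    """Normalized Query Commitment: population std of the top-k
    retrieval scores divided by the absolute corpus-level score."""
    if not top_k_scores or corpus_score == 0:
        raise ValueError("need at least one score and a non-zero corpus score")
    return pstdev(top_k_scores) / abs(corpus_score)


def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual per-query scores."""
    mp, ma = mean(predicted), mean(actual)
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def choose_depth(predicted_quality, base_k=5, expanded_k=20, threshold=0.3):
    """Hypothetical adaptive policy (an assumption, not from the paper):
    retrieve more passages when the predicted retrieval quality is low."""
    return expanded_k if predicted_quality < threshold else base_k
```

In this sketch a query whose predicted quality falls below the assumed threshold receives a deeper retrieval pass, which mirrors the abstract's observation that the largest generation gains occur for low-effectiveness queries.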
