One Size Does Not Fit All: Revisiting Code Context Engineering for Repository-Level Code Generation
Yichen Li, Qiye Lin, Yun Peng, Zhihan Jiang, Jinyang Liu, Chaozheng Wang, Yintong Huo, Cuiyun Gao
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation at the function or file level. However, they achieve limited performance on repository-level code generation because of the complicated repository context, in which a substantial number of files and functions exist. To address this challenge, code context engineering methods have been proposed to accurately extract the relevant code context required by such tasks. These methods belong to three dominant paradigms: 1)
Despite the prevalence of code context engineering approaches, their individual contributions are often coupled within complex agents in repository-level code generation, making it difficult to isolate and evaluate the effectiveness of each paradigm. This paper presents the first large-scale and systematic empirical study that compares three code context engineering paradigms independently. We evaluate ten representative code context engineering methods based on the three paradigms, with eight popular LLMs on this task. We also propose a new metric named Dependency Collection Rate (DCR) and efficiency metrics to enable the direct comparison of code context engineering methods, rather than observing their impacts only on the final code generation performance.
Our findings reveal fundamental trade-offs: static analysis provides the most reliable balance between effectiveness and cost for function-level generation, while navigation-based approaches become increasingly advantageous as task complexity grows. However, navigation requires powerful models and incurs 10-20× higher computational costs. Based on these findings, we provide actionable implications for AI coding researchers and software developers to guide the design and deployment of context-aware coding tools.
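The abstract names the Dependency Collection Rate (DCR) but does not give its formula. As a rough illustration only, the sketch below assumes one plausible reading: DCR is the fraction of the dependencies a task actually requires that appear in the context a method collected. The function name `dependency_collection_rate`, the symbol names in the example, and the convention of returning 1.0 when nothing is required are all hypothetical and not taken from the paper.

```python
from typing import Iterable


def dependency_collection_rate(required: Iterable[str], collected: Iterable[str]) -> float:
    """Assumed definition: share of required dependencies found in the collected context.

    This is a recall-style ratio chosen for illustration; the paper's exact
    definition of DCR may differ.
    """
    required_set = set(required)
    if not required_set:
        # Assumption: a task with no required dependencies counts as fully covered.
        return 1.0
    hits = required_set & set(collected)
    return len(hits) / len(required_set)


# Hypothetical example: three dependencies are needed, two were collected,
# and one irrelevant symbol was collected as well.
required = {"utils.parse_config", "models.User", "db.connect"}
collected = {"utils.parse_config", "db.connect", "cli.main"}
rate = dependency_collection_rate(required, collected)
print(f"DCR = {rate:.2f}")
```

Under this reading, a metric of this kind scores the collected context directly, so two context engineering methods can be compared without running the downstream code generation step. Note that irrelevant collected symbols do not lower the score in this sketch; the cost of over-collection would be captured separately, for instance by the efficiency metrics the abstract mentions.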