Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing
Chaozheng Wang, Zezhou Yang, Shuzheng Gao, Cuiyun Gao, Zongjie Li, Yichen Li, Ting Peng, Hailiang Huang, Yuetang Deng, Michael R. Lyu
Code editing constitutes a fundamental practice in software development, wherein developers modify existing codebases according to natural language requirements. Accurate code editing necessitates a comprehensive understanding of both the existing codebase and the modification requirements. Although large language models (LLMs) have demonstrated promising performance in code editing tasks, they suffer from substantial inefficiency because they generate entire modified files that largely consist of unchanged code. While smaller models could potentially address this inefficiency, they typically lack the capacity to comprehend the long code contexts required for accurate editing. To ensure both effectiveness and efficiency, we propose to decompose code editing into a two-stage cascade: edit sketch generation, wherein a large model first produces concise sketches representing the requisite modifications (the more challenging phase), and edit sketch application, wherein a smaller model integrates these sketches into the original code to produce the final edited code (the simpler phase). This cascaded design reduces the number of tokens generated by the large model, as the majority of the output is handled by the smaller, more efficient model, thereby enhancing overall efficiency. However, the effectiveness of this approach is constrained by current small models’ limited capabilities in handling long-context scenarios and cross-file dependencies, which are essential for accurate sketch application in real-world codebases.
To address these limitations and enhance smaller models’ sketch application capabilities, we introduce the first large-scale sketch application dataset, comprising over 100K training instances and 800M tokens, along with a human-evaluated benchmark, and propose specialized training strategies including curriculum-based long-context training and multi-file augmentation. Our comprehensive experiments demonstrate that our cascaded framework inherently reduces inference costs compared to direct editing with large models. Furthermore, combining large models with our fine-tuned smaller models can achieve even better performance. For instance, on the Aider benchmark, employing DeepSeek R1 as the edit sketch generation model alongside a fine-tuned Qwen2.5 Coder 14B model for the application phase improves Pass@2 by 11.1% compared to direct editing with DeepSeek R1 alone. Additionally, the cascaded approach reduces execution time and cost by 13% and 19%, respectively, demonstrating both performance gains and efficiency improvements.
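The two-stage cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `cascaded_edit`, `CascadeResult`, and `Generate`, as well as the prompt wording, are hypothetical and chosen only for illustration. Each model is assumed to be any callable that maps a prompt string to generated text.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function mapping a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class CascadeResult:
    sketch: str       # concise edit sketch produced by the large model
    edited_code: str  # full edited file produced by the small model


def cascaded_edit(code: str, requirement: str,
                  large_model: Generate, small_model: Generate) -> CascadeResult:
    """Two-stage cascade: edit sketch generation, then edit sketch application."""
    # Stage 1 (harder): the large model emits only a short sketch of the change,
    # so it generates far fewer tokens than a full rewritten file.
    sketch_prompt = (
        "Describe the required edit as a concise sketch; omit unchanged code.\n"
        f"Requirement:\n{requirement}\n\nOriginal code:\n{code}\n"
    )
    sketch = large_model(sketch_prompt)

    # Stage 2 (simpler): the small model merges the sketch into the original
    # code and returns the complete edited file.
    apply_prompt = (
        "Apply the edit sketch to the original code and return the full file.\n"
        f"Original code:\n{code}\n\nEdit sketch:\n{sketch}\n"
    )
    edited_code = small_model(apply_prompt)
    return CascadeResult(sketch=sketch, edited_code=edited_code)
```

In practice, `large_model` and `small_model` would wrap calls to a sketch generation model and a fine-tuned application model, respectively. Passing them in as plain callables keeps the sketch independent of any particular inference API and makes it easy to substitute stub functions when testing.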