Generalizing Test Cases for Comprehensive Test Scenario Coverage
Binhang Qi, Yun Lin, Xinyi Weng, Chenyan Liu, Hailong Sun, Gordon Fraser, Jin Song Dong
Test cases are essential for software development and maintenance. In practice, developers derive multiple test cases from an implicit pattern based on their understanding of requirements and inference of diverse test scenarios, each validating a specific behavior of the focal method. However, producing comprehensive tests is time-consuming and error-prone: many important tests that should have accompanied the initial test are added only after a significant delay, sometimes only after bugs are triggered.
Existing automated test generation techniques largely focus on code coverage. Yet in real projects, practical tests are seldom driven by code coverage alone, since test scenarios do not necessarily align with control-flow branches. Instead, test scenarios originate from requirements, which are often undocumented and implicitly embedded in a project's design and implementation. However, developer-written tests are frequently treated as executable specifications; thus, even a single initial test that reflects the developer's intent can reveal the underlying requirement and the diverse scenarios that should be validated.
In this work, we propose TestGeneralizer, a framework for generalizing test cases to comprehensively cover test scenarios. TestGeneralizer orchestrates three stages: (1) enhancing the understanding of the requirement and scenario behind the focal method and initial test; (2) generating a test scenario template and crystallizing it into various test scenario instances; and (3) generating and refining executable test cases from these instances. To ensure accuracy and completeness, TestGeneralizer combines rule-based prompts, automatically optimized via a prompt auto-tuning technique, with crucial project knowledge retrieved through program analysis. We evaluate TestGeneralizer against three state-of-the-art baselines (EvoSuite, gpt-o4-mini, and ChatTester) on 12 open-source Java projects, covering 506 multi-test focal methods and 1,637 test scenarios. TestGeneralizer achieves significant improvements: +57.67% and +59.62% over EvoSuite, +37.44% and +32.82% over gpt-o4-mini, and +31.66% and +23.08% over ChatTester, in mutation-based and LLM-assessed scenario coverage, respectively. In a field study, we submitted 27 generalized tests overlooked by developers; 16 were accepted and merged into official repositories, demonstrating the practical usefulness of TestGeneralizer.
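To make the idea of a scenario template and its instances concrete, the following is a minimal sketch and not TestGeneralizer's implementation. Everything in it is an assumption made for illustration: the focal method `parse_port`, the `ScenarioInstance` type, and the hand-written list of instances, which stands in for the LLM-driven steps the abstract describes. It shows only how one implicit pattern ("parse_port on some kind of input yields a value or an error") can be crystallized into several scenario instances, each turned into an executable check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def parse_port(text: str) -> int:
    """Hypothetical focal method: parse a TCP port number from text."""
    value = int(text.strip())  # raises ValueError on non-numeric input
    if not 0 < value < 65536:
        raise ValueError(f"port out of range: {value}")
    return value


@dataclass(frozen=True)
class ScenarioInstance:
    """One concrete test scenario derived from the shared template."""
    description: str
    raw_input: str
    expected: Optional[int]  # None means "a ValueError is expected"


def instantiate_template() -> List[ScenarioInstance]:
    """Template: 'parse_port on <input kind> yields <value or error>'.

    The instances are hand-written here purely for illustration.
    """
    return [
        ScenarioInstance("plain number", "8080", 8080),
        ScenarioInstance("surrounding whitespace", " 443 ", 443),
        ScenarioInstance("upper boundary", "65535", 65535),
        ScenarioInstance("zero is rejected", "0", None),
        ScenarioInstance("above range is rejected", "65536", None),
        ScenarioInstance("non-numeric is rejected", "http", None),
    ]


def run_scenario(focal: Callable[[str], int], s: ScenarioInstance) -> bool:
    """Turn one scenario instance into an executable check and run it."""
    try:
        actual = focal(s.raw_input)
    except ValueError:
        return s.expected is None
    return actual == s.expected


def scenario_report(focal: Callable[[str], int]) -> Dict[str, bool]:
    """Map each scenario description to whether the focal method passes it."""
    return {s.description: run_scenario(focal, s) for s in instantiate_template()}
```

Calling `scenario_report(parse_port)` returns one pass/fail entry per scenario. Note that the range-check scenarios do map to branches in this toy method, but the whitespace scenario does not: it exercises behavior inherited from `int` and `strip`, which is the kind of requirement-driven scenario that branch coverage alone would not suggest.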