DOI: 10.1145/3797109 ISSN: 2994-970X

Failing with Purpose: Dangling Coverage-Guided Negative Test Generation from a Mechanized P4 Type System

Jaehyun Lee, Seokhun Jeong, Sukyoung Ryu

A type checker must reject ill-typed programs in addition to accepting well-typed programs. Negative type checker tests, programs expected to be rejected, validate that a type checker enforces the language’s typing rules as intended. We focus on negative type checker tests for P4, a domain-specific language for programmable network devices, whose type system encodes design principles and hardware constraints of the network dataplane. Failing to reject an ill-typed P4 program risks violating these principles and constraints, leading to unexpected errors. A comprehensive negative test suite covering subtle and diverse ill-typed conditions is thus important. However, constructing comprehensive negative tests is challenging: the negative input space lacks systematic characterization, and existing P4 program generators do not target subtle type errors.

This paper addresses the problem in three steps. (i) We mechanize the P4 type system using the SpecTec framework. Unlike the informal official P4 specification, the mechanized type system is formal and machine-readable. Mechanization enables a systematic analysis of the type system. (ii) Across the mechanized type system, we identify dangling premises, which are premises in the typing rules that, when violated, cause type errors. Based on them, we propose dangling coverage, a novel metric for quantifying negative test coverage. (iii) Finally, we implement a coverage-guided fuzzer that mutates well-typed P4 programs into ill-typed programs that increase dangling coverage. Our method identifies 939 dangling premises that characterize distinct ill-typed conditions in the P4 type system. The fuzzer generates a negative test suite achieving 33.02%p higher dangling coverage than the existing P4C reference compiler’s test suite. The generated tests also reveal 29 previously unknown bugs in the compiler frontend, demonstrating the effectiveness of both the coverage metric and the fuzzer. The tests generated by our fuzzer are now integrated into the P4C test suite.

More from our Archive