SnakeCharmer: Automatic Fuzzing Harness Generation for Pure and Hybrid Python Libraries

DOI: 10.1145/3797066 ISSN: 2994-970X

Gabriel Sherman, Stefan Nagy

With Python’s rising popularity, ensuring the correctness of its ever-growing ecosystem of software libraries is more critical than ever. Recently, fuzzing has become a de facto technique for vetting software libraries, enabled by harnesses: small wrapper programs that inject fuzzer-generated test cases into the library under test. While harness creation has shed its reliance on human expertise and is now fully automated for languages such as C and C++, Python remains uniquely challenging, both for pure Python libraries and for hybrid ones that combine Python with native C/C++ extensions, due to (1) limited visibility across language boundaries, (2) the absence of reliable bug oracles, and (3) incomplete type information. Consequently, existing attempts at automating harnessing for Python fail both to uphold critical runtime behaviors and to produce the structured call and data flows needed for effective fuzzing, leaving much of today’s Python ecosystem largely unvetted.

To overcome these challenges and broaden fuzzing’s reach across Python libraries, this paper introduces SnakeCharmer: the first automated harness generation approach for both pure and hybrid Python libraries. At its core, SnakeCharmer leverages static analysis to first capture key interface information from both Python and native code components, subsequently enriching it with runtime-captured type information and exception behaviors. During fuzzing, SnakeCharmer further distinguishes between expected exceptions and true library bugs, filtering out benign exceptions that would otherwise derail testing progress. Together, these techniques significantly enhance the scope and effectiveness of fuzzing across the Python library ecosystem, enabling the automated discovery of bugs in code previously inaccessible to existing Python fuzzing efforts.
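The idea of separating expected exceptions from true bugs can be illustrated with a minimal sketch. This is not SnakeCharmer's implementation: the function `classify`, the allow-list `EXPECTED_JSON_ERRORS`, and the toy target `buggy_parse` are hypothetical names introduced here, and the standard-library `json.loads` merely stands in for a library under test. The sketch assumes that the set of expected exception types is already known, whereas the paper describes capturing such behavior automatically.

```python
import json

# Hypothetical allow-list: exception types the target is expected to raise
# when it rejects malformed input. Chosen by hand for this illustration.
EXPECTED_JSON_ERRORS = (json.JSONDecodeError, UnicodeDecodeError)


def classify(target, data, expected):
    """Run one fuzzer-style input through `target` and label the outcome.

    Returns "ok" if the call succeeds, "expected" if it raises an
    allow-listed exception, and "bug:<ExceptionName>" for anything else.
    """
    try:
        target(data)
    except expected:
        # Benign rejection of bad input: not worth reporting.
        return "expected"
    except Exception as exc:
        # Anything outside the allow-list is treated as a potential bug.
        return f"bug:{type(exc).__name__}"
    return "ok"


def buggy_parse(data):
    """Toy target (hypothetical) that indexes past the end of short inputs."""
    return data[10]


if __name__ == "__main__":
    for sample in (b'{"a": 1}', b"{", b"\xff"):
        print(sample, classify(json.loads, sample, EXPECTED_JSON_ERRORS))
    print(b"abc", classify(buggy_parse, b"abc", (ValueError,)))
```

In a real fuzzing setup the harness would typically let unexpected exceptions propagate so the fuzzer records a crash; returning a label here only keeps the sketch easy to inspect.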

We evaluate SnakeCharmer alongside today’s leading Python auto-harnessing approach, PyRTFuzz; the actively fuzzed expert-written harnesses from both OSS-Fuzz and PolyFuzz; and the harnesses generated by Google’s own state-of-the-art LLM-driven automatic harnessing approach, OSS-Fuzz-Gen. Across 21 diverse Python libraries, SnakeCharmer attains type-recovery precision and exception-filtering accuracy of 95% and 97%, respectively, and achieves 1.48×, 1.87×, 1.78×, and 1.40× the code coverage of the fuzzing harnesses from PyRTFuzz, OSS-Fuzz, PolyFuzz, and OSS-Fuzz-Gen, respectively. SnakeCharmer also finds 16, 24, and 24 more Python library bugs than all expert-written harnesses, all LLM-created harnesses, and PyRTFuzz, respectively, uncovering a total of 20 new bugs, 18 of which have since been confirmed or fixed by developers.
