CrossFit: Demystifying VM Callback Bugs in Interpreters

DOI: 10.1145/3808111 ISSN: 2994-970X

Chibin Zhang, Qiang Liu, Mathias Payer

Scripting languages like Python, Ruby, or PHP are integral to modern software development. Despite security measures like memory safety and sandboxing, vulnerabilities in their interpreters can lead to critical issues such as remote code execution or sandbox escapes. A particularly pervasive class of vulnerabilities is callback bugs, which occur when user-defined callbacks violate runtime invariants, for example by freeing an object that is still in use (i.e., still reachable through live pointers) or by modifying a data structure during traversal. These violations can cause use-after-free, null-pointer dereferences, and type confusion, often leading to crashes, memory corruption, or even exploitable vulnerabilities. Detecting callback bugs remains challenging because they have not been formally characterized or systematically studied. As a result, existing tools cannot (1) establish clear links between script-side callbacks and their native-side invokers, or (2) generate scripts that systematically satisfy the preconditions required to trigger these bugs.
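As an illustration of the pattern (not an example taken from the paper), the sketch below shows a script-side callback mutating a container while native code is traversing it. It assumes CPython, whose `list.sort` makes the list appear empty during the sort and raises `ValueError` if it detects a mutation; other implementations may behave differently. The class name `Intruder` and the helper `sort_with_side_effect` are hypothetical.

```python
class Intruder:
    """Element whose comparison callback mutates the list being sorted."""

    def __init__(self, key, victim):
        self.key = key
        self.victim = victim  # the container the native code is traversing

    def __lt__(self, other):
        # Side effect: grow the container from inside the callback.
        self.victim.append(None)
        return self.key < other.key


def sort_with_side_effect():
    items = []
    for key in (3, 1, 2):
        items.append(Intruder(key, items))
    try:
        items.sort()  # native-side invoker: calls __lt__ on the elements
    except ValueError as exc:
        # CPython-specific guard: the invariant violation was detected.
        return f"detected: {exc}"
    return "not detected"


outcome = sort_with_side_effect()
print(outcome)
```

Here the interpreter guards the invariant and reports an error instead of corrupting memory. A callback bug is the case where a native-side invoker lacks such a guard.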

We propose CrossFit, a novel two-tier approach that combines static analysis and targeted fuzzing to systematically discover callback bugs. CrossFit first establishes links between script-side callbacks and their native-side invokers through context link analysis, enabling targeted exploration of high-risk code paths. It then generates proof-of-concept scripts with custom classes and magic methods, introducing side-effect operations to violate runtime invariants. Our evaluation shows that CrossFit outperforms existing tools by up to 12.04% in callsite coverage (i.e., coverage of potential sites where callback bugs may occur). We also identified 20 new bugs in Python, Ruby, and PHP, many of which are severe memory corruptions. Moreover, we provide a comprehensive benchmark of 150 proof-of-concept scripts to improve interpreter security.
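To make the script-generation idea concrete, here is a minimal sketch of rendering candidate scripts from a magic method, a side-effect statement, and a trigger expression. This is an assumption about what such a generator could look like, not CrossFit's implementation; `render_poc`, `CANDIDATES`, and the `Payload` class are hypothetical names. The generated scripts are only compiled to check that they are syntactically valid, not executed.

```python
def render_poc(magic, params, side_effect, result, trigger):
    """Render a script whose magic method performs a side effect on `target`."""
    lines = [
        "class Payload:",
        f"    def {magic}(self{params}):",
        f"        {side_effect}",  # operation meant to violate an invariant
        f"        return {result}",
        "",
        "target = [1, 2, 3]",
        "target.insert(0, Payload())",
        trigger,  # native operation that invokes the magic method
    ]
    return "\n".join(lines) + "\n"


# Hypothetical (magic method, extra params, side effect, return value, trigger).
CANDIDATES = [
    ("__eq__", ", other", "target.clear()", "False", "found = 0 in target"),
    ("__len__", "", "target.clear()", "0", "size = len(target[0])"),
]

scripts = [render_poc(*candidate) for candidate in CANDIDATES]

for index, script in enumerate(scripts):
    # Compile only: confirms the generated script parses, without running it.
    compile(script, f"<poc-{index}>", "exec")
    print(script)
```

A real generator would pick the magic method and trigger from the callback-to-invoker links found by the static analysis. The fixed candidate table here stands in for that step.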
