LLM-Assisted Input-Requirement-Aware Differential Testing of Array Programming Frameworks
Zhichao Zhou, Jingzhu He

Array programming (AP) frameworks (e.g., NumPy and Octave) are widely adopted in scientific computing, so critical defects in them can jeopardize the entire ecosystem. The stability of their API designs enables differential testing across implementations (e.g., two versions of the same framework). However, two primary obstacles remain. First, current test generation cannot effectively produce valid inputs, because the APIs (e.g., matrix multiplication) have type constraints and semantic requirements. Second, unit testing approaches test APIs independently, even though the APIs share a core N-dimensional array structure (ndarray) as input. Modifying one API may alter the ndarray's properties and break the correctness of others. We propose ArrayDiff, a differential testing tool for array programming. We first collect semantic requirements from NumPy's APIs and leverage LLMs to transfer these requirements to other frameworks. Then, we propose an input-requirement-aware API call generator (IRA-ACG). Based on IRA-ACG, ArrayDiff employs search algorithms to evolve tests while keeping inputs valid. ArrayDiff can thus generate valid and complex API call sequences to detect potential differences. We evaluate ArrayDiff and its ablation versions on five AP pairs. They detect 47 valid-input differences and 39 invalid-input differences, with 23 confirmed as bugs or documentation issues. IRA-ACG boosts the detection of valid-input differences, which constitute most of the confirmed bugs. Comparing ArrayDiff with TitanFuzz (an LLM-based fuzzer) and Ghostwriter (a unit tester) confirms the benefits of IRA-ACG and sequence-level testing.
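
To make the idea of requirement-aware differential testing concrete, the following is a minimal sketch and not ArrayDiff's implementation. It assumes two hypothetical pure-Python matrix-multiplication implementations (`matmul_zip`, `matmul_loop`) standing in for two framework versions, a hypothetical generator (`gen_valid_inputs`) that enforces a single requirement (matching inner dimensions), and a fixed two-call sequence in place of evolved API call sequences.

```python
import random

def matmul_zip(x, y):
    """Hypothetical implementation A: row-by-column products via zip."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matmul_loop(x, y):
    """Hypothetical implementation B: explicit triple loop."""
    m, k, n = len(x), len(y), len(y[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for t in range(k):
                out[i][j] += x[i][t] * y[t][j]
    return out

def transpose(x):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*x)]

def gen_valid_inputs(rng):
    """Requirement-aware generation: A is m x k and B is k x n."""
    m, k, n = (rng.randint(1, 4) for _ in range(3))
    a = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(m)]
    b = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(k)]
    return a, b

def run_sequence(matmul, a, b):
    """Run a small call sequence: C = A @ B, then C^T @ C.

    Returns ("ok", result) or ("error", exception type name), so that
    crashes and wrong values can both be compared across implementations.
    """
    try:
        c = matmul(a, b)
        return ("ok", matmul(transpose(c), c))
    except Exception as exc:
        return ("error", type(exc).__name__)

def differential_test(impl_a, impl_b, trials=100, seed=0):
    """Collect the valid inputs on which the two implementations disagree."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        a, b = gen_valid_inputs(rng)
        if run_sequence(impl_a, a, b) != run_sequence(impl_b, a, b):
            diffs.append((a, b))
    return diffs
```

In this sketch the two implementations agree on every valid input, while an invalid input such as `a = [[1]]`, `b = [[2], [3]]` separates them: `matmul_zip` silently truncates the mismatched dimension and `matmul_loop` raises `IndexError`. This mirrors the distinction the abstract draws between valid-input and invalid-input differences.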