Active Learning of Symbolic Automata for Reactive Programs via Dynamic Symbolic Mapper
Yoel Kim, Yunja ChoiActive learning of formal behavior models from program source code is a powerful approach for a wide range of software analysis, validation, and verification tasks, including understanding system intent, automating specification mining, generating test oracles, and checking formal properties. Recent advances in active learning of symbolic automata, powered by program synthesis and model checking, provide both rich expressiveness and soundness guarantees for the learned models. However, these techniques often encounter significant performance bottlenecks, particularly when dealing with reactive programs that expose many program variables with large value domains.
This work introduces an extended active learning algorithm tailored for reactive programs by incorporating a novel dynamic symbolic mapper for learning symbolic automata. The mapper abstracts program behavior using learner-inferred predicates over program variables, encodes each valuation of these variables as a Boolean vector induced by these predicates (Boolean abstraction), and dynamically refines the abstraction in response to missing behaviors identified by the teacher. The mapper is granularity-aware: for teaching, it employs the coarsest predicates sufficient to expose missing behaviors, enabling broad exploration; for learning, it refines the abstraction only to the finest predicates necessary to resolve the uncovered gaps, trying to avoid refinements that could otherwise be triggered by coarse abstraction.
We evaluated our approach on 120 benchmarks, including SV-COMP tasks, Simulink model-driven programs, LeetCode problems, and representative embedded control software. The results show that our method learns 32 more symbolic automata and reduces the average active learning time by 55.6% compared to the state-of-the-art.