DOI: 10.1145/3821533 ISSN: 1049-331X

Fault Diversity in Reinforcement Learning Policy Testing

Quentin Mazouni, Arnaud Gotlieb, Mathieu Acher, Helge Spieker

Reinforcement Learning (RL) is a common approach to solving sequential decision-making problems. Testing is an effective method for detecting functional faults in RL policies. Policy testing should not only find as many failures as possible, but also reveal diverse ones. In that regard, Quality Diversity (QD) optimization is a type of evolutionary algorithm that returns a set of high-quality, diverse solutions. In this article, we explore the use of QD to tackle fault diversity in policy testing. We define and address the challenges of adapting QD to testing policies with both deterministic test cases (the setting considered by most of the literature) and stochastic executions. We use stochastic executions to define a generic, domain-agnostic behavior space tailored to policy testing. We compare QD-based policy testing to a state-of-the-art policy testing framework, using two QD optimizers and three environments. Our evaluation assesses fault detection and fault diversity, and investigates the sensitivity of our method to the behavior space definition and to the number of test executions. Our results show that QD improves fault diversity without significantly reducing fault discovery. Furthermore, QD optimization does not necessarily need domain knowledge and can adapt to generic behaviors with stochastic executions.
