Fault Diversity in Reinforcement Learning Policy Testing
Quentin Mazouni, Arnaud Gotlieb, Mathieu Acher, Helge Spieker
Reinforcement Learning (RL) is a common approach to solving sequential decision-making problems. Testing is an effective method for detecting functional faults in RL policies. Policy testing should not only find as many failures as possible but also reveal diverse ones. In that regard, Quality Diversity (QD) optimization is a type of evolutionary algorithm that returns a set of diverse, high-quality solutions. In this article, we explore the use of QD to tackle fault diversity in policy testing. We define and address the challenges of adapting QD to testing policies with both deterministic test cases (considered by most of the literature) and stochastic executions. We utilize stochastic executions to define a domain-agnostic