A Q‐Learning Algorithm to Solve the Two‐Player Zero‐Sum Game Problem for Nonlinear Systems
Afreen Islam, Anthony Siming Chen, Guido Herrmann

ABSTRACT
This paper deals with the two-player zero-sum game problem, which corresponds to the bounded L2-gain robust control problem. Finding an analytical solution to the complex Hamilton-Jacobi-Isaacs (HJI) equation is a challenging task. Hence, a novel Q-learning algorithm for unknown continuous-time (CT) affine-in-inputs nonlinear systems is proposed to generate an approximate solution to the HJI equation; the solution is valid on a local domain because a local approximator, namely a Neural Network (NN) structure, is used. The approach is model-free and does not require knowledge of the system drift dynamics or of the input and disturbance gain functions. The algorithm learns online, in real time, from measurements of the state variables. To generate the local approximate solution of the HJI equation for the two-player zero-sum game for nonlinear systems, the proposed non-iterative algorithm requires only a single critic NN instead of the commonly used triple-NN approximator structure. A persistence of excitation condition is required to guarantee uniform ultimate boundedness (UUB) and convergence to the optimal solution. The effectiveness of the proposed Q-learning approach for the two-player zero-sum game problem is demonstrated via simulations of a linear F-16 aircraft plant and a highly complex nonlinear system. Closed-loop system stability is proven using Lyapunov analysis, and convergence of the approximate solution to the true saddle-point solution is guaranteed in a UUB sense.
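For reference, the sketch below gives a standard formulation of the two-player zero-sum game that the HJI equation encodes for an affine-in-inputs nonlinear system. The notation (f, g, k, Q, R, gamma) is generic and assumed for illustration; it is not necessarily the exact notation used in the body of the paper.

```latex
% Standard zero-sum (bounded L2-gain) formulation, generic notation assumed:
\begin{align}
  \dot{x} &= f(x) + g(x)\,u + k(x)\,d, \\
  J(u,d)  &= \int_{0}^{\infty} \bigl( Q(x) + u^{\top} R\, u
              - \gamma^{2} \lVert d \rVert^{2} \bigr)\, dt, \\
  V^{*}(x) &= \min_{u}\,\max_{d}\, J(u,d), \\
  0 &= Q(x) + \nabla V^{*\top} f(x)
      - \tfrac{1}{4}\,\nabla V^{*\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}
      + \tfrac{1}{4\gamma^{2}}\,\nabla V^{*\top} k(x) k(x)^{\top} \nabla V^{*}, \\
  u^{*} &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}, \qquad
  d^{*} \;=\; \tfrac{1}{2\gamma^{2}}\, k(x)^{\top} \nabla V^{*}.
\end{align}
```

The last line gives the saddle-point policies of the minimizing control player and the maximizing disturbance player; the proposed Q-learning algorithm approximates this saddle-point solution without knowledge of f, g, or k.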