ODE‐Based MAPPO Optimization for Guard Strategies in Artificial Attack Scenarios
Nengfei Cui, Haoyang Li, Zheng Yang, Zhicheng Dong
ABSTRACT
This paper presents a reinforcement learning‐based framework for optimizing guard strategies in artificial attack scenarios. The environment is modeled as a multi‐agent confrontation game in which guards collaborate to intercept an attacker while protecting evacuating pedestrians. Pedestrian movement is governed by an extended floor field model that reflects panic‐induced behavior under threat. To address the coordination and adaptability challenges of such high‐stakes dynamic settings, we adopt an ordinary differential equation (ODE)‐based Multi‐Agent Proximal Policy Optimization (MAPPO) algorithm under the centralized training and decentralized execution paradigm. Through iterative interaction with the environment, the guards learn robust and efficient defense policies. Simulation results demonstrate that, with the MAPPO‐trained policies, guards can effectively constrain the attacker, minimize pedestrian casualties, and coordinate their defense efficiently.