SENTINEL: Action-Level Adversarial Defense for Autonomous Vehicles via Counterfactual Policy Verification
Azzam F. Alserhani, Faeiz M. Alserhani

Deep learning perception in autonomous vehicles (AVs) has created a critical attack surface in which adversarial patches and sensor-spoofing perturbations cascade from perception errors into unsafe driving decisions. Existing defenses face three limitations: most require retraining the perception network, making them impractical for already-deployed fleets; they operate almost exclusively at the perception layer, without verifying whether a compromised detection actually altered the driving action; and they leave temporal consistency across frames largely unexploited. This paper presents SENTINEL, a zero-modification, plug-and-play defense that wraps any deployed AV perception-and-planning stack without updating its weights, calibrating only the detection thresholds, score combination weights, and reference exemplars once on a small held-out calibration set. SENTINEL integrates a frozen foundation model verification ensemble (CLIP, DINOv2, SAM-2), a temporal consistency scorer that flags patches through anomalous frame-to-frame stability under ego-motion, a counterfactual policy verifier that replans under reconstructed perception and measures action-space divergence, and a risk-adaptive safety shield that modulates driving aggressiveness by verification confidence. Across CARLA, nuScenes, KITTI, and BDD100K, against five adversarial attacks and an adaptive adversary, SENTINEL reduces the attack success rate by up to 92%, keeps the clean accuracy loss to approximately 1.8 percentage points, reduces the collision rate under attack by approximately 87%, and adds under 45 ms latency on an RTX 4090 GPU. SENTINEL reframes adversarial robustness as a runtime property of the complete autonomous decision pipeline.
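
The counterfactual policy verifier and the risk-adaptive safety shield can be illustrated with a minimal sketch. It is not the paper's implementation: the `Action` fields, the Euclidean divergence metric, the `threshold` of 0.3, the linear throttle scaling in `shield`, and the `toy_planner` are all hypothetical choices made only to show the idea of replanning under reconstructed perception and comparing the resulting actions.

```python
import math
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical normalized control vector (not taken from the paper).
    steer: float     # in [-1, 1]
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]


def action_divergence(a: Action, b: Action) -> float:
    """Euclidean distance between two actions (assumed metric)."""
    return math.sqrt(
        (a.steer - b.steer) ** 2
        + (a.throttle - b.throttle) ** 2
        + (a.brake - b.brake) ** 2
    )


def counterfactual_check(planner, observed, reconstructed, threshold=0.3):
    """Replan under reconstructed perception and compare the two actions.

    Returns (divergence, flagged). The threshold is an assumed value that
    would in practice be set on a calibration set.
    """
    original = planner(observed)
    counterfactual = planner(reconstructed)
    d = action_divergence(original, counterfactual)
    return d, d > threshold


def shield(action: Action, confidence: float, min_scale: float = 0.2) -> Action:
    """Scale throttle down as verification confidence drops (assumed rule)."""
    c = min(max(confidence, 0.0), 1.0)           # clamp to [0, 1]
    scale = min_scale + (1.0 - min_scale) * c    # linear in confidence
    return Action(action.steer, action.throttle * scale, action.brake)


def toy_planner(detections) -> Action:
    """Stand-in planner: brake fully for a stop sign, otherwise cruise."""
    if "stop_sign" in detections:
        return Action(steer=0.0, throttle=0.0, brake=1.0)
    return Action(steer=0.0, throttle=0.6, brake=0.0)


if __name__ == "__main__":
    # A patch makes the stop sign read as a speed-limit sign; the
    # reconstructed perception recovers the stop sign.
    d, flagged = counterfactual_check(
        toy_planner, {"speed_limit_sign"}, {"stop_sign"}
    )
    print(f"divergence={d:.3f} flagged={flagged}")
    print(shield(toy_planner({"speed_limit_sign"}), confidence=0.0))
```

In this toy case the attacked and reconstructed detections lead to opposite actions (cruise versus full brake), so the divergence exceeds the threshold and the frame is flagged. A perception error that leaves the planned action unchanged yields zero divergence and is not flagged, which is the distinction between perception-level and action-level verification that the abstract draws.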