Proactive Early Warning of Vortex Ring State in Coaxial UAVs: A Physics-Informed Multimodal ViT-LSTM Approach
Xiang Zhou, Jiawei Sun, Jiannan Zhao, Feng Shuang

The Vortex Ring State (VRS) poses a catastrophic aerodynamic threat to coaxial dual-rotor unmanned aerial vehicles (UAVs). Traditional reactive detection mechanisms leave insufficient altitude for recovery, while existing data-driven diagnostics are hampered by data leakage, extreme class imbalance, and a lack of physical interpretability. To bridge these gaps, this paper proposes a physics-informed multimodal deep learning framework that moves from post-occurrence detection to proactive early warning. We establish a 1.5 s precursor window, which creates a three-class ordinal state space (Normal, Precursor, VRS) and gives the flight control system critical intervention time for differential rotor recovery. We develop a novel ViT-LSTM architecture (MTSF-Net) to fuse continuous seven-channel onboard-recorded data (three-axis acceleration, three-axis angular velocity, and barometric vertical velocity) that are transformed into Continuous Wavelet Transform (CWT) spectrograms. To ensure real-time unidirectional inference while preserving absolute physical vibration scales across heterogeneous sensors, we introduce a Calibrated Benchmark Normalization (CBN) strategy. Furthermore, we propose a Hybrid Ordinal Loss that mitigates the extreme sample imbalance (<0.5%) of the precursor state by penalizing asymmetric aerodynamic degradation. Evaluated under a strict sortie-based isolation protocol, the proposed system achieves a test accuracy of 98.26% and a precursor recall of 100%. Notably, it produces no fatal missed detections (VRS predicted as Normal) and no false-positive VRS predictions triggered by precursor states. Finally, Gradient-weighted Class Activation Mapping (Grad-CAM) is used to verify that the multimodal sensor processing pipeline anchors onto authentic physical vibration frequencies rather than artifactual noise, laying a rigorous, interpretable foundation for intelligent aviation safety systems.