Behavior Recognition of Novice Drivers Based on Bimodal Eye-Tracking Characteristics and a Parallel CNN-Mamba Model
Jianzhuo Li, Panyu Dai, Jiake Li, Ye Yu

Driving behavior recognition plays a crucial role in intelligent driving systems and road traffic safety. Because of their limited driving experience and weaker ability to allocate visual attention, novice drivers are considered a high-risk group for traffic accidents. Existing approaches primarily focus on experienced drivers and rely on single-modal eye-tracking data, which makes it difficult to model spatial attention distributions and long-term temporal dependencies simultaneously. Moreover, these methods are often affected by modality asynchrony during multimodal fusion, which further limits performance gains. To address these challenges, this study proposes a novice driver behavior recognition method based on bimodal eye-tracking features and a gated cross-modal attention fusion (GCMAF) mechanism. The model adopts a spatial–temporal dual-branch architecture. The spatial branch employs ResNet34 to extract features from eye-tracking heatmaps, which represent the visual attention distribution. The temporal branch integrates a 1D-CNN with the Mamba model to capture local dynamic patterns and long-range temporal dependencies. In the fusion stage, the GCMAF module enhances cross-modal interactions, and a gating mechanism adaptively adjusts the modality weights, thereby mitigating the adverse effects of modality asynchrony. To validate the effectiveness and generalization ability of the proposed method, repeated experiments and five-fold cross-validation are conducted. The results demonstrate that the model achieves an average classification accuracy of 93.86% across four driving behavior categories, with standard deviations below 0.3%. Paired t-tests against the baseline methods show that the performance improvement is statistically significant (p < 0.01). Ablation studies further confirm the independent contribution of each component. Overall, the proposed method outperforms existing approaches in terms of accuracy and stability, providing effective support for driving behavior assessment and proactive safety warning systems.
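
The adaptive modality weighting mentioned above can be illustrated with a minimal sketch. It is not the GCMAF module itself: the cross-modal attention step is omitted, and the function name `gated_fusion` and the parameters `w_s`, `w_t`, and `bias` are hypothetical placeholders standing in for learned weights. The sketch only shows how a sigmoid gate could blend a spatial feature vector with a temporal one, element by element.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(spatial, temporal, w_s, w_t, bias):
    """Blend two equal-length feature vectors with a per-element gate.

    Assumption (not from the paper): the gate is a sigmoid of a
    per-element linear combination of both modalities. A gate near 1
    favors the spatial feature; a gate near 0 favors the temporal one.
    """
    if not (len(spatial) == len(temporal) == len(w_s) == len(w_t) == len(bias)):
        raise ValueError("all inputs must have the same length")

    fused, gates = [], []
    for s, t, ws, wt, b in zip(spatial, temporal, w_s, w_t, bias):
        g = sigmoid(ws * s + wt * t + b)      # modality weight in (0, 1)
        gates.append(g)
        fused.append(g * s + (1.0 - g) * t)   # convex combination
    return fused, gates


# Hypothetical feature vectors for illustration only.
spatial_feat = [1.0, 2.0]
temporal_feat = [3.0, 4.0]
zeros = [0.0, 0.0]

# With zero weights and bias, every gate is 0.5 and the result is the mean.
fused, gates = gated_fusion(spatial_feat, temporal_feat, zeros, zeros, zeros)
```

Because the output is a convex combination, a gate that saturates toward one modality effectively suppresses the other, which is one plausible way a model could down-weight a modality that is temporally misaligned.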