DOI: 10.1049/itr2.70273 ISSN: 1751-956X

Hybrid Attention and Convolutional Induction Network for LiDAR‐Camera BEV Fusion in Complex Road Scenes

Xiang Yu, Longzheng Xu, Ying Qian, Daiyin Zhu

ABSTRACT

LiDAR‐camera fusion is important for 3D object detection in autonomous driving, yet existing bird's‐eye view (BEV) fusion methods often struggle to balance cross‐modal interaction, global context modelling and local detail preservation. To address this issue, we propose a hybrid attention and convolutional induction network for LiDAR‐camera BEV fusion in complex road scenes. The network contains three coordinated components: a cross‐attention module for bidirectional feature interaction between the LiDAR and camera streams, a self‐attention module for global dependency modelling and an interleaved feature inductive module for local detail preservation and training stabilisation. Implemented on top of the BEVFusion framework, the proposed method improves mAP by 2.78% and NDS by 2.60% over the baseline on the nuScenes dataset. The gains are more pronounced for detail‐sensitive categories such as pedestrians, bicycles and traffic cones. These results indicate that coordinating repeated cross‐modal attention with interleaved local inductive priors is an effective strategy for improving BEV fusion performance in complex scenes.
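The Python sketch below makes the three-component structure concrete. It shows one possible way to interleave bidirectional cross‐attention, global self‐attention and a local inductive step over flattened BEV grid cells. It is not the authors' implementation. The following are all hypothetical simplifications, chosen to keep the example dependency‐free:

- identity query/key/value projections;
- averaging of the two modality streams;
- a fixed 3x3 mean filter standing in for a learned convolution;
- the ordering of the steps;
- every function name (`bidirectional_cross_attention`, `local_mean_3x3`, `hybrid_fusion_block`).

```python
import math
from typing import List, Tuple

# One C-dim feature vector per BEV cell, row-major over an H x W grid.
Tokens = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention with identity Q/K/V projections (a simplification)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Tokens, b: Tokens) -> Tokens:
    """Element-wise sum of two token sets (used for residual connections)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_cross_attention(lidar: Tokens, camera: Tokens) -> Tuple[Tokens, Tokens]:
    # Each modality queries the other; residuals preserve each modality's own features.
    lidar_out = add(lidar, attention(lidar, camera, camera))
    camera_out = add(camera, attention(camera, lidar, lidar))
    return lidar_out, camera_out


def local_mean_3x3(tokens: Tokens, h: int, w: int) -> Tokens:
    """Fixed 3x3 mean filter over the BEV grid; a stand-in for a learned local convolution."""
    out = []
    for r in range(h):
        for c in range(w):
            nbrs = [tokens[rr * w + cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out.append([sum(v[j] for v in nbrs) / len(nbrs)
                        for j in range(len(tokens[0]))])
    return out


def hybrid_fusion_block(lidar: Tokens, camera: Tokens, h: int, w: int) -> Tokens:
    """Hypothetical interleaving: cross-attention -> global self-attention -> local prior."""
    lidar_x, camera_x = bidirectional_cross_attention(lidar, camera)
    # Merge the two streams by averaging (an assumption, not the paper's fusion rule).
    fused = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(lidar_x, camera_x)]
    fused = add(fused, attention(fused, fused, fused))  # global self-attention over all cells
    return add(fused, local_mean_3x3(fused, h, w))      # residual local inductive step


if __name__ == "__main__":
    lidar = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]   # 2 x 2 grid, C = 2
    camera = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.8], [0.3, 0.3]]
    out = hybrid_fusion_block(lidar, camera, h=2, w=2)
    print(len(out), len(out[0]))  # 4 2
```

A practical implementation would typically use learned projections, multiple heads and learned convolution kernels. The sketch only illustrates the data flow, including the residual connections that let each modality retain its own features while attending to the other.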
