DOI: 10.3390/sym18071133 ISSN: 2073-8994

Edge-Texture-Aware Semantic Dual-Query Fusion for Multimodal 3D Object Detection

Yuehan Wu, Zheng Zheng, Kai Liu, Leyan Chen, Rihan Wu

Multimodal 3D object detection benefits from the complementary nature of camera images and LiDAR point clouds. However, existing voxel–pixel fusion methods typically rely on relatively coarse cross-modal interactions, which limit fine-grained structural modeling and degrade performance on small, safety-critical objects. To address this issue, we propose ETA-SDQF, an edge-texture-aware semantic dual-query fusion framework designed to enhance 3D perception of vehicles, cyclists, and pedestrians. The method first introduces an edge-texture-aware image backbone (ETAIB) based on the discrete wavelet transform (DWT), which improves the representation of multi-scale, fine-grained image features. We then design a dual-query-guided attention fusion (DQGAF) module, which uses deformable attention to adaptively aggregate voxel-aligned multi-scale image features under joint semantic and edge-texture guidance. Finally, we adopt a hybrid 3D feature learning strategy inspired by PV-RCNN, combining voxel-based feature learning with PointNet-style feature abstraction to process the fused features. This design improves the utilization of voxel features enriched with image semantics, thereby facilitating more reliable 3D object proposal generation. Experimental results on the KITTI dataset show that the proposed framework outperforms existing baseline methods: it consistently improves pedestrian and cyclist detection while maintaining competitive performance on car detection across difficulty levels, suggesting potential benefits on challenging KITTI samples.
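To illustrate why a DWT is a natural basis for edge-texture-aware features, the following minimal sketch applies a single-level 2D Haar transform to a grayscale image and combines the three high-frequency subbands into one edge-texture response. It is not the ETAIB described above: the choice of the Haar wavelet, the function names `haar_dwt2` and `edge_texture_map`, and the magnitude-based combination are all assumptions made for illustration only.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT (illustrative, hypothetical helper).

    Returns four half-resolution subbands:
    (low, col_diff, row_diff, diag).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a = img[0::2, 0::2]  # top-left
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0       # smooth, low-frequency content
    col_diff = (a - b + c - d) / 2.0  # change across columns (vertical edges)
    row_diff = (a + b - c - d) / 2.0  # change across rows (horizontal edges)
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return low, col_diff, row_diff, diag


def edge_texture_map(img):
    """Magnitude of the high-frequency subbands (an assumed, simple combination)."""
    _, col_diff, row_diff, diag = haar_dwt2(img)
    return np.sqrt(col_diff**2 + row_diff**2 + diag**2)


if __name__ == "__main__":
    # A vertical step edge between columns 0 and 1 of a 4x4 image.
    step = np.zeros((4, 4))
    step[:, 1:] = 1.0
    print(edge_texture_map(step))
```

In this sketch the low-frequency subband carries the smooth image content, while the edge-texture response is nonzero only in blocks that straddle an intensity change. Applying the transform recursively to the low-frequency subband would give a multi-scale decomposition.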
