SO(3) Equivariant Feature Enhanced Network for 3D Object Detection
Sanghyun Ryoo, Jangwon Kim, Minjae Kim, Soohee Han

ABSTRACT
LiDAR-based 3D object detection requires accurate reasoning about object location, scale, and orientation from sparse point clouds. Existing voxel-based detectors achieve strong performance, but their second-stage refinement features do not explicitly encode how local voxel geometry transforms under rotation. This letter proposes the SO(3) equivariant feature enhanced network (SEFEN), a plug-in refinement module for voxel-based two-stage 3D object detectors. SEFEN introduces equivariant voxel ROI pooling, which computes local ROI–voxel coordinates, expands them in spherical harmonics, and generates rotation-aware features through tensor-product operations. These equivariant features are concatenated with the original voxel ROI features for final 3D box refinement. Because SEFEN modifies only the second-stage refinement module, the original backbone and proposal generation pipeline are left unchanged. Experiments on the KITTI validation set show that SEFEN consistently improves Voxel-RCNN, raising the moderate-difficulty car result from 84.94% to 85.06% and the moderate-difficulty cyclist result from 71.23% to 73.20%. These results demonstrate that local equivariant feature enhancement improves orientation-sensitive 3D object detection while retaining the original detector architecture.
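The pooling steps named above (ROI–voxel coordinates, spherical-harmonic expansion, tensor products, concatenation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the local coordinates are plain offsets from the ROI center, truncates the expansion at degree 1, and uses a hypothetical linear radial weight controlled by a hypothetical `radius` parameter. The fusion helper `enhance_roi_feature` simply flattens and concatenates, which is also an assumption. All function names are illustrative. The usage lines check the intended property: rotating the voxels about the ROI center leaves the degree-0 outputs `s` and `g` unchanged and rotates each row of the degree-1 output `v` by the same matrix.

```python
import numpy as np


def sh_degree_0_1(offsets, eps=1e-9):
    """Real spherical harmonics of degree 0 and 1 evaluated on offset directions.

    Degree-1 components are stored in (x, y, z) order, so they rotate exactly
    like the offset vectors themselves.
    """
    r = np.linalg.norm(offsets, axis=-1, keepdims=True)        # (N, 1) distances
    unit = offsets / np.maximum(r, eps)                         # unit directions
    y0 = np.full_like(r, 0.5 / np.sqrt(np.pi))                  # Y_0^0 = 1 / (2 sqrt(pi))
    y1 = np.sqrt(3.0 / (4.0 * np.pi)) * unit                    # Y_1^m, shape (N, 3)
    return y0, y1, r


def equivariant_roi_pool(voxel_xyz, voxel_feat, roi_center, radius=2.0):
    """Pool scalar voxel features into invariant (l=0) and equivariant (l=1) parts."""
    offsets = voxel_xyz - roi_center                            # ROI-voxel offsets (assumed form)
    y0, y1, r = sh_degree_0_1(offsets)
    w = np.clip(1.0 - r / radius, 0.0, None)                    # hypothetical radial weight
    # Tensor product l=0 (features) x l=0 -> l=0, summed over voxels: shape (C,)
    s = np.einsum("ni,nc->c", w * y0, voxel_feat)
    # Tensor product l=0 (features) x l=1 (geometry) -> l=1: shape (C, 3)
    v = np.einsum("nk,nc->ck", w * y1, voxel_feat)
    # Tensor product l=1 x l=1 -> l=0 is a dot product: invariant (C, C) Gram matrix
    g = v @ v.T
    return s, v, g


def enhance_roi_feature(roi_feat, s, v, g):
    """Hypothetical fusion: append pooled descriptors to the original ROI feature."""
    iu = np.triu_indices(len(s))                                # Gram matrix is symmetric
    return np.concatenate([roi_feat, s, v.ravel(), g[iu]])


def rot_z(theta):
    """Rotation about the z (yaw) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.normal(size=(64, 3))
    feat = rng.normal(size=(64, 8))
    center = np.array([0.3, -0.1, 0.2])
    s, v, g = equivariant_roi_pool(xyz, feat, center)
    R = rot_z(0.7)
    s2, v2, g2 = equivariant_roi_pool((xyz - center) @ R.T + center, feat, center)
    print(np.allclose(s, s2), np.allclose(v2, v @ R.T), np.allclose(g, g2))  # True True True
    print(enhance_roi_feature(np.zeros(16), s, v, g).shape)  # (84,)
```

Flattening the degree-1 channels before concatenation, as in this sketch, means a standard MLP head applied to the fused vector is not itself equivariant. Only the pooled descriptors carry the rotation behavior verified above.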