DOI: 10.3390/rs18122050 ISSN: 2072-4292

V3Reg: Model Integrating Visual Information for Extreme Low Overlap Point Cloud Registration

Yaxiong Li, Yifan Hou, Qisong Yang, Dongdong Guan

Extremely low overlap leaves very few local geometric correspondences between frame pairs. Pure geometric descriptors, which encode only low-level shape signatures, cannot impose sufficient constraints for reliable transformation estimation when matches become critically sparse, which makes registration fragile. Although recent red-green-blue-depth (RGB-D) methods have explored visual augmentation, they rely predominantly on low-level chromatic statistics or shallow convolutional neural network (CNN) features and underutilize the rich hierarchical semantics of RGB imagery. We present V3Reg, a robust registration framework that pioneers the integration of large-scale vision foundation models (DINOv3) with adaptive cross-modal fusion. Specifically, we extract mid-to-deep semantic features (Layer 11) from DINOv3 to move beyond low-level texture cues, and propose a Task-Aware Channel-Wise Gated Adaptive Fusion (TACGAF) module that dynamically calibrates geometric and visual contributions via registration-error-guided channel-wise gating. To rigorously evaluate robustness under ultra-low overlap, we reconstruct RGBD-ZeroMatch, a benchmark with controllable overlap ratios ranging from 1% to 20%. Extensive experiments demonstrate that V3Reg achieves 99.6% Feature Matching Recall and 96.3% Registration Recall on standard benchmarks. Notably, it maintains 50.2% Registration Recall at only 5% overlap, outperforming prior methods by over 18 percentage points.
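
The channel-wise gating idea mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' TACGAF module: the function name `channel_gated_fusion`, the single linear gate, and the random weights are all assumptions made for illustration, and the registration-error-guided supervision described in the abstract is not modeled here. The sketch only shows how a per-channel gate in (0, 1) can blend a geometric feature vector with a visual one for a single point.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gated_fusion(geo, vis, weights, bias):
    """Blend two C-dimensional feature vectors with one gate per channel.

    geo, vis : lists of length C (geometric and visual features of a point)
    weights  : C rows of length 2C (hypothetical gate parameters)
    bias     : list of length C
    Returns (fused, gates), both lists of length C.
    """
    joint = geo + vis  # list concatenation, length 2C
    fused, gates = [], []
    for c in range(len(geo)):
        logit = sum(w * x for w, x in zip(weights[c], joint)) + bias[c]
        g = sigmoid(logit)  # how much channel c trusts the geometric feature
        gates.append(g)
        fused.append(g * geo[c] + (1.0 - g) * vis[c])
    return fused, gates


# Toy inputs; in practice the gate parameters would be learned.
random.seed(0)
C = 4
geo = [random.gauss(0, 1) for _ in range(C)]
vis = [random.gauss(0, 1) for _ in range(C)]
weights = [[random.gauss(0, 0.1) for _ in range(2 * C)] for _ in range(C)]
bias = [0.0] * C

fused, gates = channel_gated_fusion(geo, vis, weights, bias)
print(gates)
print(fused)
```

Because each gate lies strictly between 0 and 1, every fused channel is a convex combination of the corresponding geometric and visual values, so one modality can dominate in some channels while the other dominates elsewhere.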
