Geometric Principles of Stereo Vision: A Quantitative Evaluation and Physical Validation of the Classical Pipeline
Angel Fernando Ceballos-Espinoza, David Balderas-Silva, Alfredo Diaz-Lara, Rita Q. Fuentes-Aguilar

Stereo vision is essential for passive three-dimensional perception in resource-constrained applications that require low power consumption, predictable latency, and explainable geometry. Although deep learning architectures dominate recent benchmarks, the classical block-matching pipeline remains a foundational approach. Optimizing this pipeline involves navigating trade-offs among matching robustness, map density, and computational efficiency. This study systematically surveys and physically validates the classical stereo framework. After revisiting geometric first principles, three matching costs (SAD, NCC, ZNCC) are benchmarked alongside Sobel preprocessing and structural refinements. Results on the Middlebury benchmarks (2001–2021) indicate that SAD fails under complex radiometric distortion, whereas NCC consistently achieves superior quantitative metrics while incurring only a 1.2-fold computational overhead. Extending the disparity search range improves foreground localization, while block size imposes a trade-off between resolving the aperture problem and preserving fine geometric detail. To bridge theoretical analysis and practical deployment, the pipeline is validated physically using a custom-calibrated consumer webcam stereo rig. The optimized Sobel-NCC architecture is then evaluated for real-time edge deployment on constrained hardware (NVIDIA Jetson Nano) and narrow-baseline sensors (OAK-D SR) in the context of agricultural robotic manipulation. By prioritizing metric precision over dense prediction, the classical pipeline reconstructs target surfaces with approximately 1 cm depth accuracy at 21 frames per second. These results demonstrate that optimized local algorithms offer deterministic and reliable geometric foundations for real-time edge-computed robotics.
Although neural networks are essential for dense reconstructions in ill-posed regions, the foundational principles established here remain indispensable for advanced stereo vision system deployment.
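
As a rough illustration of the three matching costs named above, the following minimal sketch computes SAD, NCC, and ZNCC between two equally sized image patches using their standard textbook definitions. It is not the authors' implementation: the function names, the NumPy-based formulation, and the triangulation helper (which uses the standard rectified-stereo relation Z = f·B/d) are assumptions made for illustration only.

```python
import numpy as np


def sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute differences: lower means more similar."""
    return float(np.abs(a - b).sum())


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation: invariant to a gain change."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean NCC: invariant to both gain and offset changes."""
    return ncc(a - a.mean(), b - b.mean())


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres for a rectified pair, Z = f * B / d (illustrative helper)."""
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((7, 7))        # hypothetical 7x7 reference patch
    right = 1.5 * left + 20.0        # same patch under a gain and offset change
    print("SAD :", sad(left, right))
    print("NCC :", ncc(left, right))
    print("ZNCC:", zncc(left, right))
```

In this toy case the two patches show the same structure under a different exposure, so SAD reports a large mismatch while ZNCC stays close to 1; NCC alone compensates for the gain but not for the offset. This is one simple way to see why an intensity-difference cost is sensitive to radiometric changes between cameras.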