DOI: 10.1002/aisy.70450 ISSN: 2640-4567

Enhanced Monocular Depth Estimation Using ResNet Backbone and Multiscale Feature Decoding for Precise Object Distance Measurement

Faseeh Muhammad, Misbah Bibi, Murad Ali Khan, Syed Shehryar Ali Naqvi, Adnan Ejaz Muhammad, Do-Hyeun Kim

Monocular depth estimation is a key capability for intelligent transportation systems and Internet of Things (IoT)‐enabled cyber‐physical environments, where scalable, low‐cost, and real‐time perception is essential. Traditional approaches rely on stereo vision or LiDAR, which increases deployment cost and limits large‐scale adoption. This paper presents a deep learning‐based framework that produces high‐fidelity depth maps for long‐range scenes up to 250 m using the D‐Far250 dataset. The architecture employs a ResNet‐based encoder–decoder with multiscale supervision to improve depth consistency across near‐ and far‐range regions. To enable practical perception for autonomous systems, the depth module is integrated with the YOLOv8 object detector to estimate object‐level distances directly from monocular images. Experimental evaluation on both real and synthetic datasets demonstrates strong accuracy and generalization. The model achieves a δ₁ accuracy of 94.98% on D‐Far250 and 90.85% on KITTI. The framework operates at over 200 FPS on an NVIDIA RTX 4070 GPU, confirming its suitability for latency‐critical IoT applications. By combining accurate monocular depth prediction with detection‐driven distance inference, the proposed system enables scalable, LiDAR‐free perception for collision avoidance, autonomous navigation, unmanned aerial vehicle (UAV) operations, and smart surveillance in connected IoT ecosystems and urban mobility services.
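
The two quantities named in the abstract, δ₁ accuracy and object‐level distance, can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not confirm: that δ₁ follows the conventional definition (the fraction of valid pixels where max(pred/gt, gt/pred) is below 1.25), and that an object's distance is taken as the median predicted depth inside its detection box. The function names `delta1_accuracy` and `object_distance` are hypothetical, and the box is supplied as plain pixel coordinates rather than coming from a real YOLOv8 call.

```python
from statistics import median


def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    Assumes the conventional delta_1 definition; pixels with a
    non-positive predicted or ground-truth depth are skipped.
    """
    ratios = [max(p / g, g / p) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not ratios:
        raise ValueError("no valid pixels to evaluate")
    return sum(r < threshold for r in ratios) / len(ratios)


def object_distance(depth_map, box):
    """Median depth in metres inside box = (x1, y1, x2, y2).

    The median is an assumed aggregation rule, chosen because it is
    robust to background pixels inside the box. x2 and y2 are exclusive.
    """
    x1, y1, x2, y2 = box
    values = [
        depth_map[y][x]
        for y in range(y1, y2)
        for x in range(x1, x2)
        if depth_map[y][x] > 0
    ]
    if not values:
        raise ValueError("box contains no valid depth values")
    return median(values)


# Toy data: a 3x4 depth map in metres with a far object in the right half.
depth_map = [
    [10.0, 10.0, 50.0, 50.0],
    [10.0, 10.0, 52.0, 50.0],
    [10.0, 10.0, 50.0, 48.0],
]
print(object_distance(depth_map, (2, 0, 4, 3)))
print(delta1_accuracy([10.0, 20.0, 100.0, 30.0], [10.0, 24.0, 150.0, 30.0]))
```

In practice the box would come from the detector's output and the depth map from the depth network. A mean, a trimmed mean, or the depth at the box centre are alternative aggregation rules.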
