DOI: 10.3390/app16136624 ISSN: 2076-3417

Object-Centric Seamless Pose Estimation in Multi-Object Scenes by Scale Alignment of Ray Diffusion and Iterative Closest Point

YeonChang Jeong, Dong-Uk Seo, Kwanwoo Park, Soon-Yong Park

Robust estimation of camera trajectories from unconstrained image sequences remains a fundamental problem in computer vision and robotics. Recently, a diffusion-based camera pose estimation network has shown strong performance in sparse-view, single-object-centric settings, where the same object is observed in every frame. However, when multiple objects appear sequentially in a video, the initially observed object may leave the field of view as the sequence progresses. The single-object-centric assumption then no longer holds across all frames, and applying the conventional method to such multi-object sequences degrades pose estimation. In this work, we propose an object-centric camera pose estimation framework that handles such sequences by partitioning a video into object-level sub-scenes. Ray Diffusion serves as the baseline network for single-object sub-scenes, while frame-to-frame camera motion in multi-object sub-scenes is estimated from monocular video depth and object masks by aligning point clouds with Iterative Closest Point (ICP). Because the pose estimates from different sub-scenes are expressed at inconsistent scales, they must be brought to a common scale before they can be concatenated seamlessly. To this end, we introduce a scale alignment strategy based on reprojection error minimization, which integrates the pose estimates from individual sub-scenes into a single, seamless camera trajectory. We evaluate the proposed method on a newly collected indoor dataset of 40 multi-object video sequences and compare the estimated camera trajectories with those of the diffusion-based baseline and state-of-the-art visual SLAM methods.
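The idea of scale alignment by reprojection error minimization can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: a relative pose (R, t) whose translation is known only up to scale, 3D points X back-projected from monocular depth in the first frame, and their observed pixel locations in the second frame. The scale s is then chosen to minimize the mean squared reprojection error with a golden-section search. The function names `project`, `reprojection_error`, and `align_scale`, the intrinsics K, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 points X (first-camera frame) into a second camera with relative pose (R, t)."""
    Xc = X @ R.T + t                 # rigid transform into the second camera frame
    uvw = Xc @ K.T                   # apply the pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division -> pixel coordinates


def reprojection_error(s, K, R, t_unit, X, u_obs):
    """Mean squared pixel error when the up-to-scale translation t_unit is scaled by s."""
    u_pred = project(K, R, s * t_unit, X)
    return float(np.mean(np.sum((u_pred - u_obs) ** 2, axis=1)))


def align_scale(K, R, t_unit, X, u_obs, lo=1e-3, hi=10.0, iters=100):
    """Golden-section search for the scale s in [lo, hi] that minimizes the reprojection error.

    Assumes the error is unimodal in s on [lo, hi]; this holds for the noise-free data below.
    """
    def f(s):
        return reprojection_error(s, K, R, t_unit, X, u_obs)

    phi = (np.sqrt(5.0) - 1.0) / 2.0   # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                     # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                           # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Synthetic example (hypothetical values): recover a known scale factor.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.05                                        # small rotation about the y-axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t_unit = np.array([0.1, -0.02, 0.03])              # translation known only up to scale
s_true = 2.5
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(50, 3))  # points standing in for depth
u_obs = project(K, R, s_true * t_unit, X)           # noise-free observed pixels

s_est = align_scale(K, R, t_unit, X, u_obs)
print(f"estimated scale: {s_est:.6f}")
```

A golden-section search is used here only because, in this noise-free setup, the error is a one-dimensional function of s with a single minimum at the true scale. With real correspondences from depth and masks, a robust loss or a general-purpose scalar optimizer may be a better choice.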
