Object-wise metric distance estimation from a single RGB image via semantic and geometric reasoning

Sultana, Abida

Files

Sultana_Abida_MASc_2026.pdf (8.38 MB)

Date

2026

Authors

Sultana, Abida

Abstract

Estimating metric object distance from a single RGB image is challenging because monocular depth does not provide an absolute scale. Existing solutions either require additional sensors such as LiDAR or stereo cameras, rely on monocular depth that remains scale-ambiguous, or use implicit vision-language reasoning that can be unstable for precise measurement. This thesis proposes a semantic–geometric pipeline for recovering metric scale by combining open-vocabulary object grounding and segmentation, label normalization, monocular depth, and camera cues. Object-centric 3D points are reconstructed from the predicted depth, an oriented 3D bounding box is fitted to estimate object dimensions, and real-world size priors are used to compute a scale factor that converts relative depth into absolute distance. The proposed method is evaluated on HOT3D, ScanNet, ARKitScenes, and a custom iPhone dataset, achieving Multi-Threshold Relative Accuracy (MRA) values of 68.85%, 88.30%, 75.12%, and 89.85%, respectively, under the per-frame average mean-distance strategy. The results show that frame-level averaging improves stability by reducing the influence of instance-level outliers. The main limitations of the approach are its dependence on segmentation and depth quality, sensitivity to canonical size priors for categories with high size variation, possible instability under occlusion or truncation, and relatively high processing time. Future work includes more robust scale estimation, adaptive size priors, improved object fitting, the use of consecutive frames for temporal consistency, and pipeline optimization for lower latency.
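
The scale-recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the pinhole back-projection, the PCA-based box fit, the use of only the longest box side, and the mean point distance as the object distance are all assumptions made for illustration.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked pixels of a relative depth map to 3D camera coordinates
    (assumes a pinhole camera with intrinsics fx, fy, cx, cy)."""
    v, u = np.nonzero(mask)          # row (v) and column (u) indices of object pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def oriented_box_extents(points):
    """Side lengths of a PCA-aligned box around the points, longest first.
    PCA is an assumed stand-in for the box fitting used in the thesis."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes of the point cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt.T
    extents = projected.max(axis=0) - projected.min(axis=0)
    return np.sort(extents)[::-1]

def metric_distance(points, prior_longest_side_m):
    """Rescale relative-depth points with a real-world size prior and
    return the mean camera-to-object distance in metres."""
    extents = oriented_box_extents(points)
    # Scale factor: known real size divided by size measured in relative units.
    scale = prior_longest_side_m / extents[0]
    mean_relative_distance = np.linalg.norm(points, axis=1).mean()
    return scale * mean_relative_distance
```

Because the scale factor depends on a single measured extent, an occluded or truncated object shortens that extent and inflates the estimated distance, which is consistent with the limitations listed in the abstract.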

Keywords

monocular distance estimation, semantic–geometric pipeline, 3D distance, depth estimation, metric scale recovery

URI

https://hdl.handle.net/1828/23884

Collections

Electronic Theses and Dissertations (ETD)
