VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

Editor: 广东客 | Category: CV | May 8, 2025

Note: We don't have the ability to review papers.

PubDate: Jan 2025

Teams: UT Austin, UPenn, Stanford, JHU, Meta

Writers: Wenyan Cong, Hanqing Zhu, Kevin Wang, Jiahui Lei, Colton Stearns, Yuanhao Cai, Dilin Wang, Rakesh Ranjan, Matt Feiszli, Leonidas Guibas, Zhangyang Wang, Weiyao Wang, Zhiwen Fan

PDF: VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

Abstract

Efficiently reconstructing 3D scenes from monocular video remains a core challenge in computer vision, vital for applications in virtual reality, robotics, and scene understanding. Recently, frame-by-frame progressive reconstruction without camera poses has been commonly adopted, but it incurs high computational overhead and compounding errors when scaling to longer videos. To overcome these issues, we introduce VideoLifter, a novel video-to-3D pipeline that applies a local-to-global strategy on a fragment basis, achieving both extreme efficiency and SOTA quality. Locally, VideoLifter leverages learnable 3D priors to register fragments, extracting essential information for subsequent 3D Gaussian initialization with enforced inter-fragment consistency and optimized efficiency. Globally, it employs a tree-based hierarchical merging method with key frame guidance for inter-fragment alignment, pairwise merging with Gaussian point pruning, and subsequent joint optimization to ensure global consistency while efficiently mitigating cumulative errors. This approach significantly accelerates the reconstruction process, reducing training time by over 82% while achieving better visual quality than current SOTA methods.
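The global stage described in the abstract (tree-based pairwise merging of fragments, guided by shared key frames, with point pruning) can be illustrated with a minimal Python sketch. Everything below is an illustrative assumption, not VideoLifter's implementation: the `Fragment` structure and all function names are hypothetical, alignment is simplified to a translation estimated from corresponding points of the key frame two neighboring fragments share (the paper works with learned 3D priors and 3D Gaussians), and voxel deduplication stands in for Gaussian point pruning. What it shows is the tree structure itself: n fragments are merged in ceil(log2 n) levels, so each fragment is re-aligned at most once per level instead of being chained through n - 1 sequential merges.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Fragment:
    """A locally reconstructed video fragment (hypothetical structure).

    points:       3D points in this fragment's own coordinate frame.
    left_anchor:  points of the key frame shared with the left neighbor.
    right_anchor: points of the key frame shared with the right neighbor;
                  right_anchor[k] of fragment i is the same physical point
                  as left_anchor[k] of fragment i + 1.
    """
    points: List[Point]
    left_anchor: List[Point] = field(default_factory=list)
    right_anchor: List[Point] = field(default_factory=list)


def shift(pts: List[Point], t: Point) -> List[Point]:
    """Translate every point by t."""
    return [(x + t[0], y + t[1], z + t[2]) for x, y, z in pts]


def estimate_translation(src: List[Point], dst: List[Point]) -> Point:
    """Least-squares translation mapping src onto dst (paired points).

    Simplification: a real pipeline would estimate a rigid or similarity
    transform; translation-only keeps this sketch dependency-free.
    """
    n = len(src)
    return tuple(sum(d[i] - s[i] for s, d in zip(src, dst)) / n for i in range(3))


def voxel_prune(pts: List[Point], voxel: float) -> List[Point]:
    """Keep one point per voxel: a stand-in for pruning redundant Gaussians."""
    seen = set()
    kept = []
    for p in pts:
        key = tuple(math.floor(c / voxel) for c in p)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def merge_pair(a: Fragment, b: Fragment, voxel: float) -> Fragment:
    """Align b into a's frame via their shared key frame, then merge and prune."""
    t = estimate_translation(b.left_anchor, a.right_anchor)
    return Fragment(
        points=voxel_prune(a.points + shift(b.points, t), voxel),
        left_anchor=a.left_anchor,
        right_anchor=shift(b.right_anchor, t),
    )


def hierarchical_merge(frags: List[Fragment], voxel: float = 0.05) -> Tuple[Fragment, int]:
    """Merge adjacent fragments pairwise, level by level, like a binary tree.

    Returns the merged fragment (expressed in the first fragment's frame)
    and the number of tree levels, which is ceil(log2(n)).
    """
    levels = 0
    while len(frags) > 1:
        nxt = []
        for i in range(0, len(frags), 2):
            if i + 1 < len(frags):
                nxt.append(merge_pair(frags[i], frags[i + 1], voxel))
            else:
                nxt.append(frags[i])  # odd fragment is carried to the next level
        frags = nxt
        levels += 1
    return frags[0], levels
```

Because each merged fragment keeps the left anchor of its left child and the (re-expressed) right anchor of its right child, neighbors at every level still share a key frame, so an unpaired fragment carried up an odd level can be aligned later without extra bookkeeping.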

Permalink: https://paper.nweon.com/16323


You may also like...

Convolutional Occupancy Networks

Real-Time Moving Objects Segmentation based on RGB-D camera

Robust Hand Gesture Input Using Computer Vision, Inertial Measurement Unit (IMU) and Flex Sensors
