Neural Video Depth Stabilizer (ICCV2023) 🚀🚀🚀

🎉🎉🎉 Welcome to the NVDS GitHub repository! 🎉🎉🎉

This repository is the official PyTorch implementation of the ICCV 2023 paper "Neural Video Depth Stabilizer" (NVDS).

arXiv | Supp-Video | Supp | Demo-Video | Project Page | VDW Dataset (Coming Soon)

😎 Highlights

NVDS is the first plug-and-play stabilizer: it removes flicker from the predictions of any single-image depth model without extra effort. We also introduce a large-scale dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset to our knowledge. Don't forget to star this repo if you find it interesting!
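Since the NVDS code has not been released yet (see the todo list below), the snippet that follows is only a minimal, hypothetical sketch of the intended plug-and-play workflow. The backbone choice (MiDaS via torch.hub) and the `nvds_stabilizer` call are illustrative assumptions, not the official API:

```python
import torch

# Any off-the-shelf single-image depth model can serve as the backbone.
# MiDaS_small, downloaded via torch.hub, is used here purely as an
# example (requires the `timm` package).
depth_model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
depth_model.eval()

# frames: (T, 3, H, W) batch of consecutive video frames. A real pipeline
# would load and preprocess actual frames (MiDaS also ships official
# transforms); random tensors keep the sketch self-contained.
frames = torch.rand(8, 3, 256, 256)

with torch.no_grad():
    # Per-frame predictions from a single-image model flicker over time.
    initial_depth = torch.stack(
        [depth_model(f.unsqueeze(0)).squeeze(0) for f in frames]
    )  # (T, H, W)

# The released NVDS stabilizer would refine these flickering maps into
# temporally consistent depth. Its real interface is unknown until the
# code drop, so the call below is a commented-out placeholder:
# stable_depth = nvds_stabilizer(frames, initial_depth)
```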

⚡ Updates and Todo List

  • [2023.07.16] Our work is accepted to ICCV 2023.
  • [2023.07.18] The arXiv version of our NVDS paper is released.
  • [TODO] We will build our project page with video demos and the official website of the VDW dataset.
  • [TODO] We will gradually release the NVDS model and the VDW dataset to the community. Stay tuned!

🌼 Abstract

Video depth estimation aims to infer temporally consistent depth. Some methods achieve temporal consistency by fine-tuning a single-image depth model at test time using geometry and re-projection constraints, which is inefficient and not robust. An alternative approach is to learn how to enforce temporal consistency from data, but this requires well-designed models and sufficient video depth data. To address these challenges, we propose a plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort. We also introduce a large-scale dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset to our knowledge. We evaluate our method on the VDW dataset as well as two public benchmarks and demonstrate significant improvements in consistency, accuracy, and efficiency compared to previous approaches. Our work serves as a solid baseline and provides a data foundation for learning-based video depth models. We will release our dataset and code for future research.
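To make the notion of temporal (in)consistency concrete, here is a deliberately naive flicker proxy: the mean absolute difference between consecutive depth maps. This is an illustrative sketch only, not the evaluation metric used in the paper; proper consistency metrics account for camera and object motion (e.g., by warping with optical flow), which this toy score ignores.

```python
import torch

def naive_flicker_score(depth: torch.Tensor) -> float:
    """depth: (T, H, W) per-frame depth predictions for one video.

    Returns the mean absolute frame-to-frame change. Large values in
    static regions indicate flicker; note this toy score also penalizes
    genuine motion, which real metrics compensate for via optical flow.
    """
    diffs = (depth[1:] - depth[:-1]).abs()
    return diffs.mean().item()

# Example: random "depth" flickers heavily; a constant video scores 0.
noisy = torch.rand(10, 64, 64)
steady = torch.ones(10, 64, 64)
print(naive_flicker_score(noisy))   # high
print(naive_flicker_score(steady))  # 0.0
```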