---
base_model:
- tencent/DepthCrafter
- stabilityai/stable-video-diffusion-img2vid-xt
language:
- en
library_name: geometry-crafter
license: other
tags:
- video-to-3d
- point-cloud
---

## ___***GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors***___
_**[Tian-Xing Xu1](https://scholar.google.com/citations?user=zHp0rMIAAAAJ&hl=zh-CN), [Xiangjun Gao3](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en), [Wenbo Hu2 †](https://wbhu.github.io), [Xiaoyu Li2](https://xiaoyu258.github.io), [Song-Hai Zhang1 †](https://scholar.google.com/citations?user=AWtV-EQAAAAJ&hl=en), [Ying Shan2](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_
1Tsinghua University 2ARC Lab, Tencent PCG 3HKUST

![Version](https://img.shields.io/badge/version-1.0.0-blue)
## 🔆 Notice

**GeometryCrafter is still under active development!**

We recommend using English in issues, so that developers from around the world can discuss, share experiences, and answer questions together. For further implementation details, please contact `xutx21@mails.tsinghua.edu.cn`. For business licensing and other related inquiries, don't hesitate to contact `wbhu@tencent.com`.

If you find GeometryCrafter useful, **please help ⭐ this repo**, which matters a lot to open-source projects. Thanks!

## 📝 Introduction

We present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation. The model is described in detail in the paper [GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors](https://arxiv.org/abs/2504.01016).

Release Notes:

- `[01/04/2025]` 🔥🔥🔥 **GeometryCrafter** is released now, have fun!

## 🚀 Quick Start

### Installation

1. Clone this repo:

```bash
git clone --recursive https://github.com/TencentARC/GeometryCrafter
```

2. Install dependencies (please refer to [requirements.txt](requirements.txt)):

```bash
pip install -r requirements.txt
```

### Inference

Run the inference code on our provided demo videos at 1.27 FPS, which requires a GPU with ~40GB of memory for 110 frames at 1024x576 resolution:

```bash
python run.py \
  --video_path examples/video1.mp4 \
  --save_folder workspace/examples_output \
  --height 576 --width 1024
# The input video is resized to the target resolution for processing, which must be divisible by 64.
# The output point maps are restored to the original resolution before saving.
# Use --downsample_ratio to downsample the input video, or reduce --decode_chunk_size, to lower memory usage.
```

Run the inference code with our deterministic variant at 1.50 FPS:

```bash
python run.py \
  --video_path examples/video1.mp4 \
  --save_folder workspace/examples_output \
  --height 576 --width 1024 \
  --model_type determ
```

Run low-resolution processing at 2.49 FPS, which requires a GPU with ~22GB of memory:

```bash
python run.py \
  --video_path examples/video1.mp4 \
  --save_folder workspace/examples_output \
  --height 384 --width 640
```

### Visualization

Visualize the predicted point maps with `Viser`:

```bash
python visualize/vis_point_maps.py \
  --video_path examples/video1.mp4 \
  --data_path workspace/examples_output/video1.npz
```

## 🤖 Gradio Demo

- Online demo: [**GeometryCrafter**](https://huggingface.co/spaces/TencentARC/GeometryCrafter)
- Local demo:

```bash
gradio app.py
```

## 📊 Dataset Evaluation

Please check the `evaluation` folder.

- To create the datasets we use in the paper, run `evaluation/preprocess/gen_{dataset_name}.py`.
- You need to change `DATA_DIR` and `OUTPUT_DIR` first according to your working environment.
- You will then get the preprocessed datasets containing the extracted RGB videos and point map npz files. We also provide the catalog of these files.
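Both the inference outputs and the preprocessed datasets store point maps as `.npz` archives, which can be inspected directly with NumPy. A minimal sketch (the key name `point_map` and the array shape below are illustrative assumptions, not the repo's actual schema; list the real keys of any output file via the `.files` attribute):

```python
import numpy as np

# Build a small synthetic archive mimicking a per-frame point-map video:
# T frames, H x W pixels, 3 coordinates per pixel. The key "point_map" is
# an illustrative assumption -- check a real file's keys with `.files`.
points = np.random.rand(4, 36, 64, 3).astype(np.float32)
np.savez("demo_point_maps.npz", point_map=points)

with np.load("demo_point_maps.npz") as data:
    print("keys:", data.files)    # names of all arrays stored in the archive
    pm = data["point_map"]
    print("shape:", pm.shape, "dtype:", pm.dtype)
```

The same `np.load(...)` / `.files` pattern works on the files produced under `--save_folder`, whatever keys they actually contain.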
- Scripts for running inference on all datasets:

```bash
bash evaluation/run_batch.sh
```

  (Remember to replace `data_root_dir` and `save_root_dir` with your paths.)

- Scripts for evaluating all datasets (scale-invariant point map estimation):

```bash
bash evaluation/eval.sh
```

  (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your paths.)

- Scripts for evaluating all datasets (affine-invariant depth estimation):

```bash
bash evaluation/eval_depth.sh
```

  (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your paths.)

- We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncommenting the corresponding lines in `evaluation/run.sh`, `evaluation/eval.sh`, `evaluation/run_batch.sh`, and `evaluation/eval_depth.sh`.

## 🤝 Contributing

- Issues and pull requests are welcome.
- Contributions that optimize inference speed and memory usage are especially welcome, e.g., through model quantization, distillation, or other acceleration techniques.

## 📜 Citation

If you find this work helpful, please consider citing:

```bibtex
@misc{xu2025geometrycrafterconsistentgeometryestimation,
  title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors},
  author={Tian-Xing Xu and Xiangjun Gao and Wenbo Hu and Xiaoyu Li and Song-Hai Zhang and Ying Shan},
  year={2025},
  eprint={2504.01016},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2504.01016},
}
```