metadata
base_model:
  - stabilityai/stable-video-diffusion-img2vid-xt-1-1
pipeline_tag: image-to-video
datasets:
  - TaiMingLu/Genex-DB-World-Exploration
license: cc-by-4.0

GenEx-World-Explorer 🚀🌍

GenEx World Explorer is a video generation pipeline built on top of Stable Video Diffusion (SVD). It takes a keyframe and generates a temporally consistent video. This explorer version keeps the SVD pipeline and swaps in a custom UNetSpatioTemporalConditionModel.

Given a panoramic input image, the diffuser generates a forward-moving path through it in order to explore the scene.

📦 Usage

from diffusers import UNetSpatioTemporalConditionModel, StableVideoDiffusionPipeline
import torch
from PIL import Image

model_id = 'genex-world/GenEx-World-Explorer'

# Load the custom UNet
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    model_id,
    subfolder='unet',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Load the full pipeline with custom UNet
pipe = StableVideoDiffusionPipeline.from_pretrained(
    model_id,
    unet=unet,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
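    # local_files_only=True assumes the remaining pipeline files are already in the local cache;
    # remove it on a first run so they can be downloaded.
    local_files_only=True,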
).to('cuda')

# Explore the world!
image = Image.open('example.png').resize((1024, 576), Image.BICUBIC).convert('RGB')

generator = torch.manual_seed(-1)
with torch.inference_mode():
    # Call the pipeline object directly; .frames[0] holds the frames for the single input image
    frames = pipe(
        image,
        num_frames=25,
        width=1024,
        height=576,
        decode_chunk_size=8,
        generator=generator,
        motion_bucket_id=127,
        fps=7,
        num_inference_steps=30,
        noise_aug_strength=0.02,
    ).frames[0]
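
The call above returns frames but does not write them anywhere. As a minimal sketch, the snippet below derives the clip length and the number of VAE decode batches from the same settings, and shows one way to save the result as a looping GIF. The helper names clip_stats and save_gif are hypothetical, not part of this repository, and save_gif assumes frames is the list of PIL images returned with the pipeline's default output type.

```python
import math


def clip_stats(num_frames: int, fps: int, decode_chunk_size: int) -> dict:
    """Derive playback length and decode batch count from the pipeline settings."""
    return {
        # Seconds of video when played back at the same fps used for conditioning
        "duration_s": num_frames / fps,
        # Per-frame display time in milliseconds, as expected by GIF writers
        "frame_ms": round(1000 / fps),
        # Frames are decoded decode_chunk_size at a time, so the last batch may be partial
        "decode_chunks": math.ceil(num_frames / decode_chunk_size),
    }


def save_gif(frames, path: str, fps: int) -> None:
    """Write a list of PIL images as a looping GIF (hypothetical helper)."""
    first, *rest = frames
    first.save(
        path,
        save_all=True,          # write every frame, not only the first
        append_images=rest,     # remaining frames follow the first one
        duration=round(1000 / fps),
        loop=0,                 # 0 means loop forever
    )


stats = clip_stats(num_frames=25, fps=7, decode_chunk_size=8)
print(stats)
```

With the settings from the usage example, calling save_gif(frames, 'explore.gif', fps=7) would write a clip of about 3.6 seconds. Lowering decode_chunk_size trades decoding speed for lower peak memory, at the cost of more decode batches.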

🔧 Requirements

diffusers>=0.33.1
transformers
numpy
pillow
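
To confirm that the installed diffusers satisfies the minimum version listed above, a small standard-library check along these lines may help. This is a sketch, not an official tool: version_tuple, meets_minimum, and check_diffusers are hypothetical names, and the parser is deliberately simplified (it reads only the leading numeric parts of the version string and ignores suffixes such as dev0).

```python
from importlib import metadata


def version_tuple(text: str) -> tuple:
    """Turn '0.33.1' into (0, 33, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.33.1") -> bool:
    """Compare versions numerically, so that 0.9.0 sorts below 0.33.1."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_diffusers() -> str:
    """Report whether the installed diffusers is new enough."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return "diffusers is not installed"
    if meets_minimum(installed):
        return f"diffusers {installed} OK"
    return f"diffusers {installed} is older than 0.33.1"


print(check_diffusers())
```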

✨ BibTeX

@misc{lu2025genexgeneratingexplorableworld,
      title={GenEx: Generating an Explorable World}, 
      author={Taiming Lu and Tianmin Shu and Junfei Xiao and Luoxin Ye and Jiahao Wang and Cheng Peng and Chen Wei and Daniel Khashabi and Rama Chellappa and Alan Yuille and Jieneng Chen},
      year={2025},
      eprint={2412.09624},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.09624}, 
}