🎨✨ Stable Video Diffusion Training Code 🚀

yunyangge/SVD_Xtend

SVD_Xtend

Tracklet2Video

We have experimented with adding layout control on top of img2video, which makes object motion more controllable, as shown in the image below. The code and weights will be updated soon.

[Figure: two columns, Init Image and Gen Video]

Comparison

size=(512, 320), motion_bucket_id=127, fps=7, noise_aug_strength=0.00
generator=torch.manual_seed(111)
[Figure: four example rows with columns Init Image, Before Fine-tuning, and After Fine-tuning]
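
The settings listed for this comparison map onto keyword arguments of an image-to-video pipeline call. The following is a minimal sketch, assuming the comparison was produced with the `StableVideoDiffusionPipeline` from diffusers (the README does not state this) and reading `size=(512, 320)` as (width, height), which matches the training flags. `model_path`, `image_path`, and `out_path` are placeholders, and the helper names are hypothetical.

```python
"""Sketch only: assumes diffusers' StableVideoDiffusionPipeline, which the README does not confirm."""


def comparison_kwargs():
    """Settings copied from the Comparison section; size=(512, 320) is read as (width, height)."""
    return {
        "width": 512,
        "height": 320,
        "motion_bucket_id": 127,
        "fps": 7,
        "noise_aug_strength": 0.0,
    }


def generate(model_path, image_path, out_path="generated.mp4", seed=111):
    """Run image-to-video once; heavy imports stay local so the settings can be inspected without a GPU."""
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    kwargs = comparison_kwargs()
    pipe = StableVideoDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    pipe.to("cuda")

    # PIL's resize takes (width, height), matching the assumed reading of size=(512, 320).
    image = load_image(image_path).resize((kwargs["width"], kwargs["height"]))
    frames = pipe(image, generator=torch.manual_seed(seed), **kwargs).frames[0]
    export_to_video(frames, out_path, fps=kwargs["fps"])
    return out_path
```

Calling `generate` once with the original weights and once with a fine-tuned checkpoint, with the same seed and init image, would give a before/after pair comparable to the ones shown here.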

Video Data Processing

BDD100K is a driving video/image dataset, but it is not required for training; any videos can be used. Please refer to the data reading logic of DummyDataset. In short, you only need to modify self.base_folder and arrange your videos, as extracted frames, in the following file structure:

self.base_folder
    ├── video_name1
    │   ├── video_frame1
    │   ├── video_frame2
    │   ...
    ├── video_name2
    │   ├── video_frame1
    │   ├── ...
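
Before training, it can help to check that a folder actually follows this layout. The sketch below is an illustration, not the repository's DummyDataset code: `list_video_folders`, `check_layout`, and the `min_frames` threshold are hypothetical names, and it assumes every file inside a video folder is a frame.

```python
import os
import tempfile


def list_video_folders(base_folder):
    """Map each video sub-folder name to its frame file names in sorted order."""
    videos = {}
    for name in sorted(os.listdir(base_folder)):
        video_dir = os.path.join(base_folder, name)
        if not os.path.isdir(video_dir):
            continue  # skip stray files sitting next to the video folders
        frames = sorted(
            f for f in os.listdir(video_dir)
            if os.path.isfile(os.path.join(video_dir, f))
        )
        videos[name] = frames
    return videos


def check_layout(base_folder, min_frames):
    """Return the names of video folders holding fewer than min_frames frames."""
    return [name for name, frames in list_video_folders(base_folder).items()
            if len(frames) < min_frames]


if __name__ == "__main__":
    # Build a tiny throwaway layout to demonstrate the expected structure.
    with tempfile.TemporaryDirectory() as base:
        for video, count in (("video_name1", 3), ("video_name2", 1)):
            os.makedirs(os.path.join(base, video))
            for i in range(count):
                open(os.path.join(base, video, f"frame_{i:04d}.jpg"), "w").close()
        print(list_video_folders(base))
        print(check_layout(base, min_frames=2))
```

Sorting here is lexicographic, so zero-padded frame names (`frame_0001.jpg` rather than `frame_1.jpg`) keep the frames in temporal order.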

Training Configuration (on the BDD100K dataset)

This training configuration is for reference only. I set all UNet parameters to be trainable during training and used a learning rate of 1e-5.

accelerate launch train_svd.py \
    --pretrained_model_name_or_path=/path/to/weight \
    --per_gpu_batch_size=1 --gradient_accumulation_steps=1 \
    --max_train_steps=50000 \
    --width=512 \
    --height=320 \
    --checkpointing_steps=1000 --checkpoints_total_limit=1 \
    --learning_rate=1e-5 --lr_warmup_steps=0 \
    --seed=123 \
    --mixed_precision="fp16" \
    --validation_steps=200
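
To get a feel for what these flags imply, the totals can be worked out directly. This is a rough sketch under two assumptions the README does not confirm: steps are counted per optimizer update (as in the diffusers example training scripts), and `num_gpus` is 1. `training_summary` is a hypothetical helper, not part of train_svd.py.

```python
def training_summary(max_train_steps, per_gpu_batch_size, gradient_accumulation_steps,
                     checkpointing_steps, validation_steps, num_gpus=1):
    """Derive a few totals from the launch flags; num_gpus is an assumption."""
    effective_batch = per_gpu_batch_size * gradient_accumulation_steps * num_gpus
    return {
        "effective_batch_size": effective_batch,
        "samples_seen": effective_batch * max_train_steps,
        "checkpoint_saves": max_train_steps // checkpointing_steps,
        "validation_runs": max_train_steps // validation_steps,
    }


# Values taken from the command above, with a single GPU assumed.
summary = training_summary(
    max_train_steps=50000,
    per_gpu_batch_size=1,
    gradient_accumulation_steps=1,
    checkpointing_steps=1000,
    validation_steps=200,
    num_gpus=1,
)
print(summary)
```

Under these assumptions the run sees 50,000 video clips, triggers 50 checkpoint saves and 250 validation runs. Raising `--gradient_accumulation_steps` or adding GPUs scales the effective batch size without changing the step count.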

Disclaimer

While the codebase is functional and provides an enhancement in video generation (maybe? 🤷), there are still some uncertainties regarding the finer details of its implementation.

TODO List

  • Support text2video (WIP)
  • Support more conditional inputs, such as layout

Contribution

Feel free to fork this repository, submit pull requests, or open issues to discuss potential changes or report bugs. With your valuable input, we can continuously improve SVD_Xtend for the community.
