
When ControlNet Meets Inexplicit Masks:
A Case Study of ControlNet on its Contour-following Ability

Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao

This is the official implementation of Shape-aware ControlNet, which studies the contour-following ability of the influential ControlNet by Zhang et al. and improves its handling of inexplicit masks, i.e., deteriorated masks and human scribbles. Please refer to our paper, accepted by ACM MM'2024, for more details.

🔥 News

  • [2024/07/16]: Our work is accepted by ACM MM'2024.

  • [2024/03/12]: Codes for training and inference are released.

📍 Todo

  • Release training and inference code, along with instructions on dataset preparation and checkpoints.

✨ Highlight


  • Study on the contour-following ability of ControlNet. We quantitatively study the contour-following ability of ControlNet by examining its performance on masks of varying precision under different hyper-parameter settings. We reveal that inexplicit masks severely degrade image fidelity because of the strong priors induced by inaccurate contours.
  • An improved shape-aware ControlNet to deal with inexplicit masks. We propose a novel deterioration estimator and a shape-prior modulation block that integrate shape priors into ControlNet, namely Shape-aware ControlNet, which achieves robust interpretation of inexplicit masks (an illustrative sketch follows this list).
  • Extended usage of ControlNet with more flexible conditions such as scribbles. We showcase application scenarios of our shape-aware ControlNet in modifying object shapes and in creative composable generation with masks of varying precision.
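For intuition, here is a purely illustrative PyTorch sketch of this idea, not the released implementation: a hypothetical estimator predicts a deteriorate ratio from control features, and a FiLM-style block modulates the features with it. All module and variable names below are assumptions.

# Illustrative sketch only -- NOT the released implementation.
import torch
import torch.nn as nn

class DeteriorationEstimator(nn.Module):
    """Hypothetical: predicts a scalar deteriorate ratio in [0, 1] from control features."""
    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.head(feat)  # shape (B, 1)

class ShapePriorModulation(nn.Module):
    """Hypothetical: FiLM-style scale/shift of features conditioned on the ratio."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(1, 2 * channels)

    def forward(self, feat: torch.Tensor, rho: torch.Tensor) -> torch.Tensor:
        # rho: (B, 1) -> per-channel scale and shift, broadcast over H x W
        scale, shift = self.to_scale_shift(rho).chunk(2, dim=1)
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]

Refer to the paper and the training code in this repository for the actual design.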

📝 Introduction

  • Performance degradation of the vanilla ControlNet on deteriorated masks.

  • A shape-aware ControlNet is proposed to deal with inexplicit masks.

    (figures: model architecture, and performance on deteriorated masks)
  • Applications with more flexible conditional masks, including programmatic sketches and human scribbles. It also supports composable shape-controllable generation.

    (figures: generation from sketches & scribbles, and composable generation examples)

🛠️ Install

Recommended: Python=3.11 torch=2.1.2 CUDA=12.2 diffusers=0.25.1

# set up repository 
git clone https://github.com/DREAMXFAR/Shape-aware-ControlNet.git
cd Shape-aware-ControlNet

# install conda environment 
conda env create -f environment.yaml 
conda activate shapeaware_controlnet

📌 Checkpoints

You can download the following checkpoints and put them in controlnet_checkpoint/.

| Model | Baidu Yun | Key | Notations |
| --- | --- | --- | --- |
| controlnet_lvis_mask | link | 2024 | ControlNet trained on our LVIS_COCO dataset with segmentation masks. |
| controlnet_lvis_bbox | link | 2024 | ControlNet trained on our LVIS_COCO dataset with bounding-box masks. |
| shape-aware controlnet | link | 2024 | Our proposed shape-aware ControlNet, trained on LVIS_COCO with randomly dilated masks. |

🚀 Getting Started

Here we assume you are in Shape-aware-ControlNet/.

Data Preparation

  1. Download the COCO (for images) and LVIS (for instance segmentations and captions) datasets.

  2. Pre-process the raw dataset and generate new jsonl files for our LVIS-COCO dataset by running the following scripts. Note that these .py files are provided for reference only; you can process the raw dataset on your own.

    # generate jsonl for LVIS-COCO dataset
    cd custom_dataset/utils
    
    # configure the data path for COCO and LVIS, and the save path in prepare_lvis_coco.py.
    python prepare_lvis_coco.py
    
    # filter images with empty annotations and merge files 
    python filter_empty_anns.py  # filter empty annotations 
    sh merge.sh  # merge val_subtrain.jsonl and train.jsonl
  3. Generate dilated masks and bounding-box images for training and testing (a minimal dilation sketch follows this list).

    # generate dilated masks 
    python vis_sketch_images.py
  4. If you follow the provided scripts, the final dataset should be organized as follows. Note that this structure is only assumed by our dataloaders, coco_offlinecond_dataset.py and lvis_coco.py under ./custom_datasets/; remember to configure the global variables in both files, or write your own dataloader.

    LVIS_COCO_triplet
      - train2017  # COCO train2017 images 
      - val2017  # COCO val2017 images 
      - conditional_train
        - epsilon_0_blackbg  # raw masks 
        - lvisbbox  # bounding-box masks 
        - dilate_5_within_bbox  # dilated masks with the radius=5
        - ...
      - conditional_val
        - epsilon_0_blackbg  # raw masks 
        - lvisbbox  # bounding-box masks 
        - dilate_5_within_bbox  # dilated masks with the radius=5
        - caption_val.jsonl  # filenames and captions (the 1st caption of each image by default)
        - ...
      - categories.jsonl  # category information 
      - train.jsonl  # train information 
      - val.jsonl  # val information
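
For step 3, a minimal sketch of how a dilated mask can be produced with OpenCV, assuming binary 0/255 masks; the real logic lives in vis_sketch_images.py, and the file paths below are placeholders.

# Minimal dilation sketch; see vis_sketch_images.py for the actual script.
import cv2

def dilate_mask(mask, radius: int):
    """Dilate a binary (0/255) mask with an elliptical kernel of the given radius."""
    if radius <= 0:
        return mask
    ksize = (2 * radius + 1, 2 * radius + 1)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ksize)
    return cv2.dilate(mask, kernel)

# Placeholder paths: produce a `dilate_5_within_bbox`-style condition from a raw mask.
raw = cv2.imread("conditional_train/epsilon_0_blackbg/000000000009.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("conditional_train/dilate_5_within_bbox/000000000009.png", dilate_mask(raw, radius=5))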
    
    
    

Training

  1. Download the SD_v1.5 parameters from Hugging Face and our shape-aware ControlNet checkpoints from Baidu Yun. Put the SD parameters anywhere you like and put our checkpoints under ./checkpoints.

  2. Configure the hyper-parameters in ./train.sh and run the scripts.

    # train the shape-aware controlnet
    sh train.sh

    Notations for some arguments (a hypothetical configuration sketch follows this list):

    --dataset_name: one of ['custom', 'lvis'], selecting the dataloader.
    --conditioning_mode: the directory name of the conditional images, e.g., segmentation maps.
    --dilate_radius: the dilation radius for the masks; set to random for random dilation.
    --proportion_empty_prompts: the proportion of prompts randomly dropped for classifier-free guidance (CFG).
    --do_ratio_condition: enable training of the shape-prior modulation blocks.
    --do_predict: enable training of the deteriorate-ratio predictor.
    --detach_feature: detach gradients for the deteriorate-ratio predictor.
    --predictor_lr: a separate learning rate for the deteriorate-ratio predictor.
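For reference, a hypothetical train.sh configuration assembled from the arguments above; the entry-script name, paths, and values are assumptions, so check train.sh in the repository for the actual command.

# Hypothetical configuration; only the flags themselves are documented above.
python train.py \
  --dataset_name lvis \
  --conditioning_mode dilate_5_within_bbox \
  --dilate_radius random \
  --proportion_empty_prompts 0.1 \
  --do_ratio_condition \
  --do_predict \
  --detach_feature \
  --predictor_lr 1e-5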

Inference

  1. To run inference on a single conditional image, configure ./test_single.sh and run the following script (example invocations for both modes follow this list). Remember to configure the global variables in test_controlnet.py.

    # inference with shape-aware controlnet on a single conditional image 
    sh test_single.sh

    Notations for some arguments:

    --img_path: the path of the test conditional image.
    --prompt: the prompt for the conditional image.
    --controlnet_path: the path of the ControlNet checkpoint.
    --output_path: the directory for saving the outputs.

  2. To run inference with multiple conditional images, i.e., a mask image and a bounding-box image, configure ./test_multi.sh and run the following script. Remember to configure the global variables in test_multicontrolnet.py.

    # inference with shape-aware controlnet on multiple conditional images 
    # Note: this application only supports two conditional images for mask and bbox
    sh test_multi.sh

    Notations for some arguments:

    --prompt: the prompt for the conditional images.
    --mask_img_path: the path of the conditional mask image.
    --mask_controlnet_path: the ControlNet checkpoint path for the mask branch.
    --mask_deteriorate_ratio: a user-provided deteriorate ratio; if not given, the model predicts it on its own.
    --bbox_img_path: the path of the conditional bounding-box image.
    --bbox_controlnet_path: the ControlNet checkpoint path for the bounding-box branch.
    --bbox_deteriorate_ratio: a user-provided deteriorate ratio; if not given, the model predicts it on its own.
    --output_path: the directory for saving the outputs.
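For reference, hypothetical invocations assembled from the arguments above; the prompts, image paths, and checkpoint paths are placeholders, so check test_single.sh and test_multi.sh for the actual commands.

# Hypothetical single-image inference (cf. test_single.sh)
python test_controlnet.py \
  --img_path examples/mask.png \
  --prompt "a cat sitting on the grass" \
  --controlnet_path controlnet_checkpoint/shape_aware_controlnet \
  --output_path outputs/single

# Hypothetical mask + bounding-box inference (cf. test_multi.sh)
python test_multicontrolnet.py \
  --prompt "a cat sitting on the grass" \
  --mask_img_path examples/mask.png \
  --mask_controlnet_path controlnet_checkpoint/shape_aware_controlnet \
  --bbox_img_path examples/bbox.png \
  --bbox_controlnet_path controlnet_checkpoint/controlnet_lvis_bbox \
  --output_path outputs/multi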

🎨 Applications

Here we showcase several application scenarios of our shape-aware ControlNet. For more details, please refer to our paper.

  1. Generation with TikZ sketches and human scribbles.


  2. Shape-prior modification of the generated images by varying $\Delta \rho$.


  3. Composable shape-controllable generation.


💡 FAQs

  • None

💗 Acknowledgements

✒️ Citation

If you find our work helpful in your research, please consider giving this repository a ⭐ and citing:

@article{xuan2024controlnet,
  title={When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability},
  author={Xuan, Wenjie and Xu, Yufei and Zhao, Shanshan and Wang, Chaoyue and Liu, Juhua and Du, Bo and Tao, Dacheng},
  journal={arXiv preprint arXiv:2403.00467},
  year={2024}
}
