Text + Sketch Compression via Text-to-Image Models

Implementation for Text + Sketch: Image Compression at Ultra Low Rates.

The following scripts will loop through a dataset, and then output the results (reconstructed images, sketches, and captions) into the recon_examples/ folder.

eval_PIC.py: uses prompt inversion to transmit a prompt and generate reconstructions.
eval_PICS.py: uses prompt inversion + sketch to transmit a compressed sketch and prompt.

For example, python eval_PICS.py --data_root DATA_ROOT will run PICS, where the images are contained in the DATA_ROOT folder. See scripts/PICS.sh for example usage. Prior to running this script, you will need to either (a) train the NTC sketch model or (b) download the pre-trained ones into the models_ntc/ folder. Instructions for both can be found below.

The annotator directory is taken from the ControlNet repo, and the prompt_inversion directory is based off of the Hard Prompts Made Easy repo.

Dataloaders

The dataloading assumes pytorch ImageFolder layouts inside DATA_ROOT. See dataloaders.py for more details.

Sketch NTC Models

A training script is provided in train_compressai.py, which is slightly modified from CompressAI's example training script. See scripts/train_sketch.sh example usage. To generate sketch training data, apply one of the filters in annotator/ to training images, and structure folder to fit the CompressAI ImageFolder.

Pre-trained NTC models for HED sketches, as well as HED sketches generated from CLIC2021 used to train it, can be found here. To download them onto a remote server, run

wget https://upenn.box.com/shared/static/g1fzf9ctn0qvdn9exjpp8mkqh7aja4gm -O trained_ntc_models.zip
wget https://upenn.box.com/shared/static/b90504o4k4onkicm8aal8fxkhltp2rnb -O HED_training_data.zip

Dependencies

pytorch
compressai
diffusers
pytorch-lightning
opencv-python
einops
ftfy
sentence-transformers
accelerate
xformers
basicsr

Notes

Since ControlNet was trained on uncompressed HED maps (the sketch), and not the decompressed ones, if the rate is set too low for the sketch, this can cause poor reconstructions for many image types.
In general, the Text + Sketch is better at reconstructing landscape photos compared to photos of objects. The performance is highly dependent on the pre-trained ControlNet model used (here we use SD), but any improved ControlNet model released in the future can be easily integrated into the Text + Sketch setup
Fine-tuning the models are currently in-progress

Citation

@inproceedings{lei2023text+sketch,
  title={Text+ Sketch: Image Compression at Ultra Low Rates},
  author={Lei, Eric and Uslu, Yi\u{g}it Berkay and Hassani, Hamed and Bidokhti, Shirin Saeedi},
  booktitle={ICML 2023 Workshop on Neural Compression: From Information Theory to Applications},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
annotator		annotator
models_ntc		models_ntc
plots		plots
prompt_inversion		prompt_inversion
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
dataloaders.py		dataloaders.py
eval_PIC.py		eval_PIC.py
eval_PICS.py		eval_PICS.py
generate_sketches.py		generate_sketches.py
models_compressai.py		models_compressai.py
resize_images.ipynb		resize_images.ipynb
train_compressai.py		train_compressai.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Text + Sketch Compression via Text-to-Image Models

Dataloaders

Sketch NTC Models

Dependencies

Notes

Citation

About

Releases

Packages

Languages

leieric/Text-Sketch

Folders and files

Latest commit

History

Repository files navigation

Text + Sketch Compression via Text-to-Image Models

Dataloaders

Sketch NTC Models

Dependencies

Notes

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages