Stars
A latent text-to-image diffusion model
High-Resolution Image Synthesis with Latent Diffusion Models
LAVIS - A One-stop Library for Language-Vision Intelligence
FaceChain is a deep-learning toolchain for generating your digital twin.
PyTorch code and models for the DINOv2 self-supervised learning method.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Official Code for Stable Cascade
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Kandinsky 2 — multilingual text2image latent diffusion model
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
A prompt-enhancement library for transformer-style text embedding systems
LaVIT: Empowering the Large Language Model to Understand and Generate Visual Content
TryOnDiffusion: A Tale of Two UNets Implementation