- Stony Brook University
- NY
- www3.cs.stonybrook.edu/~kkahatapitiy/
- @kkahatapitiy
Stars
Official inference repo for FLUX.1 models
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
[CVPR2024 Highlight] VBench: Comprehensive Benchmark Suite for Video Generative Models
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Latte: Latent Diffusion Transformer for Video Generation.
[ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
Official repo for AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
VideoSys: An easy and efficient system for video generation
GIF encoder based on libimagequant (pngquant). Squeezes maximum possible quality from the awful GIF format.
[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Open-Sora: Democratizing Efficient Video Production for All
Lumina-T2X is a unified framework for Text to Any Modality Generation
Stable Diffusion web UI
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
An unofficial PyTorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024)
Code for our WACV 2021 paper "Exploiting the Redundancy in Convolutional Filters for Parameter Reduction"
Code for our AAAI 2023 paper "Weakly-guided Self-supervised Pretraining for Temporal Activity Detection"