University of Chinese Academy of Sciences
Stars
Collection of AWESOME vision-language models for vision tasks
iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
OLMoE: Open Mixture-of-Experts Language Models
Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[ECCV 2022] Code for paper "DaViT: Dual Attention Vision Transformer"
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
PyTorch code and models for the DINOv2 self-supervised learning method.
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
PyTorch implementation of multi-task learning architectures, incl. MTI-Net (ECCV2020).
Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch
llama3 implementation one matrix multiplication at a time
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
Official repository for "AM-RADIO: Reduce All Domains Into One"
🌍 Discover our global repository of countries, states, and cities! 🏙️ Get comprehensive data in JSON, SQL, PSQL, XML, YAML, and CSV formats. Access ISO2, ISO3 codes, country code, capital, native l…
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model whose performance approaches GPT-4o.
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
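Several of the repositories above (OLMoE, the Shazeer et al. re-implementation, Soft MoE, and the Sparsely-Gated MoE port) revolve around the same routing idea: per token, a gate selects the top-k experts, softmax-normalizes their scores, and combines only those experts' outputs. A minimal dependency-free sketch of that idea, with hypothetical function names (`top_k_gate`, `moe_forward` are illustrative, not from any of the listed repos):

```python
import math

def top_k_gate(logits, k=2):
    """Select the top-k experts by gating logit and softmax over only them.

    Sketch of the sparsely-gated routing in Shazeer et al.
    (https://arxiv.org/abs/1701.06538); all other experts get weight 0.
    """
    # indices of the k largest gating logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # softmax restricted to the selected experts
    exps = {i: math.exp(logits[i]) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine the outputs of only the selected experts, weighted by the gate."""
    weights = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())
```

Because only k of the experts run per token, parameter count scales with the number of experts while per-token compute stays roughly constant, which is the trade-off the MoE language-model repos above exploit.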