Stars
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Examples and guides for using the OpenAI API
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
GLM-4 series: Open Multilingual Multimodal Chat LMs
ChatGLM3 series: Open Bilingual Chat LLMs
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…
LAVIS - A One-stop Library for Language-Vision Intelligence
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
GPT4V-level open-source multi-modal model based on Llama3-8B
✨✨Latest Advances on Multimodal Large Language Models
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
A latent text-to-image diffusion model
High-Resolution Image Synthesis with Latent Diffusion Models
[CSUR] A Survey on Video Diffusion Models
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Assistant tools for attention visualization in deep learning
Implementation of popular deep learning networks with TensorRT network definition API
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
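Primarily stores materials for the "Data Mining / Machine Learning" track of Datawhale team learning.
Top-2 solution (university track) for the 2019 Agricultural Bank of China Athena Cup Data Mining Competition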