- HUST
- Wuhan, China
Stars
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 20…
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
MINT-1T: A one trillion token multimodal interleaved dataset.
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Curated tutorials and resources for Large Language Models, AI Painting, and more.
✨✨Latest Advances on Multimodal Large Language Models
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Ongoing research training transformer models at scale
An open-source framework for training large multimodal models.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
🦜🔗 Build context-aware reasoning applications
A toolbox of OCR models and algorithms based on MindSpore
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
CDLA: A Chinese document layout analysis (CDLA) dataset
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Code release for ConvNeXt V2 model
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
The official implementation of the NeurIPS 2022 paper Q-ViT.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Code for generating synthetic scenes and bounding box annotations for object detection. It was used to generate the data in the Cut, Paste and Learn paper.