shushenglee

Lossve Kevin shushenglee

Keep moving

Starred repositories

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,079 120 Updated Sep 24, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,400 133 Updated Sep 24, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 5,634 439 Updated Sep 19, 2024

CMU-Perceptual-Computing-Lab / openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

C++ 30,923 7,838 Updated Aug 3, 2024

facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,129 944 Updated Sep 30, 2024

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,325 84 Updated Sep 23, 2024

njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick

HTML 187 10 Updated Aug 27, 2024

mlfoundations / MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.

737 20 Updated Jul 31, 2024

deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,025 190 Updated Apr 24, 2024

QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 8,585 535 Updated Sep 26, 2024

datawhalechina / self-llm

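"A Guide to Using Open-Source Large Models": quick deployment of open-source large models in a Linux environment, a deployment tutorial tailored to users in China.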

Jupyter Notebook 8,227 981 Updated Sep 29, 2024

yingsen1 / UniMD

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Python 34 1 Updated Jul 5, 2024

m-bain / webvid

Large-scale text-video dataset. 10 million captioned short videos.

Python 579 35 Updated Aug 14, 2024

microsoft / Samba

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Python 784 46 Updated Aug 21, 2024

yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,321 69 Updated Aug 21, 2024

Event-AHU / Event_Camera_in_Top_Conference

Event camera (DVS, Spike) based Papers Published on Top International Conference

55 3 Updated Sep 28, 2024

JUNJIE99 / MLVU

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 137 Updated Sep 23, 2024

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 14,865 1,377 Updated Sep 5, 2024

THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 4,744 386 Updated Sep 26, 2024

OpenGVLab / OmniCorpus

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python 252 5 Updated Aug 29, 2024

apple / ml-ferret

Python 8,330 485 Updated Jan 27, 2024

ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Python 29,046 5,736 Updated Sep 30, 2024

sujanshresstha / YOLOv10_DeepSORT

This repository contains code for object detection and tracking in videos using the YOLOv10 object detection model and the DeepSORT algorithm.

Python 111 17 Updated Jul 30, 2024

THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]

Python 9,570 911 Updated Sep 26, 2024

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,118 848 Updated Sep 13, 2024

ILikeAI / AlwaysReddy

AlwaysReddy is an LLM voice assistant that is always just a hotkey away.

Python 617 61 Updated Sep 12, 2024

fundamentalvision / Uni-Perceiver

Python 266 21 Updated Jan 12, 2023

facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 46,933 5,554 Updated Sep 18, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 19,546 2,153 Updated Aug 12, 2024

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,467 143 Updated Sep 25, 2024
