Starred repositories

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,079 120 Updated Sep 24, 2024

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python 2,400 133 Updated Sep 24, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.

Python 5,634 439 Updated Sep 19, 2024

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

C++ 30,923 7,838 Updated Aug 3, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,129 944 Updated Sep 30, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,325 84 Updated Sep 23, 2024

The model, data and code for the visual GUI Agent SeeClick

HTML 187 10 Updated Aug 27, 2024

MINT-1T: A one trillion token multimodal interleaved dataset.

737 20 Updated Jul 31, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,025 190 Updated Apr 24, 2024

Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.

Shell 8,585 535 Updated Sep 26, 2024

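"A Guide to Using Open-Source LLMs" (《开源大模型食用指南》): quickly deploy open-source large models in a Linux environment; a deployment tutorial tailored to beginners in China.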

Jupyter Notebook 8,227 981 Updated Sep 29, 2024

UniMD: Towards Unifying Moment retrieval and temporal action Detection

Python 34 1 Updated Jul 5, 2024

Large-scale text-video dataset. 10 million captioned short videos.

Python 579 35 Updated Aug 14, 2024

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Python 784 46 Updated Aug 21, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,321 69 Updated Aug 21, 2024

Event camera (DVS, Spike) based papers published at top international conferences

55 3 Updated Sep 28, 2024

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 137 Updated Sep 23, 2024

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 14,865 1,377 Updated Sep 5, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal chat models

Python 4,744 386 Updated Sep 26, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python 252 5 Updated Aug 29, 2024
Python 8,330 485 Updated Jan 27, 2024

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Python 29,046 5,736 Updated Sep 30, 2024

This repository contains code for object detection and tracking in videos using the YOLOv10 object detection model and the DeepSORT algorithm.

Python 111 17 Updated Jul 30, 2024

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]

Python 9,570 911 Updated Sep 26, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,118 848 Updated Sep 13, 2024

AlwaysReddy is an LLM voice assistant that is always just a hotkey away.

Python 617 61 Updated Sep 12, 2024

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 46,933 5,554 Updated Sep 18, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 19,546 2,153 Updated Aug 12, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,467 143 Updated Sep 25, 2024