huhu zhangyunming

13 followers · 8 following

Shenzhen

Stars

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 3,769 288 Updated Sep 19, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,097 120 Updated Sep 20, 2024

facebookresearch / sapiens

High-resolution models for human tasks.

Python 3,928 202 Updated Sep 20, 2024

THUDM / CogVideo

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,485 688 Updated Sep 21, 2024

facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 10,878 897 Updated Aug 21, 2024

Kwai-Kolors / Kolors

Kolors Team

Python 3,575 230 Updated Sep 4, 2024

We-Math / We-Math

Code and data of We-Math

Python 120 7 Updated Jul 23, 2024

BinNong / meet-libai

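Li Bai 👤, an outstanding poet of the Tang dynasty, holds an important place in the history of Chinese literature. In recent years, with the rapid development of digital technology and artificial intelligence, the ways in which traditional culture is popularized and promoted have also faced innovation and change. Although research on Li Bai's poetry in China and abroad is already quite thorough, its digital and intelligent popularization still falls short. This project therefore aims to build a Li Bai knowledge graph and combine it with large-model training to produce a specialized AI agent, promoting and popularizing Li Bai's culture in the form of a generative dialogue application.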

Python 1,142 130 Updated Sep 1, 2024

dvlab-research / ControlNeXt

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,272 59 Updated Sep 10, 2024

comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 51,112 5,375 Updated Sep 21, 2024

ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Python 1,237 143 Updated Aug 28, 2024

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 5,307 544 Updated Jul 3, 2024

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,180 1,259 Updated Sep 14, 2024

tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

Python 2,185 274 Updated Jun 29, 2024

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 46,055 6,495 Updated Sep 20, 2024