Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 3,769 288 Updated Sep 19, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,097 120 Updated Sep 20, 2024

High-resolution models for human tasks.

Python 3,928 202 Updated Sep 20, 2024

Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 7,485 688 Updated Sep 21, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 10,878 897 Updated Aug 21, 2024

Kolors Team

Python 3,575 230 Updated Sep 4, 2024

Code and data of We-Math

Python 120 7 Updated Jul 23, 2024

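Li Bai 👤 was an outstanding Tang dynasty poet whose poetry holds an important place in the history of Chinese literature. In recent years, with the rapid development of digital technology and artificial intelligence, the ways in which traditional culture is popularized and promoted have also faced innovation and change. Although research on Li Bai's poetry in China and abroad is already quite deep, its digital and intelligent popularization still falls short. This project therefore aims to build a Li Bai knowledge graph, train a specialized AI agent with a large model, and promote Li Bai's culture through a generative dialogue application.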

Python 1,142 130 Updated Sep 1, 2024

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,272 59 Updated Sep 10, 2024

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 51,112 5,375 Updated Sep 21, 2024

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Python 1,237 143 Updated Aug 28, 2024

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 5,307 544 Updated Jul 3, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,180 1,259 Updated Sep 14, 2024

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.

Python 2,185 274 Updated Jun 29, 2024

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 46,055 6,495 Updated Sep 20, 2024

A generative speech model for daily dialogue.

Python 30,844 3,349 Updated Sep 4, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 4,657 371 Updated Sep 11, 2024

Official PyTorch implementation of ECCV 2024 Paper: ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.

Python 377 16 Updated Sep 2, 2024

YOLOv10: Real-Time End-to-End Object Detection

Python 9,420 888 Updated Aug 8, 2024

[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Python 959 146 Updated Jul 18, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 1,986 131 Updated Sep 3, 2024

[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Python 921 37 Updated Sep 19, 2024

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 3,294 282 Updated Aug 15, 2024

More relighting!

Python 4,895 331 Updated Jun 27, 2024
Python 2,446 174 Updated Sep 19, 2024

Mixture-of-Experts for Large Vision-Language Models

Python 1,920 121 Updated May 15, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 2,858 206 Updated Jul 27, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,512 567 Updated Jul 2, 2024

Official implementation of FaceXFormer: A Unified Transformer for Facial Analysis

Python 183 18 Updated Apr 4, 2024

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

Python 209 11 Updated Jun 4, 2024