chenht2021
  • Chengdu

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,596 245 Updated Sep 14, 2024

Fixes the bug in funasr where seaco-paraformer has no timestamps after being exported to ONNX

Python 13 4 Updated Sep 12, 2024

Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 564 22 Updated Sep 17, 2024
Python 45 7 Updated Sep 3, 2024

real time face swap and one-click video deepfake with only a single image

Python 36,126 5,127 Updated Sep 23, 2024

A live TV app developed natively for Android

Kotlin 4,838 498 Updated Sep 13, 2024

Whisper realtime streaming for long speech-to-text transcription and translation

Python 1,796 221 Updated Sep 1, 2024

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

Python 83 6 Updated Sep 20, 2024

We Speech Transcript based on LLM, in 300 lines of code.

Python 117 11 Updated Aug 16, 2024

DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability

Python 80 6 Updated Jul 10, 2024

Whisper with Medusa heads

Python 777 47 Updated Sep 23, 2024

Prediction of sound event bounding boxes (SEBBs)

Python 20 2 Updated Aug 2, 2024

LlamaVoice is a llama-based large voice generation model, providing inference and training ability.

Python 170 10 Updated Aug 26, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,227 81 Updated Jul 22, 2024

VALL-E 2 reproduction

Jupyter Notebook 72 11 Updated Jul 14, 2024

⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)

Python 1,140 36 Updated Jun 7, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Python 181 6 Updated Sep 3, 2024

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,165 148 Updated Sep 8, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 937 49 Updated Sep 3, 2024
Python 42 2 Updated Feb 8, 2024

Official repository of SepReformer for speech separation

Python 72 7 Updated Jun 24, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

Python 239 21 Updated Sep 11, 2024

Contains the code associated with the ICLR submission for our text-to-speech diffusion model

Python 51 1 Updated Oct 31, 2023

Prosody and Pronunciation Modification Network

Python 38 6 Updated Aug 8, 2024

Bring portraits to life!

Python 11,896 1,245 Updated Sep 6, 2024

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

Python 409 42 Updated Sep 20, 2024

Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

HTML 28 1 Updated Aug 21, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 4,996 507 Updated Sep 23, 2024

Multilingual Voice Understanding Model

Python 1 Updated Jul 5, 2024