- Chengdu
Stars
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Fix the bug in funasr where seaco-paraformer has no timestamps after being exported to ONNX
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Real-time face swap and one-click video deepfake using only a single image
Whisper realtime streaming for long speech-to-text transcription and translation
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
We Speech Transcript based on LLM, in 300 lines of code.
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
Prediction of sound event bounding boxes (SEBBs)
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
SpeechGPT Series: Speech Large Language Models
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Official repository of SepReformer for speech separation
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
Prosody and Pronunciation Modification Network
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
jingyonghou / SenseVoice
Forked from FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model