- SCUT
- Guangzhou
Stars
A reproduction of the training process for Moshi
LaTeX templates for master's and doctoral theses at South China University of Technology
Zero-shot voice conversion and singing voice conversion with in-context learning
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
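The internet still remembers! A list of companies that reneged on verbal offers, letters of intent, or tripartite agreements during campus recruitment. However small my voice, I want to do my modest part!
The official implementation of PeriodWave and PeriodWave-Turbo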
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
A multilingual large voice generation model with full-stack support for inference, training, and deployment.
Vector (and Scalar) Quantization, in PyTorch
A multi-voice TTS system trained with an emphasis on quality
Official implementation of SpeechFormer, written in Python (PyTorch).
Fast and memory-efficient exact attention
Audio generation using diffusion models, in PyTorch.
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
A JAX-based package for efficient computation of second-order operators (e.g., the Laplacian).
Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech
A Compact and Effective Pretrained Model for Speech Emotion Recognition
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.