- Chengdu (UTC +08:00)
Stars
🔊 Text-Prompted Generative Audio Model
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
A multi-voice TTS system trained with an emphasis on quality
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supportin…
Foundational Models for State-of-the-Art Speech and Text Translation
PyTorch code and models for the DINOv2 self-supervised learning method.
A series of large language models trained from scratch by developers @01-ai
Zero-Shot Speech Editing and Text-to-Speech in the Wild
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Official Code for Stable Cascade
An Open Source text-to-speech system built by inverting Whisper.
CoTracker is a model for tracking any point (pixel) on a video.
MARS5 speech model (TTS) from CAMB.AI
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
HF's ML for Audio study group
In this repository, you will learn how the code in VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) works in Jupyter Notebooks, including normalizing da…
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Predicts the level of noise and reverberation in your audio files
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
Pretraining transformer-based Thai language models
Dataset and baseline code for the VocalSound dataset (ICASSP2022).
BotSIM - a data-efficient end-to-end Bot SIMulation toolkit for evaluation, diagnosis, and improvement of commercial chatbots