Stars
TS-BSmamba2: A Two-Stage Band-Split Mamba-2 Network for Music Separation
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
This repository implements SummaryMixing, a simpler, faster, and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is read…
Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
Robust Singing Voice Transcription and MIDI Extraction
Official PyTorch implementation of BigVGAN (ICLR 2023)
Multilingual Voice Understanding Model
A generative speech model for daily dialogue.
Predicts the level of noise and reverberation in your audio files
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
Awesome speech/audio LLMs, representation learning, and codec models
Generative models for conditional audio generation
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
TorchCFM: a Conditional Flow Matching library
Diffusion-based singing voice pitch correction
INTERSPEECH 2023: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music
Orion-14B is a family of models that includes a 14B foundation LLM and a series of derived models: a chat model, a long context model, a quantized model, a RAG fine-tuned model, and an Agent fine-tuned model. …
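A C++ implementation of ASR inference. It has few dependencies, is simple to install, and runs inference fast enough to work smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized from Google's Transformer model and trained on the open-source wenetspeech dataset (10,000+ hours) or a private Alibaba dataset (60,000+ hours), so recognition quality is good and comparable to many commercial ASR products.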
Diffusion model papers, survey, and taxonomy
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Unofficial implementation of NVIDIA P-Flow TTS paper