iehppp2010


TS-BSmamba2: A TWO-STAGE BAND-SPLIT MAMBA-2 NETWORK FOR MUSIC SEPARATION

Python 30 Updated Sep 16, 2024
Python 4,887 360 Updated Sep 23, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 1,884 106 Updated Sep 23, 2024

This repository implements SummaryMixing, a simpler, faster and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is read…

Python 104 11 Updated Sep 17, 2024

Implementation of Band Split Roformer, the SOTA attention network for music source separation from ByteDance AI Labs

Python 399 13 Updated Aug 6, 2024

🔍 🐍 Like pstack but for Python!

Python 1,008 45 Updated Sep 17, 2024

StarRail Datasets For SVC/SVS/TTS

278 14 Updated Sep 7, 2024
Jupyter Notebook 41 3 Updated Aug 13, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 33,641 4,095 Updated Aug 16, 2024

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

Python 83 6 Updated Sep 20, 2024

Robust Singing Voice Transcription and MIDI Extraction

Python 48 2 Updated Jul 29, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 848 96 Updated Sep 5, 2024

Multilingual Voice Understanding Model

Python 2,726 258 Updated Sep 2, 2024

A generative speech model for daily dialogue.

Python 30,922 3,360 Updated Sep 21, 2024

Predicts the level of noise and reverberation in your audio files

Jupyter Notebook 136 24 Updated May 22, 2024

Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"

Python 192 6 Updated Jul 22, 2024

Awesome speech/audio LLMs, representation learning, and codec models

598 27 Updated Sep 20, 2024

Generative models for conditional audio generation

Python 2,536 235 Updated Jul 15, 2024

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

TypeScript 1,639 178 Updated Sep 23, 2024

natten v0.15.1

Cuda 1 Updated Mar 26, 2024

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

Cuda 345 26 Updated Aug 20, 2024

TorchCFM: a Conditional Flow Matching library

Python 1,062 83 Updated Aug 21, 2024

Diffusion-based singing voice pitch correction

Python 72 14 Updated Sep 20, 2024

INTERSPEECH2023: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music

Python 28 3 Updated May 27, 2024

Orion-14B is a family of models that includes a 14B foundation LLM and a series of models: a chat model, a long context model, a quantized model, a RAG fine-tuned model, and an Agent fine-tuned model. …

Python 782 57 Updated Jun 3, 2024

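This is a project that implements ASR inference in C++. It has very few dependencies, is simple to install, and runs inference quickly; it also runs smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized versions of Google's Transformer model, trained on the open-source WenetSpeech dataset (10,000+ hours) or an Alibaba private dataset (60,000+ hours), so recognition accuracy is also good, comparable to many commercial ASR products.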

C 482 74 Updated Mar 19, 2023

Diffusion model papers, survey, and taxonomy

2,909 247 Updated Aug 9, 2024
Python 130 11 Updated Jul 9, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 583 42 Updated Sep 9, 2024

Unofficial implementation of NVIDIA P-Flow TTS paper

Python 212 30 Updated Jul 1, 2024