- Kits.AI
- Bishkek, Kyrgyzstan
- in/amanteur
Stars
Multitrack music mixing style transfer given a reference song, using a differentiable mixing console.
TS-BSmamba2: A Two-Stage Band-Split Mamba-2 Network for Music Separation
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.
Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
Official PyTorch implementation of BigVGAN (ICLR 2023)
MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.
A repository of implementations of neural speech editing algorithms.
Text-to-Music Generation with Rectified Flow Transformers
This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)
🎚️ Open Source Audio Matching and Mastering
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
An extremely fast Python linter and code formatter, written in Rust.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Zero-shot voice conversion and singing voice conversion with in-context learning.
Source code and demo for INTERSPEECH 2024 paper: Noise-robust Speech Separation with Fast Generative Correction
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
HiFTNet-based audio super-resolution from 16/24 kHz to 48 kHz.
Official Implementation of Interspeech 2024 Paper "Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement"
Fine-tune Stable Audio Open with DiT ControlNet.
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
State-of-the-art discrete acoustic codec models with 40 tokens per second for audio language modeling.
Boosting Self-Supervised Embeddings for Speech Enhancement