- Aalto University
- Helsinki, Finland
- https://orcid.org/0000-0002-2201-103X
Stars
An architecture for neural network inference in real-time audio applications
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Differentiable augmentation and robustness evaluation for audio
Official Repository of Unsupervised Lead Sheet Generation via Semantic Compression
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
An unofficial PyTorch implementation of the audio LM VALL-E
Generative models for conditional audio generation
FLOPs counter for convolutional networks in the PyTorch framework
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
An autoregressive character-level language model for making more things
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
Foundational Models for State-of-the-Art Speech and Text Translation
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Companion code for the Modern Real-Time Audio Programming course.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained …
Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch