Kits.AI
- Bishkek, Kyrgyzstan
- in/amanteur
Stars
🦜🔗 Build context-aware reasoning applications
🔊 Text-Prompted Generative Audio Model
Google Research
Foundational Models for State-of-the-Art Speech and Text Translation
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
An Open Source text-to-speech system built by inverting Whisper.
A simple notebook demonstrating prompt-based music generation via Mubert API
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Theory of digital signal processing (DSP): signals, filtration (IIR, FIR, CIC, MAF), transforms (FFT, DFT, Hilbert, Z-transform) etc.
Utility functions for handling MIDI data in a nice/intuitive way.
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
A series of tutorial notebooks on denoising diffusion probabilistic models in PyTorch
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Official implementation of the paper "The Stable Signature: Rooting Watermarks in Latent Diffusion Models"
Create automatic playlists by using Deep Learning to *listen* to the music.
Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)
YSDA course in Speech Processing.
The Harmonix Set: Beats, Downbeats, and Structural Annotations for Pop Music
Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"