Stars
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
A feature-rich command-line audio/video downloader
For GGUF support, see KoboldCPP: https://github.com/LostRuins/koboldcpp
Ultimate Vocal Remover for Google Colab
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
A holistic RSS namespace for podcasting
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
High-Resolution Image Synthesis with Latent Diffusion Models
A smaller subset of 10 easily classified classes from ImageNet, and a little more French
A latent text-to-image diffusion model
A high-throughput and memory-efficient inference and serving engine for LLMs
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
(ImageNet pretrained models) The official PyTorch implementation of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"
Speech To Speech: an effort toward an open-source, modular GPT-4o
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
Open source real-time translation app for Android that runs locally
ASR + diarization model server with speculative decoding
YaRN: Efficient Context Window Extension of Large Language Models
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Copy the MLP of Llama 3 eight times as 8 experts, create a randomly initialized router, and add a load-balancing loss to construct an 8x8B MoE model based on Llama 3.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.