WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 11,491 1,208 Updated Aug 21, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 5,954 756 Updated Sep 11, 2024

A feature-rich command-line audio/video downloader

Python 83,037 6,474 Updated Sep 17, 2024

For GGUF support, see KoboldCPP: https://github.com/LostRuins/koboldcpp

Python 3,470 747 Updated Aug 21, 2024

Ultimate Vocal Remover for Google Colab

Python 36 11 Updated Apr 11, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 32,888 3,782 Updated Sep 17, 2024

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Jupyter Notebook 624 80 Updated Sep 2, 2024
Python 115 4 Updated Sep 20, 2024

Audio Editor

C 12,266 2,246 Updated Sep 20, 2024

A wholistic rss namespace for podcasting

HTML 380 114 Updated Sep 20, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,472 384 Updated Sep 6, 2024

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 11,519 1,504 Updated Feb 29, 2024

A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Jupyter Notebook 959 73 Updated Sep 26, 2022

A latent text-to-image diffusion model

Jupyter Notebook 67,608 10,086 Updated Jun 18, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 27,148 3,986 Updated Sep 22, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,031 535 Updated May 31, 2024

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 4,925 499 Updated Sep 19, 2024

(ImageNet pretrained models) The official PyTorch implementation of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

Python 1,067 214 Updated Dec 8, 2022

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 3,037 322 Updated Sep 20, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,653 2,101 Updated Jul 18, 2024
Python 274 35 Updated Sep 3, 2024

Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.

Python 444 36 Updated May 23, 2024

Open source real-time translation app for Android that runs locally

C++ 6,481 487 Updated Sep 22, 2024

ASR + diarization model server with speculative decoding

Python 47 8 Updated May 22, 2024

YaRN: Efficient Context Window Extension of Large Language Models

Python 1,312 115 Updated Apr 17, 2024

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Python 31,171 3,841 Updated Sep 19, 2024

Copy the MLP of llama3 8 times as 8 experts, create a router with random initialization, and add a load balancing loss to construct an 8x8b MoE model based on llama3.

Python 24 3 Updated Jul 1, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 37,347 3,928 Updated Jul 28, 2024

Official repository for ORPO

Python 411 36 Updated May 31, 2024