Starred repositories
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
ADAPTING SELF-SUPERVISED MODELS TO MULTI-TALKER SPEECH RECOGNITION USING SPEAKER EMBEDDINGS
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
PyTorch implementation of some attentions for Deep Learning Researchers.
IJCAI 2021 Tutorial & code for Retrospective Reader for Machine Reading Comprehension (AAAI 2021)
Spoke client-side library for audio and speech recognition
A framework for building speech-enabled websites.
This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-speech
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless Spoken Question Answering with Speech Discrete Unit Adaptiv…
ACL2020 Tutorial: Open-Domain Question Answering
Simple implementation of dynamic programming based phoneme segmentation method given in "Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks" (ht…
Speech recognition module for Python, supporting several engines and APIs, online and offline.
A novel embedding training algorithm that leverages ANN search and achieves SOTA retrieval on TREC DL 2019 and OpenQA benchmarks
A Collection of BM25 Algorithms in Python
A library for efficient similarity search and clustering of dense vectors.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Unsupervised text tokenizer for Neural Network-based text generation.
Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.
Simple one-line scripts to extract reliable MFCC features with librosa and store them in an HDF5 file.
Shared repository for open-sourced projects from the Google AI Language team.
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
Toolbox of models, callbacks, and datasets for AI/ML researchers.
An unofficial implementation of the paper "One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization".
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Implementation of the paper: Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks (INTERSPEECH 2021)
Leethony / Additive-Margin-Softmax-Loss-Pytorch
Forked from cvqluu/Angular-Penalty-Softmax-Losses-Pytorch
Additive margin softmax loss in PyTorch