Starred repositories
An unnecessarily tiny implementation of GPT-2 in NumPy.
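Not this repo's code, but a taste of the idea: a GPT-2 forward pass in NumPy reduces to a handful of array operations, such as the scaled dot-product attention sketched here (toy shapes chosen purely for illustration).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: the core op a NumPy GPT-2 boils down to.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

q = k = v = np.random.randn(4, 8)   # 4 tokens, head dimension 8
print(attention(q, k, v).shape)     # (4, 8)
```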
A curated list of awesome C/C++ performance optimization resources: talks, articles, books, libraries, tools, sites, blogs. Inspired by awesome.
A guidance language for controlling large language models.
【A commonly used C++ DAG framework】A general-purpose, dependency-free, cross-platform, flow-graph-based parallel computing framework, listed in awesome-cpp. Stars, forks, and feedback welcome.
Use PEFT or full-parameter training to fine-tune 350+ LLMs or 90+ MLLMs (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...).
🦜🔗 Build context-aware reasoning applications
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
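A minimal sketch of what the high-level Python API looks like in recent TensorRT-LLM releases; the class names, arguments, and model identifier are assumptions to verify against the project's documentation.

```python
# Sketch only: assumes TensorRT-LLM's high-level LLM API; check names against the docs.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TensorRT engine for the model
params = SamplingParams(max_tokens=32)

for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```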
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A distributed Kafka Consumer in Python using Ray
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
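The advertised "single line" is an optimize call applied to a loaded model; treat the import path and function name below as unverified assumptions and check Kernl's README for the actual entry point.

```python
# Assumed entry point (import path and function name are assumptions, not verified).
from transformers import AutoModel
from kernl.model_optimization import optimize_model

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)   # the one-liner: swaps in optimized fused kernels for inference
```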
Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
Hackable and optimized Transformer building blocks, supporting composable construction.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
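Typical usage is a one-liner to load a pretrained fast tokenizer and encode text; the model name below is just an example.

```python
from tokenizers import Tokenizer

# Load a pretrained fast tokenizer from the Hugging Face Hub (example model name).
tok = Tokenizer.from_pretrained("bert-base-uncased")
encoding = tok.encode("Fast tokenizers are optimized for research and production.")
print(encoding.tokens)   # subword tokens
print(encoding.ids)      # corresponding vocabulary ids
```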
A high-throughput and memory-efficient inference and serving engine for LLMs
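A short offline-inference sketch with vLLM's Python API; the model name and sampling settings are placeholder choices.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                      # example model; any supported HF model works
sampling = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```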
A universal local knowledge base solution based on a vector database and GPT-3.5.
A text generation method, built on Hugging Face Transformers, that returns a generator and streams out each token in real time during inference.
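Not this project's exact API, but the same token-streaming idea can be sketched with stock Transformers utilities (TextIteratorStreamer); the model name is an example.

```python
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# Generation runs in a background thread; the streamer is an iterator that
# yields decoded text piece by piece as tokens are produced.
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=50))
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```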
Large Language Model Text Generation Inference
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
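The quickest way in is the pipeline API; the model choice here is arbitrary.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # any causal LM from the Hub
result = generator("The quick brown fox", max_new_tokens=20)
print(result[0]["generated_text"])
```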
Demo of live progress bar developed in SpringBoot and React.js using server-sent events.
Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
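Because Basaran mimics the OpenAI completion API, a client can simply point the OpenAI SDK at the local server; the host, port, and model below are placeholders, and the sketch assumes the pre-1.0 `openai` package.

```python
# Sketch assuming the pre-1.0 `openai` package; endpoint and model are placeholders.
import openai

openai.api_key = "unused"                      # a Basaran deployment does not need a real key
openai.api_base = "http://127.0.0.1:80/v1"     # local Basaran endpoint

for chunk in openai.Completion.create(model="bigscience/bloomz-560m",
                                      prompt="Say hello",
                                      max_tokens=16,
                                      stream=True):
    print(chunk.choices[0].text, end="", flush=True)
```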
Fast Inference Solutions for BLOOM
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.