Stars
FlashInfer: Kernel Library for LLM Serving
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
LightLLM is a Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high performance.
How to optimize algorithms in CUDA.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
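The core of Medusa-style acceleration is draft-then-verify decoding: extra heads propose several future tokens cheaply, and the target model accepts the longest matching prefix. A minimal, stdlib-only sketch of the greedy verification step (the `target_next_token` callable and toy model below are illustrative, not Medusa's API):

```python
def verify_draft(target_next_token, prefix, draft_tokens):
    """Greedy verification used in Medusa/speculative decoding:
    accept draft tokens while they match what the target model would
    emit, then append one token from the target model itself."""
    accepted = []
    for tok in draft_tokens:
        if tok != target_next_token(prefix + accepted):
            break  # first mismatch: discard the rest of the draft
        accepted.append(tok)
    # The target model always contributes at least one real token.
    accepted.append(target_next_token(prefix + accepted))
    return accepted

# Toy "target model": next token is previous token + 1.
def toy_target(seq):
    return seq[-1] + 1 if seq else 0

result = verify_draft(toy_target, prefix=[0, 1, 2], draft_tokens=[3, 4, 9])
```

Because every accepted token is exactly what the target model would have produced, the output distribution is unchanged; the speedup comes from verifying several tokens per target-model step instead of generating one.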
A high-throughput and memory-efficient inference and serving engine for LLMs
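vLLM's memory efficiency comes from PagedAttention: the KV cache is split into fixed-size physical blocks, and each sequence keeps a block table mapping logical token positions to blocks, so memory is allocated on demand and reclaimed blocks are reused across sequences. A stdlib-only sketch of that bookkeeping (class and method names are illustrative, not vLLM's actual API):

```python
# Illustrative paged KV-cache bookkeeping (not vLLM's real implementation).
class PagedKVCache:
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}                      # seq_id -> list of block ids
        self.lengths = {}                           # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token; returns (block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:                # current block full
            table.append(self.free_blocks.pop())    # allocate on demand
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free_sequence(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
slots = [cache.append_token(seq_id=0) for _ in range(5)]  # 5 tokens, 2 blocks
```

Unlike a contiguous per-sequence KV buffer sized for the maximum length, this scheme wastes at most one partially filled block per sequence, which is what enables the larger batch sizes behind vLLM's throughput.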
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
AISystem covers the full low-level AI systems stack, including AI chips, AI compilers, and AI inference and training frameworks.
The official GitHub page for the survey paper "A Survey of Large Language Models".
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2.
Collective communications library with various primitives for multi-machine training.
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
Example models using DeepSpeed
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Transformer-related optimization, including BERT and GPT.
Fast and memory-efficient exact attention
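FlashAttention is exact (not an approximation) because softmax can be computed in a streaming fashion: processing key/value tiles while maintaining a running max and normalizer gives the same result as materializing the full score matrix. A stdlib-only sketch of that online-softmax rescaling for a single scalar query (tile size and function names are illustrative):

```python
import math

def streaming_attention(q, keys, values, tile=2):
    """Exact single-query attention computed tile by tile (online softmax)."""
    m = float("-inf")   # running max of attention scores
    denom = 0.0         # running softmax normalizer
    acc = 0.0           # running weighted sum of (scalar) values
    for start in range(0, len(keys), tile):
        scores = [q * k for k in keys[start:start + tile]]
        m_new = max(m, max(scores))
        scale = math.exp(m - m_new)          # rescale earlier partial sums
        denom = denom * scale + sum(math.exp(s - m_new) for s in scores)
        acc = acc * scale + sum(math.exp(s - m_new) * v
                                for s, v in zip(scores, values[start:start + tile]))
        m = m_new
    return acc / denom

def naive_attention(q, keys, values):
    """Reference: materialize all weights at once."""
    w = [math.exp(q * k) for k in keys]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)
```

The real kernel applies the same rescaling to vector-valued accumulators in SRAM tiles; the point here is only that the tiled pass reproduces the one-shot softmax exactly, which is what distinguishes FlashAttention from approximate-attention methods.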
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Universal cross-platform tokenizers binding to HF and sentencepiece
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Universal LLM Deployment Engine with ML Compilation