PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
Library for faster pinned CPU <-> GPU transfers in PyTorch (see the pinned-memory sketch after this list).
A throughput-oriented high-performance serving framework for LLMs
Modular and structured prompt caching for low-latency LLM inference
A large-scale simulation framework for LLM inference
SGLang is a fast serving framework for large language models and vision language models.
An open-source project dedicated to building foundational large language models for the natural sciences, mainly physics, chemistry, and materials science.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A tool that profiles OpenCL devices to find their peak capacities.
Efficient and easy multi-instance LLM serving
Heterogeneous AI Computing Virtualization Middleware
Analyze LLM inference along dimensions such as computation, storage, transmission, and the hardware roofline model in a user-friendly interface (a worked roofline example follows this list).
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
The easiest way to serve AI apps and models - build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more!
Summary of some awesome work for optimizing LLM inference
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Latency and Memory Analysis of Transformer Models for Training and Inference
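
For context on the pinned-transfer library above, this is the standard PyTorch pattern such libraries aim to speed up: staging data in page-locked (pinned) host memory so host-to-device copies can run as asynchronous DMA transfers. A minimal sketch, assuming a CUDA-capable machine; the tensor shape is illustrative.

```python
import torch

# A pageable host tensor (~256 MB fp32); the shape is an arbitrary example.
src = torch.randn(64, 1024, 1024)

pinned = src.pin_memory()                   # page-locked copy, enables async DMA
gpu = pinned.to("cuda", non_blocking=True)  # asynchronous host-to-device copy
torch.cuda.synchronize()                    # wait for the copy to complete
```

Without pinning, `to("cuda", non_blocking=True)` falls back to a synchronous copy from pageable memory, which is the bottleneck these libraries target.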
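
The roofline-analysis tools above all reduce to one formula: attainable throughput is the minimum of peak compute and arithmetic intensity times memory bandwidth. A minimal worked sketch; the hardware numbers are approximate A100-class figures used purely as an assumption.

```python
def attainable_flops(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Roofline model: min(peak compute, arithmetic intensity * memory bandwidth)."""
    return min(peak_flops, intensity * mem_bw_bytes)

# Assumed A100-class hardware: ~312 TFLOP/s FP16 peak, ~2.0e12 B/s HBM bandwidth.
# Batch-1 decoding streams every weight once per token, giving an arithmetic
# intensity of only ~1-2 FLOP/byte, so decoding sits on the memory-bound slope.
tflops = attainable_flops(312e12, 2.0e12, 2.0) / 1e12
print(f"{tflops:.1f} TFLOP/s attainable")  # ~4.0 TFLOP/s, far below the 312 peak
```

This gap between attainable and peak throughput is what motivates the batching, caching, and scheduling projects listed above.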