Starred repositories
OpenAI-compatible API for the TensorRT-LLM Triton backend
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
An efficient implementation of a rate limiter for asyncio.
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
Flax is a neural network library for JAX that is designed for flexibility.
Efficient Triton Kernels for LLM Training
Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
MSCCL++: A GPU-driven communication stack for scalable AI applications
Official inference repo for FLUX.1 models
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, yielding roughly a 2x speedup on consumer devices.
Minimalistic large language model 3D-parallelism training
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversational use.
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
vLLM adapter for a TGIS-compatible gRPC server.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Large Language Model Text Generation Inference
NVIDIA Linux open GPU kernel modules with P2P support
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference.
😇 A PyTorch implementation of the DeepMoji model: a state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm, etc.
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
A native PyTorch Library for large model training
Examples of how to call collective operation functions in multi-GPU environments: simple examples using the broadcast, reduce, allGather, reduceScatter, and sendRecv operations.
LLM prompts, llama3 prompts, llama2 prompts