Starred repositories

OpenAI-compatible API for the TensorRT-LLM Triton backend

Rust 150 25 Updated Aug 1, 2024

An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.

Go 22 7 Updated Sep 20, 2024

An efficient implementation of a rate limiter for asyncio.

Python 490 22 Updated Sep 19, 2024

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!

Python 2,909 222 Updated Aug 10, 2024

Flax is a neural network library for JAX that is designed for flexibility.

Python 5,961 631 Updated Sep 21, 2024

Efficient Triton Kernels for LLM Training

Python 2,974 152 Updated Sep 20, 2024

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"

Python 206 19 Updated Sep 20, 2024

Brand new TTS solution

Python 12,170 923 Updated Sep 20, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 234 32 Updated Sep 20, 2024

Official inference repo for FLUX.1 models

Python 13,905 985 Updated Sep 13, 2024

Flux diffusion model implementation that uses quantized fp8 matmul, with the remaining layers using faster half-precision accumulation; ~2x faster on consumer devices.

Python 111 12 Updated Sep 13, 2024

Minimalistic large language model 3D-parallelism training

Python 1,125 105 Updated Sep 20, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 2,549 244 Updated Sep 14, 2024

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python 2,451 288 Updated Aug 15, 2024

Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"

Python 240 23 Updated Sep 3, 2024

vLLM adapter for a TGIS-compatible gRPC server.

Python 8 10 Updated Sep 21, 2024

📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

Python 1,181 87 Updated Aug 22, 2024

Large Language Model Text Generation Inference

Python 8,799 1,026 Updated Sep 20, 2024

NVIDIA Linux open GPU with P2P support

C 861 75 Updated Jun 7, 2024

Routing proxy for TGIS

Rust 2 4 Updated Sep 20, 2024

Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

154 5 Updated Sep 18, 2024

😇 A PyTorch implementation of the DeepMoji model: a state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm, etc.

Python 914 190 Updated Feb 12, 2024

Microsoft Collective Communication Library

43 7 Updated May 8, 2024

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Python 4,647 463 Updated Sep 9, 2024

A native PyTorch Library for large model training

Python 2,085 157 Updated Sep 19, 2024

Thunder Research Group's Collective Communication Library

C++ 20 3 Updated Apr 25, 2024

NCCL Tests

Cuda 827 230 Updated Jul 30, 2024

Examples of how to call collective operation functions in multi-GPU environments, including a simple example that uses the broadcast, reduce, allGather, reduceScatter and sendRecv operations.

22 7 Updated Aug 28, 2023

LLM prompts, llama3 prompts, llama2 prompts

152 13 Updated Aug 6, 2024