  • MSRA
  • Beijing

22 stars written in C++

LLM inference in C/C++

C++ 65,659 9,421 Updated Sep 30, 2024

Abseil Common Libraries (C++)

C++ 14,812 2,588 Updated Sep 27, 2024

Event-driven network library for multi-threaded Linux server in C++11

C++ 14,753 5,155 Updated Aug 15, 2024

Development repository for the Triton language and compiler

C++ 12,896 1,561 Updated Sep 30, 2024

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 10,607 2,109 Updated Sep 24, 2024

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 5,874 667 Updated Sep 6, 2024

Transformer related optimization, including BERT, GPT

C++ 5,803 889 Updated Mar 27, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,439 921 Updated Sep 25, 2024

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,910 757 Updated Feb 8, 2024

MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation

C++ 4,756 538 Updated Sep 26, 2024

ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation

C++ 3,365 567 Updated Jun 21, 2019

Compiler for Neural Network hardware accelerators

C++ 3,210 689 Updated May 11, 2024

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,177 328 Updated May 16, 2023

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,582 579 Updated Sep 30, 2024

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

C++ 1,474 197 Updated Jun 12, 2023

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,235 186 Updated Apr 14, 2024

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

C++ 954 159 Updated Sep 19, 2024

A polyhedral compiler for expressing fast and portable data parallel algorithms

C++ 916 132 Updated Sep 17, 2024

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 800 160 Updated Aug 28, 2024

C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more!

C++ 468 109 Updated Aug 22, 2024

A library of GPU kernels for sparse matrix operations.

C++ 241 50 Updated Nov 24, 2020

A demo of how to write a high-performance convolution that runs on Apple silicon

C++ 52 2 Updated Feb 8, 2022