Stars
Event-driven network library for multi-threaded Linux servers in C++11
Development repository for the Triton language and compiler
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Transformer-related optimizations, including BERT and GPT
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation
LightSeq: A High Performance Library for Sequence Processing and Generation
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
A polyhedral compiler for expressing fast and portable data parallel algorithms
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more!
A library of GPU kernels for sparse matrix operations.
A demo of how to write a high-performance convolution that runs on Apple silicon