Stars
Distributed training (multi-node) of a Transformer model
Attention is all you need implementation
PyTorch native quantization and sparsity for training and inference
How to optimize algorithms in CUDA.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
The official GitHub page for the survey paper "A Survey of Large Language Models".
C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Research and Materials on Hardware implementation of Transformer Model
Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.
My C++ deep learning framework & other machine learning algorithms
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
Understanding Deep Learning - Simon J.D. Prince
A great project for campus recruiting (autumn/spring recruiting) and internships! Walks you through implementing a high-performance deep learning inference library from scratch, step by step, supporting inference for models such as llama2, Unet, Yolov5, and Resnet.
Examples for using ONNX Runtime for machine learning inferencing.
This is a list of interesting papers and projects about TinyML.
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
Development repository for the Triton language and compiler