GPU Sharing Scheduler Extender in Kubernetes

Overview

More and more data scientists run their NVIDIA GPU-based inference tasks on Kubernetes. Some of these tasks can run on the same NVIDIA GPU device to increase GPU utilization, so an important challenge is how to share GPUs between pods. The community is also very interested in this topic.

This project provides a GPU sharing solution for native Kubernetes. It is built on the scheduler extender and device plugin mechanisms, so you can easily reuse it in your own Kubernetes cluster.

Prerequisites

Kubernetes 1.11+
golang 1.10+
NVIDIA drivers ~= 361.93
nvidia-docker version > 2.0 (see how to install it and its prerequisites)
Docker configured with nvidia as the default runtime.

Design

For more details about the design of this project, please read the Design document.

Setup

You can follow the Installation Guide.

User Guide

See the User Guide for how to use it.

Developing

Scheduler Extender

# git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
# docker build -t cheyang/gpushare-scheduler-extender .

Device Plugin

# git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
# docker build -t cheyang/gpushare-device-plugin .

Kubectl Extension

golang 1.10+

# mkdir -p $GOPATH/src/github.com/AliyunContainerService
# cd $GOPATH/src/github.com/AliyunContainerService
# git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git
# cd gpushare-device-plugin
# go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

Demo

- Demo 1: Deploy multiple GPU-shared pods and have them scheduled onto the same GPU device in a binpack fashion

- Demo 2: Avoid placing a pod whose GPU memory request fits at the node level but does not fit on any single GPU device
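Both demos depend on a per-device decision that a node-level view of GPU memory cannot make. The following is a minimal sketch, not this project's implementation. It assumes each node reports per-GPU total and used memory, and GiB is an assumed unit. `gpuDevice`, `nodeFree`, and `pickDevice` are hypothetical names. `pickDevice` uses best-fit binpacking: among GPUs that can hold the whole request, it picks the one with the least free memory.

```go
package main

import "fmt"

// gpuDevice is a hypothetical per-GPU record; memory is in GiB (assumed unit).
type gpuDevice struct {
	ID       int
	TotalGiB int
	UsedGiB  int
}

func (d gpuDevice) free() int { return d.TotalGiB - d.UsedGiB }

// nodeFree is the node-level view: free memory summed over all GPUs.
func nodeFree(devs []gpuDevice) int {
	total := 0
	for _, d := range devs {
		total += d.free()
	}
	return total
}

// pickDevice binpacks: among GPUs that can hold the whole request, it picks
// the one with the least free memory, keeping larger gaps for later pods.
// It returns -1 when no single GPU fits, even if nodeFree would suffice.
func pickDevice(devs []gpuDevice, requestGiB int) int {
	best, bestFree := -1, 0
	for _, d := range devs {
		f := d.free()
		if f >= requestGiB && (best == -1 || f < bestFree) {
			best, bestFree = d.ID, f
		}
	}
	return best
}

func main() {
	devs := []gpuDevice{
		{ID: 0, TotalGiB: 8, UsedGiB: 5}, // 3 GiB free
		{ID: 1, TotalGiB: 8, UsedGiB: 4}, // 4 GiB free
	}
	fmt.Println(nodeFree(devs))      // 7
	fmt.Println(pickDevice(devs, 2)) // 0: both fit, GPU 0 is fuller
	fmt.Println(pickDevice(devs, 6)) // -1: node has 7 GiB free, but no single GPU has 6
}
```

The last call shows the failure mode that a purely node-level check would miss. The node reports enough free GPU memory in total, but the request cannot be satisfied on any one device.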

Related Project

gpushare device plugin

Roadmap

Integrate NVIDIA MPS as an option for isolation
Automated Deployment for Kubernetes clusters deployed by kubeadm
Scheduler Extender High Availability
Generic Solution for GPU, RDMA and other devices

Acknowledgments

This GPU sharing solution is based on NVIDIA Docker 2, and NVIDIA's GPU sharing design is our reference. The NVIDIA community is very supportive, and we are very grateful.

rafmonteiro/gpushare-scheduler-extender

Folders and files

Latest commit

History

Repository files navigation

GPU Sharing Scheduler Extender in Kuberntes

Overview

Prerequisites

Design

Setup

User Guide

Developing

Scheduler Extender

Device Plugin

Kubectl Extension

Demo

- Demo 1: Deploy multiple GPU Shared Pods, and they are scheduled to the same GPU device in binpack way

- Demo 2: Avoid GPU Memory requests can fit the node level, but not for the GPU device level

Related Project

Roadmap

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages