Rename reserved cells to pinned cells (microsoft#14)
* rename reserved cells to pinned cells

* rename acquired to reserved

* rename reserved to pinned in yaml files and docs

* update feature demo for pinned cells

* fix typo

* fix value receiver of deleteAllocatedAffinityGroup

* fix typo in readme

* fix ambiguous naming about reserved cells

* BeingReserved -> Reserved
zhypku committed Apr 27, 2020
1 parent fad9012 commit cb9c73d
Showing 21 changed files with 398 additions and 337 deletions.
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -33,8 +33,8 @@ HiveD supports multiple job **priorities**. Higher-priority jobs can **[preempt]

## Feature
1. [Multi-Tenancy: Virtual Cluster (VC)](example/feature/README.md#VC-Safety)
2. [Fine-Grained VC Resource Guarantee](example/feature/README.md#VC-Safety): Quantity, [Topology](example/feature/README.md#VC-Safety), [Type](example/feature/README.md#GPU-Type), [Reservation](example/feature/README.md#Reservation), etc.
3. Flexible Intra-VC Scheduling: [Topology-Awareness](example/feature/README.md#Topology-Aware-Intra-VC-Scheduling), [Flexible GPU Types](example/feature/README.md#GPU-Type), [Reservation](example/feature/README.md#Reservation), Scheduling Policy Customization, etc.
2. [Fine-Grained VC Resource Guarantee](example/feature/README.md#VC-Safety): Quantity, [Topology](example/feature/README.md#VC-Safety), [Type](example/feature/README.md#GPU-Type), [Pinned VC Resource](example/feature/README.md#Pinned-Cells), etc.
3. Flexible Intra-VC Scheduling: [Topology-Awareness](example/feature/README.md#Topology-Aware-Intra-VC-Scheduling), [Flexible GPU Types](example/feature/README.md#GPU-Type), [Pinned VC Resource](example/feature/README.md#Pinned-Cells), Scheduling Policy Customization, etc.
4. Optimized Resource Fragmentation and Less Starvation
5. [Priorities](example/feature/README.md#Guaranteed-Job), [Overuse with Low Priority](example/feature/README.md#Opportunistic-Job), and [Inter-](example/feature/README.md#Inter-VC-Preemption)/[Intra-VC Preemption](example/feature/README.md#Intra-VC-Preemption)
6. [Job (Full/Partial) Gang Scheduling/Preemption](example/feature/README.md#Gang-Scheduling)
@@ -57,7 +57,7 @@ HiveD supports multiple job **priorities**. Higher-priority jobs can **[preempt]
* [DockerHub](https://hub.docker.com/u/hivedscheduler)

## Related Project
* [FrameworkController](https://github.com/microsoft/frameworkcontroller): A General-Purpose Kubernetes Pod Controller, which can easily leverage HiveD to schedule jobs .
* [FrameworkController](https://github.com/microsoft/frameworkcontroller): A General-Purpose Kubernetes Pod Controller, which can easily leverage HiveD to schedule jobs.
* [OpenPAI](https://github.com/microsoft/pai): A complete solution for AI platform. HiveD will be more user-friendly when working in tandem with OpenPAI.

## Contributing
6 changes: 3 additions & 3 deletions example/config/basic/hivedscheduler.yaml
@@ -23,7 +23,7 @@ physicalCluster:
cellChildren:
- cellAddress: 10.151.41.20
- cellAddress: 10.151.41.21
reservationId: VC2-K80
pinnedCellId: VC2-K80
- cellAddress: 10.151.41.22

virtualClusters:
@@ -35,5 +35,5 @@ virtualClusters:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
reservedCells:
- reservationId: VC2-K80
pinnedCells:
- pinnedCellId: VC2-K80
18 changes: 9 additions & 9 deletions example/config/design/hivedscheduler.yaml
@@ -14,7 +14,7 @@ kubeApiServerAddress: http://10.10.10.10:8080
# 2. All physicalCells should contain at most one physical specific GPU.
# 3. Each physicalCell should contain exactly one node level cellType.
# 4. Each physicalCell should specify full hierarchies defined by its cellType.
# 5. A reservationId should can be universally locate one physicalCell.
# 5. A pinnedCellId should can be universally locate one physicalCell.
#
# Best Practice:
# 1. Best effort to merge to large cell and reflect physical facts, unless need
@@ -155,7 +155,7 @@ physicalCluster:
cellAddress: 1.0.0.2 # NODE Name
cellChildren:
- cellAddress: 8 # GPU Index
reservationId: VC1-YQW-CT1
pinnedCellId: VC1-YQW-CT1
- cellAddress: 9 # GPU Index
# One cell has non-standard gpu indices
- cellType: 3-DGX1-P100-NODE
@@ -221,7 +221,7 @@ physicalCluster:
- cellAddress: 0.0.2.2
- cellType: 4-DGX2-V100-NODE
cellChildren:
- reservationId: VC1-YQW-DGX2
- pinnedCellId: VC1-YQW-DGX2
cellChildren:
- cellAddress: 0.0.3.0
- cellAddress: 0.0.3.1
@@ -237,7 +237,7 @@ physicalCluster:
- cellAddress: 0.0.4.2
- cellAddress: 0.0.4.3
- cellType: 2-IB-DGX2-V100-NODE
reservationId: VC1-YQW-IB-DGX2
pinnedCellId: VC1-YQW-IB-DGX2
cellChildren:
- cellAddress: 0.1.0.0
- cellAddress: 0.1.0.1
@@ -250,7 +250,7 @@ physicalCluster:
# 1. The whole VCs must be able to be fitted into the PC.
# 2. The cellType field should be full qualified and should be started with a
# cellType which is referred in physicalCells.
# 3. A reservationId should can only be referred in one VC for one time.
# 3. A pinnedCellId should can only be referred in one VC for one time.
#
# Best Practice:
# 1. Best effort to just plan its own VC quota based on the PC facts, unless need
@@ -271,13 +271,13 @@ virtualClusters:
# 2 DGX2-V100-NODE must be within the same rack.
- cellType: 4-DGX2-V100-NODE.2-DGX2-V100-NODE
cellNumber: 1
reservedCells:
pinnedCells:
# 1 CT1 GPU must be the VC1-YQW-CT1 GPU.
- reservationId: VC1-YQW-CT1
- pinnedCellId: VC1-YQW-CT1
# 2 DGX2-V100-NODE must be within the VC1-YQW-DGX2 rack.
- reservationId: VC1-YQW-DGX2
- pinnedCellId: VC1-YQW-DGX2
# 2 IB-DGX2-V100-NODE must be within the VC1-YQW-IB-DGX2 rack.
- reservationId: VC1-YQW-IB-DGX2
- pinnedCellId: VC1-YQW-IB-DGX2
VC2:
virtualCells:
# 2 DGX1-P100-NODE may not be within the same rack.
14 changes: 7 additions & 7 deletions example/feature/README.md
@@ -9,26 +9,26 @@

HiveD guarantees **quota safety for all VCs**, in the sense that the requests to cells defined in each VC can always be satisfied.

VC's cells can be described by Hardware Quantity, [Topology](#VC-Safety), [Type](#GPU-Type), [Reservation](#Reservation), etc. To guarantee safety, HiveD never allows a VC to "invade" other VCs' cells. For example, to guarantee all VCs' topology, one VC's [guaranteed jobs](#Guaranteed-Job) should never make fragmentation inside other VCs:
VC's cells can be described by Hardware Quantity, [Topology](#VC-Safety), [Type](#GPU-Type), [Pinned Cells](#Pinned-Cells), etc. To guarantee safety, HiveD never allows a VC to "invade" other VCs' cells. For example, to guarantee all VCs' topology, one VC's [guaranteed jobs](#Guaranteed-Job) should never make fragmentation inside other VCs:

Two DGX-2s, two VCs each owns one DGX-2 node. For normal scheduler, this will translate into two VCs each owning 16 GPUs. When user submits 16 1-GPU jobs to VC1, the user in VC2 might not be able to run a 16-GPU job, due to possible fragmentation issue caused by VC1. While HiveD can guarantee each VC always has one entire node reserved for its dedicated use.
Two DGX-2s, two VCs each owns one DGX-2 node. For a traditional scheduler, this will translate into two VCs each owning 16 GPUs. When a user submits 16 1-GPU jobs to VC1, the user in VC2 might not be able to run a 16-GPU job, due to possible fragmentation issue caused by VC1. While HiveD can guarantee each VC always has one entire node available for its dedicated use.
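
The fragmentation scenario can be illustrated with a small Python sketch. This is not HiveD's scheduling algorithm: the helpers `spread`, `pack_in_own_node` and `can_run_whole_node_job` and the node names are hypothetical, and the model only tracks free GPUs on two 16-GPU nodes.

```python
GPUS_PER_NODE = 16  # each DGX-2 node in the example has 16 GPUs


def spread(num_jobs, nodes):
    """Topology-unaware placement (assumed): round-robin 1-GPU jobs over all nodes."""
    free = {n: GPUS_PER_NODE for n in nodes}
    for i in range(num_jobs):
        free[nodes[i % len(nodes)]] -= 1
    return free


def pack_in_own_node(num_jobs, own_node, nodes):
    """VC-safe placement (assumed): a VC's jobs only consume its own node cell."""
    free = {n: GPUS_PER_NODE for n in nodes}
    if num_jobs > free[own_node]:
        raise ValueError("request exceeds the VC's guaranteed quota")
    free[own_node] -= num_jobs
    return free


def can_run_whole_node_job(free):
    """A 16-GPU job needs one node whose GPUs are all free."""
    return any(f == GPUS_PER_NODE for f in free.values())


nodes = ["node-a", "node-b"]                   # hypothetical node names
fragmented = spread(16, nodes)                 # VC1 submits 16 1-GPU jobs
safe = pack_in_own_node(16, "node-a", nodes)   # same jobs, kept inside VC1's node

print(can_run_whole_node_job(fragmented))  # False: 8 GPUs left on each node
print(can_run_whole_node_job(safe))        # True: node-b is untouched for VC2
```

In the first placement both VCs nominally still own 16 GPUs, yet VC2 can no longer get 16 GPUs on a single node, which is the situation the topology guarantee is meant to rule out.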

### Reproduce Steps
1. Use [hived-config-1](file/hived-config-1.yaml).
2. Submit 2 jobs [itc-safety-1](file/itc-safety-1.yaml), [itc-safety-2](file/itc-safety-2.yaml) to the same VC, all tasks will always run within the same node (10.151.41.26).
<img src="file/itc-safety-1.png" width="900"/>
<img src="file/itc-safety-2.png" width="900"/>

## Reservation
## Pinned Cells
### Description
One VC contains two DGX-2 nodes. The VC admin would like to reserve one DGX-2 for dedicated use, i.e. without explicit `reservationId` specified, job will not run on the reserved DGX-2.
One VC contains two DGX-2 node cells. The VC admin would like to pin one DGX-2 node cell in the physical cluster for dedicated use, i.e. that cell will be bound to a node statically. Without explicit `pinnedCellId` specified, a job will not be allowed to run on the pinned node.

This is similar to [K8S Taints and Tolerations](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#example-use-cases), but with [VC Safety](#VC-Safety) guaranteed.
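
The rule described above can be sketched in Python as a simple node filter. This is a simplified model under stated assumptions, not HiveD's implementation: `allowed_nodes` is a hypothetical helper, pinned cells are assumed to be node-level (in HiveD they can also sit at other levels of the cell hierarchy), and the node addresses are taken from the example config.

```python
def allowed_nodes(nodes, pinned, requested_pinned_cell_id=None):
    """Return the nodes a task may be placed on.

    nodes: node addresses the VC could otherwise use.
    pinned: mapping from pinnedCellId to the node it is statically bound to.
    requested_pinned_cell_id: the pinnedCellId in the task's request, if any.
    """
    if requested_pinned_cell_id is not None:
        # A task that names a pinned cell may only use that cell.
        return [pinned[requested_pinned_cell_id]]
    # A task without a pinnedCellId is kept off every pinned node.
    pinned_nodes = set(pinned.values())
    return [n for n in nodes if n not in pinned_nodes]


nodes = ["10.151.41.19", "10.151.41.25", "10.151.41.26"]
pinned = {"VC1-K80": "10.151.41.25"}

print(allowed_nodes(nodes, pinned, "VC1-K80"))  # ['10.151.41.25']
print(allowed_nodes(nodes, pinned))             # ['10.151.41.19', '10.151.41.26']
```

Unlike a taint, which only keeps other pods away, the pinned cell here also counts toward the VC's own guaranteed quota.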

### Reproduce Steps
1. Use [hived-config-1](file/hived-config-1.yaml).
2. Submit job [itc-reserve](file/itc-reserve.yaml) to VC2, all tasks in task role vc2rsv will be on node 10.151.41.25 (it is reserved), all tasks in task role vc2norsv will NOT be on node 10.151.41.25.
<img src="file/itc-reserve.png" width="900"/>
1. Use [hived-config-8](file/hived-config-8.yaml).
2. Submit job [itc-pin](file/itc-pin.yaml) to VC1, all tasks in task role vc1pinned will be on node 10.151.41.25 (which is pinned), all tasks in task role vc1nopinned will NOT be on node 10.151.41.25.
<img src="file/itc-pin.png" width="900"/>

## GPU Type
### Description
6 changes: 3 additions & 3 deletions example/feature/file/hived-config-1.yaml
@@ -26,7 +26,7 @@ physicalCluster:
cellChildren:
- cellAddress: 10.151.41.19
- cellAddress: 10.151.41.25
reservationId: VC2-K80
pinnedCellId: VC2-K80
- cellAddress: 10.151.41.26
- cellType: 2-K80-NODE
cellChildren:
@@ -42,8 +42,8 @@ virtualClusters:
virtualCells:
- cellType: 3-K80-NODE.K80-NODE
cellNumber: 1
reservedCells:
- reservationId: VC2-K80
pinnedCells:
- pinnedCellId: VC2-K80
default:
virtualCells:
- cellType: 2-K80-NODE
50 changes: 50 additions & 0 deletions example/feature/file/hived-config-8.yaml
@@ -0,0 +1,50 @@
kubeApiServerAddress: http://10.151.41.16:8080

physicalCluster:
  gpuTypes:
    K80:
      gpu: 1
      cpu: 4
      memory: 8192Mi
  cellTypes:
    K80-2GPU:
      childCellType: K80
      childCellNumber: 2
    K80-NODE:
      childCellType: K80-2GPU
      childCellNumber: 2
      isNodeLevel: true
    2-K80-NODE:
      childCellType: K80-NODE
      childCellNumber: 2
    3-K80-NODE:
      childCellType: K80-NODE
      childCellNumber: 3

  physicalCells:
  - cellType: 3-K80-NODE
    cellChildren:
    - cellAddress: 10.151.41.19
    - cellAddress: 10.151.41.25
      pinnedCellId: VC1-K80
    - cellAddress: 10.151.41.26
  - cellType: 2-K80-NODE
    cellChildren:
    - cellAddress: 10.151.41.23
    - cellAddress: 10.151.41.24

virtualClusters:
  VC1:
    virtualCells:
    - cellType: 3-K80-NODE.K80-NODE
      cellNumber: 1
    pinnedCells:
    - pinnedCellId: VC1-K80
  VC2:
    virtualCells:
    - cellType: 3-K80-NODE.K80-NODE
      cellNumber: 1
  default:
    virtualCells:
    - cellType: 2-K80-NODE
      cellNumber: 1
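
The design example's comments state two constraints on `pinnedCellId`: it must locate exactly one physical cell, and it may be referenced by only one VC, once. A minimal Python sketch of such a check follows. It is an illustration under assumptions, not HiveD's own validation: `collect_pinned_ids` and `check_pinned_cells` are hypothetical helpers, the config is written as an already-parsed dictionary trimmed to the relevant fields, and treating a reference to an undeclared id as an error is an assumption.

```python
from collections import Counter


def collect_pinned_ids(cells):
    """Recursively gather every pinnedCellId declared under physicalCells."""
    ids = []
    for cell in cells:
        if "pinnedCellId" in cell:
            ids.append(cell["pinnedCellId"])
        ids.extend(collect_pinned_ids(cell.get("cellChildren", [])))
    return ids


def check_pinned_cells(config):
    """Return a list of violations; an empty list means the checks passed."""
    errors = []
    declared = Counter(collect_pinned_ids(config["physicalCluster"]["physicalCells"]))
    for pid, count in declared.items():
        if count > 1:
            errors.append(f"{pid} is declared on {count} physical cells")
    referenced = Counter(
        entry["pinnedCellId"]
        for vc in config["virtualClusters"].values()
        for entry in vc.get("pinnedCells", [])
    )
    for pid, count in referenced.items():
        if pid not in declared:  # assumed to be an error
            errors.append(f"{pid} is referenced by a VC but not declared")
        if count > 1:
            errors.append(f"{pid} is referenced {count} times across VCs")
    return errors


# The pinned-cell parts of the config above, as a parsed dictionary.
config = {
    "physicalCluster": {
        "physicalCells": [
            {"cellType": "3-K80-NODE", "cellChildren": [
                {"cellAddress": "10.151.41.19"},
                {"cellAddress": "10.151.41.25", "pinnedCellId": "VC1-K80"},
                {"cellAddress": "10.151.41.26"},
            ]},
        ],
    },
    "virtualClusters": {
        "VC1": {"pinnedCells": [{"pinnedCellId": "VC1-K80"}]},
        "VC2": {},
    },
}

print(check_pinned_cells(config))  # []
```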
Binary file added example/feature/file/itc-pin.png
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
protocolVersion: 2
name: itc-reserve
name: itc-pin
type: job
prerequisites:
- protocolVersion: 2
name: keras_tensorflow_example
type: dockerimage
uri: openpai/pai.example.keras.tensorflow
taskRoles:
vc2norsv:
instances: 5
vc1nopinned:
instances: 4
completion:
minFailedInstances: 1
minSucceededInstances: 4
@@ -20,8 +20,8 @@ taskRoles:
commands:
- rm /usr/local/cuda/lib64/stubs/libcuda.so.1
- python mnist_cnn.py
vc2rsv:
instances: 5
vc1pinned:
instances: 4
completion:
minFailedInstances: 1
minSucceededInstances: 4
@@ -35,13 +35,15 @@ taskRoles:
- python mnist_cnn.py

defaults:
virtualCluster: VC2
virtualCluster: VC1

extras:
hivedScheduler:
jobPriorityClass: prod
taskRoles:
vc2norsv:
vc1nopinned:
gpuType: K80
vc2rsv:
reservationId: VC2-K80
affinityGroupName: vc1nopinned
vc1pinned:
pinnedCellId: VC1-K80
affinityGroupName: vc1pinned
Binary file removed example/feature/file/itc-reserve.png
Binary file not shown.
22 changes: 11 additions & 11 deletions example/request/basic/request.yaml
@@ -12,7 +12,7 @@ taskRoles:
affinityGroupName: null
---
jobVC: VC2
jobName: demo2norsv
jobName: demo2nopinned
jobPriorityClass: PROD
taskRoles:
a:
@@ -22,12 +22,12 @@ taskRoles:
affinityGroupName: null
---
jobVC: VC2
jobName: demo2rsv
jobName: demo2pinned
jobPriorityClass: PROD
taskRoles:
a:
taskNumber: 5
reservationId: VC2-K80
pinnedCellId: VC2-K80
gpuNumber: 1
affinityGroupName: null

@@ -96,7 +96,7 @@ spec:
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
name: demo2norsv
name: demo2nopinned
spec:
executionType: Start
retryPolicy:
@@ -143,7 +143,7 @@ spec:
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
name: demo2rsv
name: demo2pinned
spec:
executionType: Start
retryPolicy:
@@ -165,7 +165,7 @@ spec:
hivedscheduler.microsoft.com/pod-scheduling-spec: |-
virtualCluster: VC2
priority: 1000
reservationId: VC2-K80
pinnedCellId: VC2-K80
gpuNumber: 1
affinityGroup: null
spec:
@@ -225,8 +225,8 @@ spec:
apiVersion: v1
kind: Pod
metadata:
# demo2norsv-a-{1-4} are the same
name: demo2norsv-a-0
# demo2nopinned-a-{1-4} are the same
name: demo2nopinned-a-0
annotations:
hivedscheduler.microsoft.com/pod-scheduling-spec: |-
virtualCluster: VC2
@@ -256,13 +256,13 @@ spec:
apiVersion: v1
kind: Pod
metadata:
# demo2rsv-a-{1-4} are the same
name: demo2rsv-a-0
# demo2pinned-a-{1-4} are the same
name: demo2pinned-a-0
annotations:
hivedscheduler.microsoft.com/pod-scheduling-spec: |-
virtualCluster: VC2
priority: 1000
reservationId: VC2-K80
pinnedCellId: VC2-K80
gpuNumber: 1
affinityGroup: null
spec: