It is not always worthwhile to parallelize an application using GPUs; this model attempts to predict the cost/benefit of doing so without having to run expensive tests on large amounts of data.
A context-free grammar (CFG) is created that can generate CUDA programs with the following features:
- 1D, 2D, 3D problem sets
- shared memory utilization
- thread syncing
- atomic operations
- calls to __device__ functions
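The generation step described above can be sketched as a small recursive grammar expander. This is a hypothetical illustration, not the project's actual generator; the real grammar covers all of the features listed (1D/2D/3D indexing, shared memory, syncing, atomics, and __device__ calls), of which only a few appear here.

```python
import random
import re

# Toy grammar: nonterminals in <angle brackets>, terminals are CUDA text.
# (Illustrative subset; the real CFG is much larger.)
GRAMMAR = {
    "kernel": ["__global__ void k(float* a, int n) { <body> }"],
    "body": [
        "int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) { <stmt> }",
        "__shared__ float s[256]; <stmt> __syncthreads();",
    ],
    "stmt": ["a[i] = a[i] * 2.0f;", "atomicAdd(&a[0], 1.0f);"],
}

def expand(symbol, rng):
    """Pick a random production for `symbol` and expand nested nonterminals."""
    production = rng.choice(GRAMMAR[symbol])
    # Substitute one <nonterminal> placeholder at a time until none remain.
    while re.search(r"<(\w+)>", production):
        production = re.sub(
            r"<(\w+)>", lambda m: expand(m.group(1), rng), production, count=1
        )
    return production

kernel_src = expand("kernel", random.Random(0))
print(kernel_src)
```

Seeding the generator makes each of the 5000 samples reproducible from its index.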
The CFG is then used to generate 5000 different programs, each with a corresponding serialized version and parallelized version. Each program is then compiled and run with several different inputs (matrix size, block size, and grid size); performance is measured, and correctness is checked by comparing the outputs of the serialized versions to those of the parallel ones.
Programs that generate code that is not equivalent at runtime are discarded.
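The equivalence check can be sketched as a tolerance-based comparison of the two output arrays. This is a sketch under assumptions: the tolerance value and function name are illustrative, not taken from the project.

```python
from math import isclose

def outputs_equivalent(serial_out, parallel_out, rel_tol=1e-4):
    """Decide whether a parallel run reproduces the serial result.

    GPU reductions can reorder floating-point operations, so exact
    equality is too strict; a relative tolerance is used instead.
    (rel_tol is an assumed value, not from the original project.)
    """
    if len(serial_out) != len(parallel_out):
        return False
    return all(
        isclose(s, p, rel_tol=rel_tol)
        for s, p in zip(serial_out, parallel_out)
    )

# A tiny rounding difference passes; a real divergence is discarded.
assert outputs_equivalent([1.0, 2.0, 3.0], [1.0, 2.0000001, 3.0])
assert not outputs_equivalent([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
```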
Code can be generated by calling:
python -m nyu.gpu.speedup <num-samples>
This will generate the code snippets, compile and execute them, and record each snippet's source code and runtime in a CSV file called cuda_speedup.csv.
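The recording step can be sketched as timing one run and appending a row to the CSV. This is a minimal sketch: `runner` stands in for compiling and executing a snippet (the real tool shells out to the CUDA toolchain), and the column layout is an assumption.

```python
import csv
import time

def record_sample(path, source_code, runner):
    """Time one generated program and append (source, runtime) to the CSV.

    `runner` is a placeholder callable standing in for compile-and-execute;
    the column order (source, runtime) is assumed, not confirmed.
    """
    start = time.perf_counter()
    runner()
    runtime = time.perf_counter() - start
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([source_code, runtime])

record_sample(
    "cuda_speedup.csv",
    "__global__ void k() {}",
    lambda: sum(range(1000)),  # stand-in workload for the compiled snippet
)
```

Appending row by row means a crash partway through a 5000-sample run loses only the sample in flight.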
The train_model.ipynb notebook can then be used to train the model.
Once the dataset is generated, a pre-trained GPT-Neo model trained on source code is used as a function-embedding featurizer. The resulting embeddings are then fed into a small feed-forward neural network.
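The prediction head can be sketched as a single-hidden-layer network that maps a fixed-length embedding vector to a scalar speedup estimate. Everything here is illustrative: the layer sizes, initialization, and activation are assumptions, and the weights would normally be learned in train_model.ipynb rather than sampled at random.

```python
import random

def feed_forward(embedding, w1, b1, w2, b2):
    """One hidden layer with ReLU, producing a scalar prediction.

    `embedding` plays the role of the GPT-Neo function embedding;
    sizes and weights are illustrative assumptions.
    """
    hidden = [
        max(0.0, sum(e * w for e, w in zip(embedding, row)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(h * w for h, w in zip(hidden, w2)) + b2

rng = random.Random(0)
dim, hid = 8, 4  # assumed sizes; a real GPT-Neo embedding is much wider
w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hid)]
b1 = [0.0] * hid
w2 = [rng.uniform(-1, 1) for _ in range(hid)]
prediction = feed_forward([0.1] * dim, w1, b1, w2, 0.0)
```

Freezing the large language model and training only this small head keeps the trainable parameter count tiny relative to the featurizer.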
The following results are reported over a smaller 500-sample dataset with a 25/75 train/validation split; no hyperparameter tuning is done on the model:
- training score: R^2 = 0.99
- test score: R^2 = 0.88
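For reference, the R^2 scores above are the standard coefficient of determination, which can be computed directly (the project presumably uses a library implementation; this function is just the textbook formula):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; predicting the mean gives R^2 = 0.
assert r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 1.0
assert r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]) == 0.0
```

The gap between the train score (0.99) and test score (0.88) suggests some overfitting, which is unsurprising with only 500 samples and no tuning.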