The State of Sparsity in Deep Neural Networks

This directory contains the code accompanying the paper "The State of Sparsity in Deep Neural Networks". All authors contributed to this code.

The layers subdirectory contains TensorFlow implementations of variational dropout and L0 regularization. The sparse_transformer and sparse_rn50 subdirectories contain code for the Transformer and ResNet-50 experiments from the paper. The results subdirectory contains CSV files with the results of every hyperparameter configuration that we explored for each model, sparsity technique, and sparsity level.

Build Docker Image

To build a Docker image with all required dependencies, run sudo docker build -t <image_name> . (the trailing dot is the build context). The base setup installs TensorFlow with GPU support and is based on Nvidia's CUDA-9.0 image, which includes all the libraries required to run TensorFlow. To launch the container, run sudo docker run --runtime=nvidia -v ~/:/mount/ -it <image_name>:latest. This command additionally makes your home directory accessible at /mount inside the container.

To run on CPU only, swap tensorflow-gpu for tensorflow in requirements.txt.

Sparse Transformer

This repo contains all of the code and data needed to decode the WMT 2014 English-German test set and calculate the BLEU score for each of the provided checkpoints. Run the steps below from inside the container.

Small scripts to decode from Transformer checkpoints trained with each technique are provided in sparse_transformer/decode/. For random pruning checkpoints, use the script. For variational dropout, you'll need to pass in the same log alpha threshold that was used to achieve the BLEU score reported in the checkpoint directory. The threshold is the last number in the checkpoint directory name.

The results of decoding from the model checkpoint will be saved in the sparse_transformer/decode/ directory with a name like newstest2014.end.sparse_transformer.... To calculate the BLEU score for the decoded output, run sh <decode_output>. This script relies on the mosesdecoder project and assumes it is installed at /mount/mosesdecoder inside the container. The output of the script should match the BLEU score reported in the checkpoint directory.

Sparse ResNet-50

Scripts to evaluate ResNet-50 checkpoints on the ImageNet test set are provided in sparse_rn50/evaluate/. For random pruning checkpoints, use the script. You'll similarly need to pass in the log alpha threshold to evaluate variational dropout checkpoints; the threshold was 0.5 for all our models. This repository does not include the ImageNet dataset, so you'll also need to point these scripts at a local copy of the ImageNet test set stored as TFRecords. The output of the script should match the top-1 accuracy reported in the checkpoint directory.
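
Both the Transformer and ResNet-50 evaluation steps take a log alpha threshold for variational dropout checkpoints. As a rough illustration of what that threshold does, the sketch below assumes the usual sparse variational dropout parameterization, where each weight has a mean theta and a log variance log_sigma2, and log alpha = log_sigma2 - log(theta^2). Weights whose log alpha reaches the threshold are treated as pruned. The function names, the eps constant, and the toy values are hypothetical and are not taken from this repository's scripts.

```python
import math


def compute_log_alpha(log_sigma2, theta, eps=1e-8):
    """Return log alpha = log(sigma^2) - log(theta^2) for a single weight.

    eps is an assumed small constant that keeps the log finite when theta is 0.
    """
    return log_sigma2 - math.log(theta * theta + eps)


def threshold_weights(thetas, log_sigma2s, threshold):
    """Keep weights with log alpha below the threshold and zero out the rest."""
    kept = []
    for theta, log_sigma2 in zip(thetas, log_sigma2s):
        log_alpha = compute_log_alpha(log_sigma2, theta)
        kept.append(theta if log_alpha < threshold else 0.0)
    return kept


# Hypothetical toy parameters, not values from a real checkpoint.
thetas = [0.8, -0.05, 1.2, 0.01]
log_sigma2s = [-6.0, -2.0, -5.0, -3.0]

pruned = threshold_weights(thetas, log_sigma2s, threshold=0.5)
print(pruned)  # [0.8, 0.0, 1.2, 0.0]
```

Because the threshold decides which weights count as zero, evaluating a checkpoint with a different threshold than the one used to report its score can change both the measured sparsity and the accuracy or BLEU.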

Calculate Weight Sparsity

To calculate the weight sparsity for a checkpoint, use the script and pass the checkpoint file, sparsity technique, and model ("transformer" or "rn50"). For variational dropout, also pass the same log alpha threshold.
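
For intuition about the quantity being reported, here is a minimal sketch of a weight sparsity calculation. It is not the repository's script. It assumes the weights have already been read out of a checkpoint into flat lists of floats and that pruned weights are stored as exact zeros (for variational dropout, this means the log alpha threshold has already been applied). The tensor names and values are hypothetical.

```python
def weight_sparsity(weights_by_name):
    """Return (overall, per_tensor) fractions of exactly-zero weights.

    weights_by_name maps a tensor name to a flat list of float weights.
    Empty tensors are skipped so they cannot cause a division by zero.
    """
    per_tensor = {}
    total_zero = 0
    total = 0
    for name, values in weights_by_name.items():
        if not values:
            continue
        zeros = sum(1 for v in values if v == 0.0)
        per_tensor[name] = zeros / len(values)
        total_zero += zeros
        total += len(values)
    overall = total_zero / total if total else 0.0
    return overall, per_tensor


# Hypothetical toy "checkpoint" with two weight tensors.
checkpoint = {
    "layer_0/kernel": [0.0, 0.3, 0.0, -1.1],
    "layer_1/kernel": [0.0, 0.0, 0.0, 0.7, 0.0, 0.2],
}

overall, per_tensor = weight_sparsity(checkpoint)
print(overall)                       # 0.6
print(per_tensor["layer_0/kernel"])  # 0.5
```

The overall figure is weighted by tensor size, so a large, heavily pruned layer dominates it. Averaging the per-tensor fractions instead would give a different number whenever sparsity is distributed non-uniformly across layers.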

Trained Checkpoints

The top-performing checkpoints for each model and sparsity technique can be downloaded from the links below.

| Model | Technique | Sparsity | BLEU | Link |
|---|---|---|---|---|
| Transformer | Magnitude Pruning | 50% | 26.33 | link |
| Transformer | Magnitude Pruning | 60% | 25.94 | link |
| Transformer | Magnitude Pruning | 70% | 25.21 | link |
| Transformer | Magnitude Pruning | 80% | 24.65 | link |
| Transformer | Magnitude Pruning | 90% | 23.26 | link |
| Transformer | Magnitude Pruning | 95% | 20.75 | link |
| Transformer | Magnitude Pruning | 98% | 16.37 | link |
| Transformer | Variational Dropout | 50% | 26.26 | link |
| Transformer | Variational Dropout | 60% | 25.37 | link |
| Transformer | Variational Dropout | 70% | 25.08 | link |
| Transformer | Variational Dropout | 80% | 24.33 | link |
| Transformer | Variational Dropout | 90% | 21.43 | link |
| Transformer | Variational Dropout | 95% | 19.13 | link |
| Transformer | Variational Dropout | 98% | 14.45 | link |
| Transformer | L0 Regularization | 50% | 26.72 | link |
| Transformer | L0 Regularization | 60% | 26.16 | link |
| Transformer | L0 Regularization | 70% | 25.29 | link |
| Transformer | L0 Regularization | 80% | 24.15 | link |
| Transformer | L0 Regularization | 90% | 20.05 | link |
| Transformer | L0 Regularization | 95% | 19.78 | link |
| Transformer | L0 Regularization | 98% | 16.83 | link |
| Transformer | Random Pruning | 50% | 24.56 | link |
| Transformer | Random Pruning | 60% | 24.45 | link |
| Transformer | Random Pruning | 70% | 24.01 | link |
| Transformer | Random Pruning | 80% | 23.15 | link |
| Transformer | Random Pruning | 90% | 20.67 | link |
| Transformer | Random Pruning | 95% | 17.42 | link |
| Transformer | Random Pruning | 98% | 10.94 | link |

| Model | Technique | Sparsity | Top-1 Accuracy | Link |
|---|---|---|---|---|
| ResNet-50 | Magnitude Pruning | 50% | 76.53 | link |
| ResNet-50 | Magnitude Pruning | 70% | 76.38 | link |
| ResNet-50 | Magnitude Pruning | 80% | 75.58 | link |
| ResNet-50 | Magnitude Pruning | 90% | 73.91 | link |
| ResNet-50 | Magnitude Pruning | 95% | 70.59 | link |
| ResNet-50 | Magnitude Pruning | 98% | 57.9 | link |
| ResNet-50 | Magnitude Pruning (extended/non-uniform) | 80% | 76.52 | link |
| ResNet-50 | Magnitude Pruning (extended/non-uniform) | 90% | 75.16 | link |
| ResNet-50 | Magnitude Pruning (extended/non-uniform) | 95% | 72.71 | link |
| ResNet-50 | Magnitude Pruning (extended/non-uniform) | 96.5% | 69.26 | link |
| ResNet-50 | Random Pruning | 50% | 74.59 | link |
| ResNet-50 | Random Pruning | 70% | 72.2 | link |
| ResNet-50 | Random Pruning | 80% | 70.21 | link |
| ResNet-50 | Random Pruning | 90% | 65 | link |
| ResNet-50 | Random Pruning | 95% | 58.04 | link |
| ResNet-50 | Random Pruning | 98% | 43.99 | link |
| ResNet-50 | Variational Dropout | 50% | 76.55 | link |
| ResNet-50 | Variational Dropout | 80% | 75.28 | link |
| ResNet-50 | Variational Dropout | 90% | 73.84 | link |
| ResNet-50 | Variational Dropout | 95% | 71.91 | link |
| ResNet-50 | Variational Dropout | 98% | 67.36 | link |