ResNet

Training code for three variants of ResNet on ImageNet: the original ResNet, pre-activation ResNet, and squeeze-and-excitation (SE) ResNet.

The training follows the exact recipe used by the "Training ImageNet in 1 Hour" paper and achieves the same performance. Models can be downloaded here.

This recipe has better performance than most open-source implementations. In fact, many papers that claim to "improve" ResNet only compare against a weaker baseline and actually cannot beat this ResNet recipe.

| Model       | Top 5 Error | Top 1 Error | Download |
|:------------|:-----------:|:-----------:|:--------:|
| ResNet18    | 10.50%      | 29.66%      | ⬇️       |
| ResNet34    | 8.56%       | 26.17%      | ⬇️       |
| ResNet50    | 6.85%       | 23.61%      | ⬇️       |
| ResNet50-SE | 6.24%       | 22.64%      | ⬇️       |
| ResNet101   | 6.04%       | 21.95%      | ⬇️       |
| ResNet152   | 5.78%       | 21.51%      | ⬇️       |

To train, first decompress ImageNet data into this structure, then:

./imagenet-resnet.py --data /path/to/original/ILSVRC -d 50 [--mode resnet/preact/se]
# See ./imagenet-resnet.py -h for other options.

You should see good GPU utilization (95%~99%) if your data pipeline is fast enough. With batch=64x8, training finishes 100 epochs in 16 hours on an AWS p3.16xlarge (8 V100s).
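At batch=64x8, the 1-hour recipe scales the base learning rate linearly with the total batch size and warms it up over the first few epochs. A rough sketch of that schedule style (the constants below are illustrative assumptions, not values read from imagenet-resnet.py):

# Sketch of the linear-scaling + warmup schedule style from "Training ImageNet in 1 Hour".
# The constants (0.1 per 256 samples, 5 warmup epochs, decay at epochs 30/60/90) are
# illustrative assumptions, not the exact settings of imagenet-resnet.py.
def learning_rate(epoch, total_batch_size=64 * 8):
    base_lr = 0.1 * total_batch_size / 256.0      # linear scaling rule
    if epoch < 5:                                 # gradual warmup
        return base_lr * (epoch + 1) / 5.0
    for boundary, factor in [(90, 1e-3), (60, 1e-2), (30, 1e-1)]:
        if epoch >= boundary:
            return base_lr * factor
    return base_lr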

The default data pipeline is probably adequate for machines with an SSD and 20 CPU cores. See the tutorial for other options to speed up your data pipeline.
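If the default pipeline becomes the bottleneck, a parallel dataflow along these lines may help. This is only a minimal sketch using tensorpack's dataflow API; the augmentors and process count are placeholders, not the actual training pipeline:

# Minimal sketch of a parallel ImageNet dataflow with tensorpack; the augmentors and
# process count are placeholders, not the pipeline used by imagenet-resnet.py.
from tensorpack.dataflow import dataset, imgaug, AugmentImageComponent, BatchData, PrefetchDataZMQ

df = dataset.ILSVRC12('/path/to/original/ILSVRC', 'train', shuffle=True)
df = AugmentImageComponent(df, [imgaug.Resize((224, 224))])   # placeholder augmentation
df = PrefetchDataZMQ(df, nr_proc=20)                          # decode/augment in 20 processes
df = BatchData(df, 64)                                        # per-GPU batch size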


The load-resnet.py script only converts and runs the ImageNet-ResNet{50,101,152} Caffe models released by MSRA. Note that its architecture is different from the one in imagenet-resnet.py, and the models are not compatible. ResNets have evolved since then; generally you should not cite these numbers as baselines in your paper.

Usage:

# download and convert caffe model to npz format
python -m tensorpack.utils.loadcaffe PATH/TO/{ResNet-101-deploy.prototxt,ResNet-101-model.caffemodel} ResNet101.npz
# run on an image
./load-resnet.py --load ResNet101.npz --input cat.jpg --depth 101
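To sanity-check the conversion, the resulting npz file can be inspected with plain numpy (the file name matches the conversion command above; the parameter names are whatever loadcaffe produced):

# Inspect the converted weights: the npz file maps parameter names to numpy arrays.
import numpy as np
params = np.load('ResNet101.npz')
for name in sorted(params.files)[:10]:
    print(name, params[name].shape)
print(len(params.files), 'tensors in total')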

The converted models are verified on the ILSVRC12 validation set. The per-pixel mean used here is slightly different from the original.

| Model      | Top 5 Error | Top 1 Error |
|:-----------|:-----------:|:-----------:|
| ResNet 50  | 7.78%       | 24.77%      |
| ResNet 101 | 7.11%       | 23.54%      |
| ResNet 152 | 6.71%       | 23.21%      |

Reproduce pre-activation ResNet on CIFAR10.
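For reference, "pre-activation" means each residual branch applies BN and ReLU before the convolution. A generic sketch of one such block (tf.keras is used here purely for illustration; the script itself is written with tensorpack):

# Generic pre-activation residual block: BN -> ReLU -> Conv, twice, plus the identity
# shortcut. Assumes the input already has `channels` channels; illustration only.
from tensorflow.keras import layers

def preact_block(x, channels):
    shortcut = x
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 3, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 3, padding='same', use_bias=False)(x)
    return layers.Add()([x, shortcut])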


Also see a DenseNet implementation of the paper Densely Connected Convolutional Networks.

Reproduce the mixup pre-act ResNet-18 CIFAR10 experiment from the paper mixup: Beyond Empirical Risk Minimization.

This implementation follows the exact settings from the author's code. Note that the architecture is different from the official preact-ResNet18.

Usage:

./cifar10-preact18-mixup.py  # train without mixup
./cifar10-preact18-mixup.py --mixup   # with mixup

Results of the reference code can be reproduced. One run gives 5.48% error without mixup and 4.17% with mixup (alpha=1).
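For context, mixup trains on convex combinations of pairs of examples and their labels; a minimal numpy sketch (the helper name and the use of one-hot labels are assumptions for illustration):

# Minimal mixup sketch: blend two batches and their one-hot labels with a weight
# drawn from Beta(alpha, alpha); alpha=1 (as used above) samples the weight uniformly.
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=1.0):
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2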