Lesson 2: resnet34, resnext50

(06-Nov-2017, live)

Notebooks Used

dogscats resnet34: fast.ai DL lesson1.ipynb
dogscats resnext50 architecture: lesson1-rxt50.ipynb
Satellite Imagery (planet dataset): lesson2-image_models

Learning Rate

how quickly we zero in on the solution
we take the gradient, which is how steep the loss is at this point, and multiply it by some number, which is the learning rate
if that number is small, we get closer to the minimum, but slowly
if that number is too big, we can overshoot and end up further from the minimum
if the loss is shooting off to infinity, the learning rate is most likely too high
Wouldn't it be nice if there was a way to figure out the best learning rate?
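A minimal sketch of the effect of step size described above, using plain gradient descent on f(x) = x**2. The function, the starting point and the three learning rates are assumptions chosen for illustration; none of this comes from the lesson notebooks.

```python
def gradient_descent(lr, steps=20, x0=5.0):
    """Minimise f(x) = x**2 starting from x0; return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2: how steep it is at this point
        x = x - lr * grad     # step = gradient multiplied by the learning rate
        losses.append(x ** 2)
    return losses

small = gradient_descent(lr=0.001)   # gets closer, but slowly
good = gradient_descent(lr=0.1)      # gets close to the minimum quickly
too_big = gradient_descent(lr=1.5)   # every step overshoots; the loss keeps growing

print(f"lr=0.001 final loss: {small[-1]:.4f}")
print(f"lr=0.1   final loss: {good[-1]:.6f}")
print(f"lr=1.5   final loss: {too_big[-1]:.3e}")
```

With the largest rate, each update lands further from the minimum than the one before, which is the "loss going to infinity" symptom.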

10^-1 = 0.10 = 1e-1
10^-2 = 0.01 = 1e-2
10^-3 = 0.001 = 1e-3

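# PATH (dataset folder) and sz (image size) are defined earlier in the notebook
arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)  # learning rate 0.01, 3 epochs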

Col 0:  Epoch Number
Col 1:  loss on training
Col 2:  loss on validation
Col 3:  accuracy

[ 0.       0.03597  0.01879  0.99365]                         
[ 1.       0.02605  0.01836  0.99365]                         
[ 2.       0.02189  0.0196   0.99316]

precompute = True

Data Augmentation

tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

another option: transforms_top_down
can also create custom transforms
data augmentation does not create new data, but it gives the convolutional neural network a different way of looking at the same images
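A rough sketch of the idea, assuming a hand-rolled left-right flip on a tiny list-of-lists "image". This is not how fastai implements its transforms; `random_flip` and the toy data are hypothetical.

```python
import random

def random_flip(image, rng):
    """Return the image mirrored left-to-right about half of the time."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

# A tiny 2x3 "image"; the label stays the same however we view it.
image = [[1, 2, 3],
         [4, 5, 6]]
label = "cat"

rng = random.Random(0)  # seeded so the run is repeatable
views = [random_flip(image, rng) for _ in range(8)]

# Every view holds the same pixel values, just arranged differently.
distinct = {tuple(map(tuple, v)) for v in views}
print(len(distinct), "distinct view(s) of one", label, "image")
```

No new pictures are created: the network just sees the same cat from slightly different angles across epochs.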

Unfreeze Layers

Learning Rate Annealing

TTA (Test Time Augmentation)

fastai library

open source
sits on top of PyTorch
PyTorch is fairly new; we were not able to do what we needed with Keras or TensorFlow
PyTorch is not easy to use.
So, we created a library on top of PyTorch.
PyTorch didn’t seem suitable for new deep learners
With Keras, the code from last year’s course is 2-3x longer, which means more opportunities for mistakes
So, fastai built this library to make it easier to get state-of-the-art results
Using this library made us much more productive.
we were able to add in techniques from other papers
not only does fastai make things easier than other approaches, it also has more sophisticated techniques behind the scenes
fastai library has been released, open source
behind the scenes, it is creating PyTorch models which can be exported
if you're doing something on mobile, you'll need to use TensorFlow
every year, the available libraries, and which of them are best, change
the main thing to get out of this course is the concepts:
- learning rate
- how to do learning rate annealing
- why differential learning rates are important
- stochastic gradient descent with restarts

Pyro - Uber's new release

SGDR (Stochastic Gradient Descent with Restarts)

Confusion Matrix

simple way to look at the results of classification

What was the actual truth? Of the thousand actual cats, how many did we predict as cats?
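A small sketch of how such a table can be counted by hand. The labels and predictions below are made-up toy data, and `confusion_matrix` here is a hypothetical helper, not the function used in the notebook.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are the actual class, columns are the predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

classes = ["cat", "dog"]
actual    = ["cat", "cat", "cat", "dog", "dog", "dog", "dog"]
predicted = ["cat", "cat", "dog", "dog", "dog", "cat", "dog"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"actual {name}: {row}")
```

The diagonal holds the correct predictions; the first row answers "of the actual cats, how many did we predict as cats?".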

Review: easy steps to train a world-class image classifier

Enable data augmentation, and precompute=True
Use lr_find() to find highest learning rate where loss is still clearly improving
Train last layer from precomputed activations for 1-2 epochs
Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
Unfreeze all layers
Set earlier layers to 3x-10x lower learning rate than next higher layer
Use lr_find() again
Train full network with cycle_mult=2 until over-fitting
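The cycle_len and cycle_mult settings in the steps above can be pictured with a short sketch of a cosine-annealed schedule with restarts. This is a simplified illustration, not the fastai implementation: the function names, the per-step granularity and `steps_per_epoch=4` are assumptions.

```python
import math

def cycle_lengths(n_cycles, cycle_len=1, cycle_mult=1):
    """Epochs in each cycle: every cycle is cycle_mult times longer than the last."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

def cosine_lr(max_lr, position):
    """Learning rate at a position in [0, 1) through one cycle (cosine annealing)."""
    return max_lr * (1 + math.cos(math.pi * position)) / 2

def sgdr_schedule(max_lr, n_cycles, cycle_len=1, cycle_mult=1, steps_per_epoch=4):
    """One learning rate per step; the rate jumps back to max_lr at each restart."""
    schedule = []
    for epochs in cycle_lengths(n_cycles, cycle_len, cycle_mult):
        steps = epochs * steps_per_epoch
        schedule.extend(cosine_lr(max_lr, s / steps) for s in range(steps))
    return schedule

lengths = cycle_lengths(3, cycle_len=1, cycle_mult=2)
schedule = sgdr_schedule(0.01, 3, cycle_len=1, cycle_mult=2)
print("epochs per cycle:", lengths, "total:", sum(lengths))
```

With cycle_len=1 and cycle_mult=2, three cycles take 1 + 2 + 4 = 7 epochs; within each cycle the rate decays from its maximum towards zero, then restarts.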

And more...

Use lr_find() to find highest learning rate where loss is still clearly improving
Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
Unfreeze all layers
Set earlier layers to 3x-10x lower learning rate than next higher layer
Train full network with cycle_mult=2 until over-fitting
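As a hedged illustration of the "3x-10x lower" rule, the sketch below builds one learning rate per layer group. `differential_lrs`, the choice of three groups and the top rate of 1e-2 are assumptions for the example, not values taken from the notes.

```python
def differential_lrs(top_lr, n_groups=3, factor=10):
    """Learning rate per layer group, earliest group first.

    Each earlier group gets a rate `factor` times lower than the next higher one.
    """
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = differential_lrs(1e-2, n_groups=3, factor=10)
gentler = differential_lrs(1e-2, n_groups=3, factor=3)

print("factor 10:", lrs)
print("factor 3: ", gentler)
```

The earliest layers hold general features such as edges, so they get the smallest updates; the newly added layers at the end get the full rate.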

Dataset 2: Dog Breed Competition

can set sz=64: use small images at the beginning to get the model running quickly, and then increase the size
most ImageNet models are trained on 224x224 or 299x299 sized images. Images in that range will work well with these algorithms.
epoch - number of passes through the data
cycle - however many epochs you say is in that cycle
if cycle is 1, then cycle and epoch are the same
start training for a few epochs at a smaller size (sz=224), then pass in larger images (sz=299) and continue training. This is another way to get state-of-the-art results. If the model has overfit at size 224, it is not yet overfit at size 299, since the resized images are effectively new to it. This makes the method an effective way to avoid overfitting.

Note

the best way to deal with unbalanced data is to make copies of the rare cases
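A minimal sketch of "making copies of the rare cases", assuming items and labels are held in two parallel lists. `oversample` and the file names are hypothetical.

```python
from collections import Counter

def oversample(items, labels):
    """Repeat the items of rarer classes until every class matches the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_items, out_labels = [], []
    for cls in counts:
        members = [x for x, y in zip(items, labels) if y == cls]
        # Cycle through the members until this class has `target` examples.
        copies = [members[i % len(members)] for i in range(target)]
        out_items.extend(copies)
        out_labels.extend([cls] * target)
    return out_items, out_labels

items  = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "rare.jpg"]
labels = ["common"] * 5 + ["rare"]

new_items, new_labels = oversample(items, labels)
print(Counter(new_labels))
```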

precompute=True

started with a pre-trained network; found activations with rich features; then we add a couple of layers at the end, which start off random
with freeze (frozen by default) and precompute=True, all we are learning is the couple of layers we've added
with precompute=True, we precalculate how strongly each image shows features such as eyeballs, faces, etc.
data augmentation doesn't do anything with precompute=True because we're actually showing the same exact activations every time.
we can then set precompute=False, which means it is still only training the last couple of layers, but data augmentation is now working because it is going through and re-calculating all the activations from scratch
finally, when we unfreeze, we can go back and change the earlier convolutional filters
having precompute=True initially makes training about 10x faster. It doesn't impact the accuracy. It's just a shortcut.
if you show the algorithm fewer images each time (a smaller batch size), it calculates the gradient from fewer images, so the gradient is less accurate
making the batch size smaller makes the algorithm more volatile, which affects the optimal learning rate
if you reduce the batch size by a lot, you may need to reduce the learning rate a bit
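The precompute shortcut above is essentially caching. The sketch below uses toy stand-ins (a `FrozenLayers` class and a `train_head` function, both hypothetical) to count how often the expensive frozen layers run in each mode; it does not touch fastai or a real network.

```python
class FrozenLayers:
    """Stand-in for the frozen pretrained layers (the expensive part)."""
    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [sum(image), max(image)]  # two toy "activations"

def train_head(feature_rows, epochs):
    """Stand-in for training the added layers; here it only reads the features."""
    total = 0
    for _ in range(epochs):
        for row in feature_rows:
            total += row[0]
    return total

images = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
epochs = 3

# precompute=True: run the frozen layers once per image and reuse the result.
frozen = FrozenLayers()
cached = [frozen(img) for img in images]
train_head(cached, epochs)
calls_precomputed = frozen.calls

# precompute=False: recompute activations every epoch, which is what lets
# augmented (changed) images actually reach the network.
frozen = FrozenLayers()
for _ in range(epochs):
    train_head([frozen(img) for img in images], epochs=1)
calls_recomputed = frozen.calls

print(calls_precomputed, "vs", calls_recomputed, "passes through the frozen layers")
```

The cached run never looks at the images again after the first pass, which is why augmentation has no effect until precompute is turned off.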

Architectures

an architecture is a particular way of putting a network together: what size the convolutional filters are and how they are connected to each other
different architectures have different numbers of layers, size of kernels, number of filters

Architecture Types

resnet34 - great starting point, and often a good finishing point, doesn't have too many parameters, works well with small datasets
resnext - 2nd place winner in last year's ImageNet competition.
- the number at the end indicates how big the model is
- resnext50 - next step after resnet34
  - takes twice as long to run as resnet34
  - takes 2-3x the memory of resnet34
  - Ex: dogs/cats data took 20 minutes to train on resnext50

Notebook to follow: lesson1-rxt50.ipynb

Dataset 3: Satellite Imagery

dataset: planet
lesson2-image_models

AWS fastami Image

Instructions are here: aws_ami_gpu_setup.md
