Update fastai_dl_terms.md
Reshama Shaikh committed Jan 25, 2018
1 parent 627877f commit 6926a2c
Showing 1 changed file with 10 additions and 4 deletions.
`bs` = batch size
`.cuda()` = manually move the model or tensor onto the GPU (uses the default device)
`.cuda(2)` = move onto GPU device 2 (the argument is the CUDA device index)
`lr` = learning rate
`lr_find()` = learning rate finder
`md.nt` = number of unique tokens
`n_fac` = size of embedding
`ps` = p values (dropout probabilities)
`sz` = size (of photo)
`tfms` = transformations
`.TTA()` = Test Time Augmentation
`wds` = weight decays
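
Several of these abbreviations come from the fastai library as used in the 2018 course notebooks. Below is a minimal sketch of where they typically appear in an image-classification workflow, assuming a hypothetical dataset path and fastai v0.7-style calls; exact signatures and return values may differ between fastai releases.

```python
# Sketch only: assumes fastai v0.7 (2018 course) and a dogs-vs-cats style
# dataset at a hypothetical path. The values of sz, bs, and lr are illustrative.
from fastai.conv_learner import *

PATH = "data/dogscats/"   # hypothetical data location
arch = resnet34           # pretrained architecture
sz = 224                  # sz: size (of photo)
bs = 64                   # bs: batch size

tfms = tfms_from_model(arch, sz)                               # tfms: transformations
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(arch, data, precompute=True)

learn.lr_find()             # lr_find(): learning rate finder
learn.fit(1e-2, 3)          # fit with lr=1e-2 for 3 epochs
log_preds, y = learn.TTA()  # .TTA(): Test Time Augmentation at inference
```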

---
cardinality: number of levels of a categorical variable
---
# Deep Learning Terms

### ADAM (Adaptive Moment Estimation)
- Adam is a stochastic gradient descent algorithm based on estimates of the 1st- and 2nd-order moments of the gradient. It estimates the 1st-order moment (the gradient mean) and the 2nd-order moment (the element-wise squared gradient) using exponential moving averages, and corrects their bias. The final weight update is proportional to the learning rate times the 1st-order moment divided by the square root of the 2nd-order moment (see the sketch after this list).
- Adam takes 3 hyperparameters: the learning rate, the decay rate of the 1st-order moment, and the decay rate of the 2nd-order moment.
- [ADAM: A Method for Stochastic Optimization](https://theberkeleyview.wordpress.com/2015/11/19/berkeleyview-for-adam-a-method-for-stochastic-optimization/)
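
A sketch of a single Adam update step, written as a standalone function to make the moment estimates, bias correction, and final update explicit. The function name and the default hyperparameter values are illustrative (they mirror the paper), not fastai's implementation.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # 1st-order moment: EMA of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd-order moment: EMA of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # update proportional to lr * m_hat / sqrt(v_hat)
    return param, m, v
```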


### SoTA (State-of-the-Art)
The best result reported so far on a given task or benchmark.

### TTA (Test Time Augmentation)
At inference time, predictions are made on several augmented versions of each input and then averaged.

### Epoch
An epoch is a complete pass through a given dataset.
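
For concreteness, a toy PyTorch training loop (names, sizes, and values are arbitrary) showing that one epoch means iterating once over every batch in the dataset:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of 256 examples; with a batch size of 32, each epoch runs 8 iterations.
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):         # 3 epochs = 3 complete passes through the dataset
    for xb, yb in loader:      # one iteration per batch
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
```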

### FC (Fully Connected)
fully connected neural network layer
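
In PyTorch a fully connected layer corresponds to `nn.Linear`; a small illustrative example (the sizes are arbitrary):

```python
import torch
from torch import nn

fc = nn.Linear(in_features=512, out_features=10)  # fully connected: 512 inputs -> 10 outputs
x = torch.randn(4, 512)                           # batch of 4 feature vectors
out = fc(x)                                       # shape (4, 10); each output unit is a weighted
                                                  # sum of all 512 inputs plus a bias
```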

### Learning Rate Annealing
Learning rate schedules adjust the learning rate during training, e.g. by annealing: reducing the learning rate according to a pre-defined schedule or when the change in the objective between epochs falls below a threshold. These schedules and thresholds, however, have to be defined in advance and are thus unable to adapt to a dataset's characteristics.
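
As a concrete (illustrative) example, simple step decay cuts the learning rate by a fixed factor every few epochs; the starting value and drop factor below are arbitrary, not recommended settings.

```python
def annealed_lr(epoch, base_lr=0.1, drop=0.5, epochs_per_drop=10):
    """Step-decay schedule: multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return base_lr * (drop ** (epoch // epochs_per_drop))

for epoch in range(30):
    lr = annealed_lr(epoch)   # 0.1 for epochs 0-9, 0.05 for 10-19, 0.025 for 20-29
    # ...train one epoch with this lr...
```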