From 6926a2c5706ed48986d950433e056e440fe99d38 Mon Sep 17 00:00:00 2001
From: Reshama Shaikh
Date: Thu, 25 Jan 2018 09:43:54 -0500
Subject: [PATCH] Update fastai_dl_terms.md

---
 fastai_dl_terms.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fastai_dl_terms.md b/fastai_dl_terms.md
index fb5a338..c4dc10f 100644
--- a/fastai_dl_terms.md
+++ b/fastai_dl_terms.md
@@ -10,6 +10,7 @@
 `bs` = batch size
 `.cuda()` we tell it manually to use the (default number of) GPUs
 `.cuda(2)` specify number of GPUs to use is 2
+`lr` learning rate
 `lr_find()` learning rate finder
 `md.nt` = number of unique tokens
 `n_fac` = size of embedding
@@ -18,6 +19,7 @@
 `ps` = p's (percents for dropouts)
 `sz` = size (of photo)
 `tfms` = transformations
+`.TTA()` Test Time Augmentation
 `wds` = weight decays

 ---
@@ -27,6 +29,12 @@ cardinality: number of levels of a categorical variable
 ---
 # Deep Learning Terms

+### ADAM (Adaptive Moment Estimation)
+- Adam is a stochastic gradient descent algorithm based on estimation of 1st and 2nd-order moments. The algorithm estimates 1st-order moment (the gradient mean) and 2nd-order moment (element-wise squared gradient) of the gradient using exponential moving average, and corrects its bias. The final weight update is proportional to learning rate times 1st-order moment divided by the square root of 2nd-order moment.
+- Adam takes 3 hyperparameters: the learning rate, the decay rate of 1st-order moment, and the decay rate of 2nd-order moment
+- [ADAM: A Method for Stochastic Optimization](https://theberkeleyview.wordpress.com/2015/11/19/berkeleyview-for-adam-a-method-for-stochastic-optimization/)
+
+
 ### SoTA (State-of-the-Art)

 ### TTA (Test Time Augmentation)

@@ -34,10 +42,8 @@ cardinality: number of levels of a categorical variable
 ### Epoch
 An epoch is a complete pass through a given dataset.

-### ADAM (Adaptive Moment Estimation)
-- Adam is a stochastic gradient descent algorithm based on estimation of 1st and 2nd-order moments. The algorithm estimates 1st-order moment (the gradient mean) and 2nd-order moment (element-wise squared gradient) of the gradient using exponential moving average, and corrects its bias. The final weight update is proportional to learning rate times 1st-order moment divided by the square root of 2nd-order moment.
-- Adam takes 3 hyperparameters: the learning rate, the decay rate of 1st-order moment, and the decay rate of 2nd-order moment
-- [ADAM: A Method for Stochastic Optimization](https://theberkeleyview.wordpress.com/2015/11/19/berkeleyview-for-adam-a-method-for-stochastic-optimization/)
+### FC (Fully Connected)
+fully connected neural network layer

 ### Learning Rate Annealing
 Learning rate schedules try to adjust the learning rate during training by e.g. annealing, i.e. reducing the learning rate according to a pre-defined schedule or when the change in objective between epochs falls below a threshold. These schedules and thresholds, however, have to be defined in advance and are thus unable to adapt to a dataset's characteristics
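
For context on the ADAM entry this patch adds, here is a minimal NumPy sketch of the update rule the entry describes: bias-corrected exponential moving averages of the gradient (1st moment) and of the element-wise squared gradient (2nd moment), with a step proportional to `lr * m_hat / sqrt(v_hat)`. It is not part of the patch or of the fastai API; the function name `adam_step`, the toy objective, and the default hyperparameter values are illustrative assumptions.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient grad at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad           # 1st-order moment: EMA of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2      # 2nd-order moment: EMA of the squared gradient
    m_hat = m / (1 - beta1 ** t)                 # bias correction of both moments
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step proportional to lr * m_hat / sqrt(v_hat)
    return w, m, v

# Toy usage: minimise (w - 3)^2 with gradient 2 * (w - 3); w should approach 3.
w = m = v = np.float64(0.0)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.1)
print(w)
```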