- slowly reduce learning rate over time
- can potentially speed up the learning algorithm
- initially you want bigger steps, then smaller steps near the end as you approach the optimum
- $\alpha = \frac{1}{1 + \text{decayRate} \cdot \text{epochNumber}} \cdot \alpha_0$
- $decayRate$ becomes another hyperparameter to tune
- as $epochNumber$ increases (more full passes of the training set), your learning rate decreases
- other ways to implement learning rate decay (see the code sketch after the image below):
![[CleanShot 2024-06-17 at [email protected]|350]]
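A minimal sketch of these schedules in Python. The function names, and the example values $\alpha_0 = 0.2$ and $decayRate = 1$, are illustrative assumptions; the exponential and $1/\sqrt{\cdot}$ variants are common alternative schedules, not necessarily the exact ones pictured above.

```python
import math

def lr_inverse_decay(epoch_num, alpha0=0.2, decay_rate=1.0):
    """alpha = alpha0 / (1 + decay_rate * epoch_num) -- the formula above."""
    return alpha0 / (1 + decay_rate * epoch_num)

def lr_exponential_decay(epoch_num, alpha0=0.2, base=0.95):
    """alpha = base^epoch_num * alpha0 -- a common alternative schedule."""
    return (base ** epoch_num) * alpha0

def lr_sqrt_decay(epoch_num, alpha0=0.2, k=1.0):
    """alpha = (k / sqrt(epoch_num)) * alpha0 -- another common alternative."""
    return (k / math.sqrt(epoch_num + 1)) * alpha0  # +1 avoids divide-by-zero at epoch 0

# the learning rate shrinks as epoch_num grows:
for epoch in range(5):
    print(epoch, round(lr_inverse_decay(epoch), 4))
# 0 0.2, 1 0.1, 2 0.0667, 3 0.05, 4 0.04
```

In practice you would recompute $\alpha$ at the start of each epoch (or step) and pass it to the optimizer, tuning $\alpha_0$ and $decayRate$ like any other hyperparameters.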