- slowly reduce the learning rate over time - can speed up the learning algorithm
- initially you want bigger steps, then smaller steps at the end near the optimum point
- $\alpha = \frac{1}{1 + \text{decayRate} \cdot \text{epochNumber}} \cdot \alpha_0$ (see the sketch after this list)
	- $\text{decayRate}$ becomes another hyperparameter to tune
	- as $\text{epochNumber}$ increases (more full passes over the training set), the learning rate decreases
- other ways to implement learning rate decay: ![[CleanShot 2024-06-17 at [email protected]|350]]
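
A minimal sketch of the decay formula above; the values for $\alpha_0$ and $\text{decayRate}$ are hypothetical, just to show how $\alpha$ shrinks each epoch:

```python
# Learning rate decay: alpha = alpha_0 / (1 + decay_rate * epoch_num)
alpha_0 = 0.2     # initial learning rate (assumed value)
decay_rate = 1.0  # decay rate hyperparameter (assumed value)

for epoch_num in range(1, 6):  # epoch numbers 1..5
    alpha = alpha_0 / (1 + decay_rate * epoch_num)
    print(f"epoch {epoch_num}: alpha = {alpha:.4f}")
```

With $\text{decayRate} = 1$, $\alpha$ drops to $\alpha_0/2$ after epoch 1, $\alpha_0/3$ after epoch 2, and so on.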