- RMSProp: root mean square propagation
- adapts the **learning rate** for each parameter based on historical gradients
- goal is to improve convergence speed & stability
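- adjusts the step size per parameter: parameters with consistently large gradients take smaller steps, those with small gradients take larger ones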
- keeps [[Exponentially Weighted Averages]] of the **squared** gradients (not of the parameters themselves); each update is divided by the square root of this average
- damps oscillations along directions where the gradients are large
![[CleanShot 2024-06-17 at
[email protected]|350]]
- note: in the division by the square root (the denominator of the update), a very small $\epsilon$ is added right after the square root to prevent dividing by 0
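
The notes above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the toy quadratic loss, and the default values for `lr`, `beta`, and `eps` are all assumptions chosen for the example.

```python
import numpy as np

def rmsprop_step(w, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update (hypothetical helper; defaults are assumed values).

    w    : current parameters
    grad : gradient of the loss with respect to w
    s    : running exponentially weighted average of squared gradients
    """
    # Exponentially weighted average of the element-wise squared gradients
    s = beta * s + (1.0 - beta) * grad ** 2
    # Per-parameter step; eps is added after the square root to avoid dividing by 0
    w = w - lr * grad / (np.sqrt(s) + eps)
    return w, s

# Toy loss f(w) = 0.5 * (100 * w0^2 + w1^2): steep in w0, shallow in w1
def grad_fn(w):
    return np.array([100.0, 1.0]) * w

w = np.array([1.0, 1.0])
s = np.zeros_like(w)
for _ in range(500):
    w, s = rmsprop_step(w, grad_fn(w), s)
print(w)
```

On the first step the two coordinates move by almost the same amount even though their gradients differ by a factor of 100, which is the per-parameter step-size adjustment described above.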