- RMSprop (root mean square propagation)
- adapts the **learning rate** for each parameter based on its historical gradients
- goal is to improve convergence speed & stability
- adjusts the step size per parameter
- keeps [[Exponentially Weighted Averages]] of the **squared** gradients (not of the parameters themselves)
- damps oscillations: directions with consistently large gradients get smaller steps
![[CleanShot 2024-06-17 at [email protected]|350]]
- note on the square root in the denominator: you actually add a very tiny $\epsilon$ right after the square root to prevent dividing by 0
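
A minimal sketch of one update step, to make the bullets above concrete. The function name `rmsprop_step`, the argument names, and the defaults (`lr=0.01`, `beta=0.9`, `eps=1e-8`) are my own assumptions for illustration, not taken from the screenshot. It uses plain Python lists and skips bias correction.

```python
import math

def rmsprop_step(params, grads, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update over lists of floats.

    Returns (new_params, new_sq_avg). All names and defaults here are
    illustrative assumptions, not a reference implementation.
    """
    new_params, new_sq_avg = [], []
    for p, g, s in zip(params, grads, sq_avg):
        # exponentially weighted average of the squared gradient
        s = beta * s + (1 - beta) * g * g
        # divide the step by the root of that average;
        # eps is added after the square root to avoid dividing by 0
        p = p - lr * g / (math.sqrt(s) + eps)
        new_params.append(p)
        new_sq_avg.append(s)
    return new_params, new_sq_avg

# two parameters whose gradients differ by a factor of 100
params, sq_avg = rmsprop_step([1.0, 1.0], [100.0, 1.0], [0.0, 0.0])
```

With the running averages starting at 0, both parameters move by almost the same amount on this first step even though one gradient is 100x larger. That is the per-parameter step size adjustment that damps oscillations along steep directions.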