- [[Mini Batch Gradient Descent]]
- makes the model learn the best settings for the parameters $w$ & $b$
- finds the $w$, $b$ that minimize the [[Cost Function]] $J(w,b)$
- in the figure below, the cost function is convex (a single bowl shape with one global minimum)
![[CleanShot 2024-06-07 at [email protected]|300]]
- we initialize $w$ & $b$ to some fixed starting values
- people don't usually use random initialization here; initializing to 0 is common
- for a convex cost, like [[Logistic Regression]] with [[Binary Cross Entropy Loss]] (written out after the figure below), it doesn't really matter where you start because you always reach the same global optimum
![[CleanShot 2024-06-07 at [email protected]|200]]
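- for reference (my notation, not from the original note): writing the predictions as $\hat{y}^{(i)} = \sigma(w^{\top}x^{(i)} + b)$ over $m$ examples, the standard [[Binary Cross Entropy Loss]] cost being minimized is $J(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log\left(1-\hat{y}^{(i)}\right)\right]$
- this $J$ is convex in $w$ & $b$, which is why the starting point doesn't matter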
- repeatedly takes a step in the direction of steepest descent, trying to reach the global optimum or somewhere close to it
- we repeat the updates below until convergence (a minimal code sketch follows this list)
- $w = w - \alpha \frac{\partial J(w,b)}{\partial w}$
- $b = b - \alpha \frac{\partial J(w,b)}{\partial b}$
- $\alpha$ is the learning rate: how big a step we take on each iteration
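A minimal NumPy sketch of this loop for [[Logistic Regression]] with [[Binary Cross Entropy Loss]]; the function names (`sigmoid`, `fit`), hyperparameters, and toy data are illustrative choices, not from the note:

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.1, num_iters=1000):
    """Repeat w := w - alpha * dJ/dw and b := b - alpha * dJ/db."""
    m, n = X.shape
    w = np.zeros(n)   # fixed (non-random) initialization, as in the note
    b = 0.0
    for _ in range(num_iters):
        y_hat = sigmoid(X @ w + b)      # model predictions
        dw = X.T @ (y_hat - y) / m      # dJ/dw for the BCE cost
        db = np.mean(y_hat - y)         # dJ/db for the BCE cost
        w -= alpha * dw                 # step of size alpha downhill
        b -= alpha * db
    return w, b

# Toy 1-D data, separable around x = 0.5 (illustrative only)
X = np.array([[0.1], [0.2], [0.8], [0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit(X, y, alpha=0.5, num_iters=5000)
print(w, b, sigmoid(X @ w + b))  # predictions move toward 0, 0, 1, 1
```

Because the BCE cost is convex in $w$ & $b$, the zero initialization here ends up at the same optimum a random start would.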