- [[Mini Batch Gradient Descent]]
    - makes the model learn the best settings for the parameters $w$ & $b$
    - finds the $w$, $b$ that minimize $J(w,b)$, the [[Cost Function]]
    - in the plot below, the cost function is convex ![[CleanShot 2024-06-07 at [email protected]|300]]
    - we initialize $w$ & $b$ at some fixed values
        - people don't usually use random initialization here
        - but for convex cost functions, like in [[Logistic Regression]] with [[Binary Cross Entropy Loss]], it doesn't really matter where you start because you'll always reach the same optimum ![[CleanShot 2024-06-07 at [email protected]|200]]
    - repeatedly takes steps of steepest descent, trying to reach the global optimum or somewhere close to it
    - we repeat the updates below until convergence (see the sketch after this list)
        - $w = w - \alpha \frac{\partial J(w,b)}{\partial w}$
        - $b = b - \alpha \frac{\partial J(w,b)}{\partial b}$
        - $\alpha$ is the learning rate, how big a step we take each iteration
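A minimal NumPy sketch of this update loop, assuming a logistic regression model trained with binary cross entropy; the names `X`, `y`, `lr`, and `n_iters` are illustrative and not from the note.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Find w, b that (approximately) minimize J(w, b) for logistic regression."""
    m, n = X.shape
    w = np.zeros(n)   # fixed (non-random) initialization, as in the note
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)     # predictions
        dw = X.T @ (y_hat - y) / m     # dJ/dw for binary cross entropy
        db = np.sum(y_hat - y) / m     # dJ/db
        w -= lr * dw                   # w = w - alpha * dJ/dw
        b -= lr * db                   # b = b - alpha * dJ/db
    return w, b
```

Because the cost is convex here, this loop converges toward the same optimum regardless of where `w` and `b` start; the learning rate `lr` plays the role of $\alpha$.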