![[CleanShot 2024-06-14 at [email protected]|350]]
- don't normalize the training & test sets differently
	- compute $\mu$ & $\sigma$ on the training set, then reuse the same $\mu$ & $\sigma$ for the test set
- we need to normalize inputs, because otherwise the cost function looks like an "elongated bowl"
	- your features end up on vastly different scales
	- e.g. $w_1 \in [0,1]$ & $w_2 \in [-1000,1000]$
- you are forced to use a small learning rate
- normalizing makes the cost function easier to optimize
	- and it pretty much never does any harm anyway
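A minimal sketch of the points above in NumPy (the data here is made up for illustration): $\mu$ and $\sigma$ are computed on the training set only, and the same statistics are applied to the test set.

```python
import numpy as np

# Hypothetical data: rows are examples, columns are features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Compute mu and sigma on the TRAINING set only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# ...and normalize BOTH sets with those same statistics,
# so train and test live in the same coordinate system.
X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma
```

After this, the training features have mean ≈ 0 and std ≈ 1 per column; the test set won't be exactly standardized (it uses the training statistics), and that's the point.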