- when training neural networks, most points of zero gradient are **saddle points**, not local optima
- long ago people feared local optima, but in a large enough neural network we're actually unlikely to get stuck in a bad local optimum
	- for a point to be a local minimum, $J$ would have to curve upward along *every* one of the many thousands of dimensions at once; curving up in some directions and down in others (a saddle) is far more likely
- left is a local optimum, right is a saddle point (see the sketch below the figure)
![[CleanShot 2024-06-17 at [email protected]|350]]
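A minimal sketch of the saddle idea, using my own toy cost $J(w_1, w_2) = w_1^2 - w_2^2$ (an assumption, not the lecture's example): the gradient vanishes at the origin, yet the origin is a minimum along $w_1$ and a maximum along $w_2$.

```python
import numpy as np

# Toy cost with a saddle at the origin (my own example, not from the lecture)
def J(w):
    return w[0] ** 2 - w[1] ** 2

def grad_J(w):
    return np.array([2 * w[0], -2 * w[1]])

origin = np.array([0.0, 0.0])
print(grad_J(origin))                      # zero gradient at the origin

# Yet the origin is neither a min nor a max:
eps = 1e-2
print(J(origin + [eps, 0]) > J(origin))    # True: J curves up along w1
print(J(origin + [0, eps]) < J(origin))    # True: J curves down along w2
```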
- my thought: if you plot $J$ against a single $w_i$ (a 2D slice of the cost surface), you get a $\cup$ or $\cap$ shape; at a saddle, some slices are $\cup$ and others are $\cap$
- saddle points cause **plateaus** (regions where the gradient stays close to 0 over a long stretch), and these are a big problem
	- crossing a plateau with plain gradient descent can take a very long time, since a tiny gradient means tiny steps
- [[Adam Optimizer]] & [[RMSprop Optimizer]] help with this: by dividing out the recent gradient magnitude, they keep step sizes meaningful and get you off the plateau faster (sketch below)
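A minimal sketch of why this helps, again on my own toy saddle $J(w) = w_1^2 - w_2^2$ (an assumption, not the lecture's example): starting just off the saddle, plain gradient descent barely moves because the gradient is tiny, while Adam's per-parameter division by the RMS of recent gradients keeps step sizes roughly constant.

```python
import numpy as np

def grad_J(w):                       # gradient of the toy cost J(w) = w1^2 - w2^2
    return np.array([2 * w[0], -2 * w[1]])

def gd(w, lr=0.01, steps=200):       # plain gradient descent
    for _ in range(steps):
        w = w - lr * grad_J(w)
    return w

def adam(w, lr=0.01, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    m = np.zeros_like(w)             # first moment (momentum term)
    v = np.zeros_like(w)             # second moment (RMS scaling term)
    for t in range(1, steps + 1):
        g = grad_J(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)    # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([0.0, 1e-6])           # barely off the saddle at the origin
print(gd(w0.copy()))                 # w2 ~ 5e-5: still crawling on the plateau
print(adam(w0.copy()))               # w2 ~ 2: long gone from the plateau
```

RMSprop alone shows the same effect (it is roughly Adam without the momentum term and bias correction); the division by $\sqrt{\hat{v}}$ is what rescales the tiny plateau gradients into usable steps.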