- when training neural networks, most points of zero gradient are **saddle points**, not local optima
	- people long feared local optima, but in a high-dimensional parameter space a zero-gradient point would need the cost to curve upward in *every* direction to be a local optimum, which is very unlikely; far more likely some directions curve up and some curve down, i.e. a saddle
	- so given a large enough neural network, we're actually unlikely to get stuck in a bad local optimum
- left is a local optimum, right is a saddle ![[CleanShot 2024-06-17 at [email protected]|350]]
	- my thought: you get a $\cup$ or $\cap$ shape when analyzing the 2D space of $w_i$ vs. $J$
- the real problem: saddles cause **plateaus** (regions where the gradient stays close to 0 over a long stretch), so progress can take a very long time
- [[Adam Optimizer]] & [[RMSprop Optimizer]] help with this, increasing the rate of getting off a plateau (see the sketch below)
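a minimal sketch (my own toy example, not from the lecture): plain gradient descent vs. a hand-rolled Adam update on the saddle $J(w) = w_1^2 - w_2^2$; the function names and starting point are illustrative assumptions. Starting almost exactly on the flat direction, GD crawls off the saddle because the gradient there is tiny, while Adam's per-coordinate normalization makes the effective step roughly the learning rate regardless of how small the gradient is.

```python
import numpy as np

def grad(w):
    # gradient of the toy saddle J(w) = w[0]**2 - w[1]**2 (saddle at the origin)
    return np.array([2.0 * w[0], -2.0 * w[1]])

def gd(w, lr=0.01, steps=500):
    # vanilla gradient descent: step size shrinks with the gradient,
    # so it stalls where the surface is nearly flat
    w = w.copy()
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w = w.copy()
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g            # 1st moment (momentum)
        v = beta2 * v + (1 - beta2) * g**2         # 2nd moment (RMSprop-style scaling)
        m_hat = m / (1 - beta1**t)                 # bias correction
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)   # normalized step ≈ lr per coordinate
    return w

w0 = np.array([1.0, 1e-6])    # nearly on the saddle: gradient in w[1] is only ~2e-6
print("GD  :", gd(w0))        # w[1] has barely escaped after 500 steps
print("Adam:", adam(w0))      # w[1] is far from the saddle
```

the $v$ term is the RMSprop part: dividing by $\sqrt{\hat{v}}$ rescales the tiny plateau gradient into a usable step, which is why both optimizers speed up escape from plateaus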