- used for [[ReLU Activation Function]]
- helps prevent [[Vanishing and Exploding Gradients]], speeds up training
- scales the initial weights by the size of the previous layer, so the variance of activations stays roughly constant from layer to layer
- let $x \sim N(0, 1)$
- set initial $w$ to $x \cdot \sqrt{\frac{2}{n^{[l-1]}}}$ (see the sketch below)
- note $n^{[l-1]}$ is the number of neurons in the previous layer $l-1$, i.e. the number of inputs to layer $l$
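- a minimal numpy sketch of this rule (the names `he_init`, `n_in`, `n_out` are my own, not from any particular library):

```python
import numpy as np

def he_init(n_in, n_out, seed=0):
    """He initialization: sample x ~ N(0, 1), then scale by sqrt(2 / n_in)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_out, n_in))   # x ~ N(0, 1)
    return x * np.sqrt(2.0 / n_in)           # w = x * sqrt(2 / n^[l-1])

# weights for a layer with 256 inputs (previous layer) and 128 neurons
W = he_init(n_in=256, n_out=128)
print(W.std())  # roughly sqrt(2 / 256) ≈ 0.09
```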