- used for [[ReLU Activation Function]]
- helps prevent [[Vanishing and Exploding Gradients]], speeds up training
- scales the initialization based on the size of the previous layer
- let $x \sim N(0, 1)$
- set initial $w$ to $x \cdot \sqrt{\frac{2}{n^{[l-1]}}}$
- note $n^{[l-1]}$ is the number of neurons in the previous layer $l-1$, i.e. the number of inputs to layer $l$ (see the sketch below)
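
A minimal NumPy sketch of this scheme, assuming a layer with `n_prev` inputs and `n_curr` neurons (function and parameter names are illustrative, not from the note):

```python
import numpy as np

def he_initialize(n_prev, n_curr, seed=None):
    """He initialization: sample x ~ N(0, 1), then scale by sqrt(2 / n_prev),
    where n_prev is the number of neurons in the previous layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_curr, n_prev))  # x ~ N(0, 1)
    return x * np.sqrt(2.0 / n_prev)           # w = x * sqrt(2 / n^[l-1])

# example: weight matrix for a layer with 256 inputs and 128 neurons
W = he_initialize(n_prev=256, n_curr=128, seed=0)
```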