Xavier Initialization - Brendan Shih

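- used for [[Tanh Activation Function]]
- helps prevent [[Vanishing and Exploding Gradients]] and speeds up training
- scales each layer's weights by the number of units in the previous layer, $n^{[l-1]}$ (the fan-in)
    - for $z = \sum_{j} w_j a_j$ with independent, zero-mean weights and inputs, $\operatorname{Var}(z) = n^{[l-1]} \operatorname{Var}(w) \operatorname{Var}(a)$
    - choosing $\operatorname{Var}(w) = \frac{1}{n^{[l-1]}}$ gives $\operatorname{Var}(z) = \operatorname{Var}(a)$, so the signal neither shrinks nor blows up from layer to layer
- let $x \sim \mathcal{N}(0, 1)$
- set each initial weight $w$ to $x \cdot \sqrt{\frac{1}{n^{[l-1]}}}$, i.e. $w \sim \mathcal{N}\left(0, \frac{1}{n^{[l-1]}}\right)$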
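A minimal pure-Python sketch of the rule above. It assumes weights are stored as a list of rows with shape $(n^{[l]}, n^{[l-1]})$. `xavier_init` and `variance` are hypothetical helpers written for illustration and are not a library API. The demo checks that the weight variance comes out near $\frac{1}{n^{[l-1]}}$ and that, for unit-variance inputs, the pre-activations $z$ have variance near 1.

```python
import math
import random
from typing import List, Optional


def xavier_init(n_prev: int, n_curr: int, seed: Optional[int] = None) -> List[List[float]]:
    """Weight matrix for layer l with shape (n_curr, n_prev), for a tanh layer.

    Each entry is x * sqrt(1 / n_prev) with x ~ N(0, 1), so the weights are
    drawn from N(0, 1 / n_prev). Here n_prev plays the role of n^[l-1].
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 / n_prev)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_prev)]
            for _ in range(n_curr)]


def variance(values: List[float]) -> float:
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    n_prev, n_curr = 500, 1000
    W = xavier_init(n_prev, n_curr, seed=0)

    # Weight variance should be close to 1 / n_prev = 0.002.
    flat = [w for row in W for w in row]
    print("Var(w):", variance(flat))

    # With unit-variance inputs, z = W a should have variance close to 1,
    # which keeps most of tanh(z) away from its flat, saturated tails.
    rng = random.Random(1)
    a = [rng.gauss(0.0, 1.0) for _ in range(n_prev)]
    z = [sum(w * ai for w, ai in zip(row, a)) for row in W]
    print("Var(z):", variance(z))
    print("Var(tanh(z)):", variance([math.tanh(v) for v in z]))
```

Replacing `scale` with `1.0` (plain standard normal weights) in the same demo pushes `Var(z)` up to roughly `n_prev`, which is the exploding case the scaling is meant to avoid.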