- with very deep [[Deep Learning|Neural Networks]], you get [[Vanishing and Exploding Gradients]]
- in ResNets, we use shortcut/skip connections that carry an earlier activation forward and add it to a deeper layer's $z$ before it goes through the [[Activation Functions|Activation Function]], which mitigates this problem (see the sketch after the figures below)
- this lets the network train effectively even with many layers
- each blue segment below is a residual block
![[CleanShot 2024-07-04 at [email protected]]]
![[CleanShot 2024-07-04 at [email protected]]]
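- a minimal sketch of a residual block in PyTorch (assuming fully connected layers and ReLU; the names `ResidualBlock`, `dim`, and the layer sizes are illustrative, not taken from the figures): the shortcut adds $a^{[l]}$ to $z^{[l+2]}$ before the final activation
```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two linear layers whose output is added to the block input
    (the skip connection) before the final activation:
    a[l+2] = g(z[l+2] + a[l])."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.relu = nn.ReLU()

    def forward(self, a_l: torch.Tensor) -> torch.Tensor:
        a_l1 = self.relu(self.fc1(a_l))   # a[l+1]
        z_l2 = self.fc2(a_l1)             # z[l+2]
        return self.relu(z_l2 + a_l)      # shortcut added before activation


# usage: input and output shapes must match so the addition works
block = ResidualBlock(64)
out = block(torch.randn(8, 64))           # (batch, dim) -> (batch, dim)
```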
- adding residual blocks won't hurt performance, because a residual block can easily learn the identity function: if the weights & biases in between are learned as 0, then $a^{[l+2]} = a^{[l]}$, so the network can basically ignore those layers (derivation below)
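	- worked out, assuming ReLU as $g$ and the standard notation $z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]}$ (not written out elsewhere in this note):
	  $$a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right) = g\left(W^{[l+2]} a^{[l+1]} + b^{[l+2]} + a^{[l]}\right)$$
	- if $W^{[l+2]} = 0$ and $b^{[l+2]} = 0$, this reduces to $a^{[l+2]} = g(a^{[l]}) = a^{[l]}$, since ReLU is the identity on non-negative inputs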