- [[ReLU Activation Function]]
- [[Tanh Activation Function]]
- [[Sigmoid Activation Function]]
- [[Softmax Activation Function]]

## General

- you can have different activation functions in different layers
- we need **nonlinear** activation functions, they can't be linear
    - if they were linear, all the extra layers would be redundant, since the whole network would collapse into a single linear function (see the sketch below)
    - you lose a ton of expressivity

![[CleanShot 2024-06-10 at [email protected]|300]]

- only if the output $y$ is a real number (e.g. dollars) might a linear activation at the last layer be acceptable
- $\sigma$ refers to sigmoid, but for a general nonlinear activation function we could write $g$
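
Below is a minimal sketch (NumPy; the variable names and shapes are my own) of the redundancy point: stacking two linear layers with no nonlinearity collapses into a single linear layer, while inserting a ReLU between them does not.

```python
# Sketch: why linear activations make extra layers redundant.
# Assumed example shapes: input dim 3, hidden dim 4, output dim 2.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2: 4 -> 2
x = rng.normal(size=3)

# Linear (identity) activation: layer2(layer1(x)) is itself linear,
# so one combined layer reproduces it exactly.
h_linear = W2 @ (W1 @ x + b1) + b2
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
assert np.allclose(h_linear, W_combined @ x + b_combined)

# Nonlinear activation (ReLU) in between: no single linear layer
# reproduces this for all inputs, so the extra layer adds expressivity.
relu = lambda z: np.maximum(z, 0)
h_nonlinear = W2 @ relu(W1 @ x + b1) + b2
```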