- [[ReLU Activation Function]]
- [[Tanh Activation Function]]
- [[Sigmoid Activation Function]]
- [[Softmax Activation Function]]
## General
- you can have different activation functions in different layers
- we need **nonlinear** activation functions; they can't be linear
	- if they were linear, all the extra layers would be redundant: composing linear functions just gives another linear function (see the first sketch below)
	- you lose a ton of expressivity
- only if the output $y$ is a real number (e.g. dollars) can a linear activation at the last layer be acceptable (see the second sketch below)
- $\sigma$ refers specifically to the sigmoid; for a general nonlinear activation function we write $g$
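
A minimal numpy sketch of the "linear layers collapse" point above (the weight shapes and random values are just illustrative assumptions): two stacked layers with no activation are exactly equivalent to one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# two "layers" with no activation function (purely linear)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# stacking them...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...is the same as a single linear layer with W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True: the extra layer added nothing
```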
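
And a second sketch for the regression case: a nonlinear $g$ (ReLU here, but any of the linked activations works) in the hidden layer, with a linear output layer because the target is a real number such as a dollar amount. The feature vector and weights are made-up assumptions, not from the note.

```python
import numpy as np

def g(z):
    # hidden-layer nonlinearity (ReLU as one concrete choice)
    return np.maximum(0.0, z)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x = np.array([3.0, 2.0, 1.0])   # e.g. features of a house
a1 = g(W1 @ x + b1)             # hidden layer: nonlinear activation g
y_hat = W2 @ a1 + b2            # output layer: linear, since y is a real number (dollars)
print(y_hat)
```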