- activation function for multi-class classification: gives a value $\in [0, 1]$ for each class, and the values sum to 1; the class with the highest value is taken as the prediction
- given $z_i$ is the $i$th entry of output vector $z$ (one entry per class, $K$ classes in total), $\sigma$ refers to the softmax function
- $\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$
- Loss function: [[Cross Entropy Loss]]
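
The formula above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function name `softmax` and the example `logits` values are made up for this note, and subtracting the maximum before exponentiating is a common stability trick that is assumed here rather than taken from the note (it leaves the result unchanged because the shift cancels between numerator and denominator).

```python
import math


def softmax(z):
    """Return softmax of a list of scores z (one score per class)."""
    # Shift by the max so the largest exponent is 0; this avoids overflow
    # in math.exp without changing the output values.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output vector for 3 classes (illustrative values only).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted class is the index of the highest softmax value.
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
```

Here `predicted_class` is the index of the largest entry of `logits`, since softmax preserves the ordering of its inputs.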