- activation function for multi-class classification; gives a value $\in [0, 1]$ for each class, the values sum to 1, and the class with the highest value is taken as the prediction
- given $z_i$ is the $i$-th component of the output vector $z$ with $k$ classes, $\sigma$ refers to the softmax function
- $\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$
- Loss function: [[Cross Entropy Loss]]
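
A minimal Python sketch of the softmax formula, for illustration only. The function name `softmax`, the example `logits`, and the max-subtraction step are assumptions not taken from the note; subtracting $\max(z)$ leaves the result unchanged mathematically but avoids overflow in `exp` for large inputs.

```python
import math


def softmax(z):
    """Return softmax probabilities for a list of raw scores (logits)."""
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    # This is a common stability trick, not part of the formula itself.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    # Each output is e^{z_i} divided by the sum over all classes.
    return [e / total for e in exps]


# Hypothetical output vector for k = 3 classes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted class is the index of the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Here `probs` sums to 1 and `predicted` is the index of the largest logit, since softmax preserves the ordering of its inputs.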