- $-\sum_{i=1}^{C} y_i \log(\hat{y}_i)$
- $C$ is the number of possible classes
- $y_i$ is the actual fraction of the $i$th class in the labelled (true) distribution
- $\hat{y}_i$ is the model's predicted fraction of the $i$th class
- the sum runs over all possible classes
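
A minimal sketch of the sum above in plain Python, to make the roles of $y_i$ and $\hat{y}_i$ concrete. The function name `cross_entropy`, the natural logarithm, and the small `eps` clamp (which avoids `log(0)`) are assumptions made for illustration, not part of the notes.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Return -sum_i y_i * log(y_hat_i) over all classes.

    y_true: actual fraction of each class (one-hot for a single labelled example).
    y_pred: model's predicted fraction of each class.
    eps: assumed clamp so that log(0) is never evaluated.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have one entry per class")
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Three classes; the true class is the first one.
y_true = [1.0, 0.0, 0.0]
y_pred = [0.7, 0.2, 0.1]
loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```

With a one-hot `y_true`, only the term for the true class survives, so the result reduces to the negative log of the predicted fraction for that class.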