- $-\sum_{i=1}^{C} y_i \log(\hat{y}_i)$
- $C$ is the number of possible classes
- $y_i$ is the actual fraction of the $i$-th class in the labelled (true) distribution
- $\hat{y}_i$ is the model's predicted fraction of the $i$-th class
- the sum runs over all possible classes
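
A minimal sketch of the formula in plain Python, to make the sum concrete. The function name `cross_entropy`, the `eps` guard against $\log(0)$, the use of the natural log, and the example numbers are all assumptions for illustration, not part of the notes above.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Compute -sum_i y_i * log(y_hat_i) with the natural log (assumed base)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of classes")
    # Classes with y_i == 0 contribute nothing to the sum, so they are skipped.
    # eps (an assumed safeguard) keeps log() away from a predicted fraction of 0.
    return -sum(
        y * math.log(max(p, eps))
        for y, p in zip(y_true, y_pred)
        if y > 0
    )

# Hypothetical example with C = 3 classes; the true class is the second one.
y_true = [0.0, 1.0, 0.0]
y_pred = [0.1, 0.7, 0.2]
loss = cross_entropy(y_true, y_pred)  # reduces to -log(0.7)
```

With a one-hot $y$, only the true class survives the sum, so the loss reduces to $-\log(\hat{y}_i)$ for that class.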