- $L(\hat{y},y) = -\left(y\,ln(\hat{y}) + (1 - y)\,ln(1 - \hat{y})\right)$
- if actual output $y$ is 1, we want $-ln(\hat{y})$ to be the loss function
- $-ln(\hat{y})$ falls to 0 as $\hat{y}$ approaches 1, so we want $\hat{y}$ to be as large as possible (close to 1) to minimize the loss
- if actual output $y$ is 0, we want $-ln(1-\hat{y})$ to be loss function
- remember $\hat{y}$ ranges from 0 to 1, so $-ln(1-\hat{y})$ reaches its lowest value, 0, at $\hat{y} = 0$
- so we want $\hat{y}$ to be as small as possible (close to 0) to minimize the loss
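
The two cases above can be checked numerically. Below is a minimal sketch of the loss for a single example; the function name `binary_cross_entropy` and the `eps` clamp are assumptions added for illustration (the clamp only keeps `log` from receiving exactly 0 and is not part of the formula itself).

```python
import math


def binary_cross_entropy(y_hat: float, y: int, eps: float = 1e-12) -> float:
    """Loss for one example: -(y*ln(y_hat) + (1-y)*ln(1-y_hat))."""
    # Assumed safeguard: clamp y_hat into [eps, 1 - eps] so log(0) never occurs.
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


# Print the loss for both labels across a few predictions.
for y in (1, 0):
    for y_hat in (0.1, 0.5, 0.9):
        loss = binary_cross_entropy(y_hat, y)
        print(f"y={y}, y_hat={y_hat}: loss={loss:.4f}")
```

For $y = 1$ the printed loss should shrink as $\hat{y}$ grows, and for $y = 0$ it should shrink as $\hat{y}$ drops, matching the reasoning in the list.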