- $L(\hat{y},y) = -\left(y\,\ln(\hat{y}) + (1 - y)\ln(1 - \hat{y})\right)$
  - if the actual output $y$ is 1, the second term vanishes and the loss is $-\ln(\hat{y})$
    - want $\hat{y}$ to be large (close to 1) to push the loss toward its minimum of 0
  - if the actual output $y$ is 0, the first term vanishes and the loss is $-\ln(1-\hat{y})$
    - remember $\hat{y}$ ranges from 0 to 1, so the loss is lowest at $\hat{y} = 0$
    - so we want $\hat{y}$ to be as small as possible to reach the lowest loss
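
To check the two cases numerically, here is a minimal sketch of the loss for a single example. The function name `binary_cross_entropy` and the sample values of $\hat{y}$ are illustrative assumptions, not part of the notes, and the sketch assumes $\hat{y}$ lies strictly between 0 and 1 so that both logarithms are defined.

```python
import math


def binary_cross_entropy(y_hat, y):
    """Loss for one example: y is the true label (0 or 1), y_hat the prediction.

    Assumes 0 < y_hat < 1; math.log raises ValueError at exactly 0.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


# Compare the loss for a few predictions under each true label.
for y_hat in (0.1, 0.5, 0.9):
    loss_if_one = binary_cross_entropy(y_hat, 1)   # reduces to -ln(y_hat)
    loss_if_zero = binary_cross_entropy(y_hat, 0)  # reduces to -ln(1 - y_hat)
    print(f"y_hat={y_hat}: loss(y=1)={loss_if_one:.4f}, loss(y=0)={loss_if_zero:.4f}")
```

When $y = 1$ the printed loss shrinks as $\hat{y}$ grows, and when $y = 0$ it shrinks as $\hat{y}$ falls, which matches the two cases in the list.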