- derivative is not well defined when $z$ is exactly 0, but $z$ is rarely exactly 0 in practice
- just treat the derivative as 0 or 1 at that point
- often the default activation choice
- tends to learn faster than sigmoid or tanh activations, since the slope does not shrink toward 0 for positive $z$ (technically the slope is 0 when $z$ is negative, but enough $z$ values will be > 0 to keep learning fast)

![[CleanShot 2024-06-10 at [email protected]|400]]
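
The points above can be sketched in NumPy, assuming the activation in question is ReLU, $\max(0, z)$. The function names and the `grad_at_zero` parameter are hypothetical, chosen only for illustration; the sigmoid derivative is included to show how its slope shrinks for large $z$ while the ReLU slope stays at 1.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def relu_grad(z, grad_at_zero=0.0):
    # Slope is 1 for z > 0 and 0 for z < 0.
    # At z == 0 the derivative is undefined, so we pick a convention
    # (0 by default; 1 is equally acceptable in practice).
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, np.where(z == 0, grad_at_zero, 0.0))

def sigmoid_grad(z):
    # Derivative of the sigmoid, s(z) * (1 - s(z)), for comparison
    s = 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 0.5, 10.0])
print("relu:        ", relu(z))
print("relu grad:   ", relu_grad(z))
print("sigmoid grad:", sigmoid_grad(z))
```

At $z = 10$ the sigmoid slope is already tiny, while the ReLU slope is still 1, which is the intuition behind the faster learning noted above.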