- components that monitor the status of processes in [[Distributed Systems]] and determine their availability (i.e., decide whether a process has failed)
- in [[Distributed Systems]], it is impossible to definitively differentiate between a failed process and one that is just very slow to respond
    - [[Networking|Network]] issues can cause a process to appear unresponsive even when it is still functioning correctly
- failure detectors must trade off detection time against the rate of false positives
    - if a detector declares failures too quickly, it can incorrectly classify slow processes as failed (more false positives)
    - if it is too conservative, actual failures take much longer to detect
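
The tradeoff can be illustrated with a minimal sketch of a heartbeat-based detector with a fixed timeout. This is only one possible design, not a definitive implementation: the class `TimeoutFailureDetector`, its methods, the process names, and the timeout values are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Hypothetical detector: suspects a process once no heartbeat
    has been seen for more than `timeout` seconds."""
    timeout: float
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, process_id: str, now: float) -> None:
        # Record the time at which the latest heartbeat was received.
        self.last_heartbeat[process_id] = now

    def is_suspected(self, process_id: str, now: float) -> bool:
        last = self.last_heartbeat.get(process_id)
        if last is None:
            return True  # never heard from this process
        return now - last > self.timeout


# Two detectors that differ only in how quickly they give up (assumed values).
aggressive = TimeoutFailureDetector(timeout=1.0)
conservative = TimeoutFailureDetector(timeout=5.0)

# Both processes send a heartbeat at t=0. "slow" is alive but delayed,
# "crashed" never sends another heartbeat.
for detector in (aggressive, conservative):
    detector.heartbeat("slow", now=0.0)
    detector.heartbeat("crashed", now=0.0)

# At t=2 the detectors cannot tell "slow" and "crashed" apart:
# each one gives the same answer for both processes.
slow_aggressive = aggressive.is_suspected("slow", now=2.0)          # false positive
slow_conservative = conservative.is_suspected("slow", now=2.0)      # correctly trusted
crashed_aggressive = aggressive.is_suspected("crashed", now=2.0)    # fast detection
crashed_conservative = conservative.is_suspected("crashed", now=2.0)  # failure missed so far
crashed_conservative_later = conservative.is_suspected("crashed", now=6.0)  # detected late
```

With the short timeout the crash is caught by t=2, but the slow process is wrongly suspected as well. With the long timeout the slow process is left alone, but the real crash is only reported after t=5.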