- timeouts:
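- the maximum amount of time a client waits for a request to complete
- a waiting request ties up resources (memory, threads, connections), so we don’t want to hold them for too long
- choosing a good timeout value is difficult
- too high = reduced usefulness, since resources stay held while the client keeps waiting
- too low = too many retries, since requests that would have succeeded get cut off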
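A minimal sketch of a client-side timeout, assuming the request can be run in a worker thread. The names `call_with_timeout` and `slow_request` are made up for illustration; real HTTP or RPC clients usually expose their own timeout parameter instead.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread and wait at most timeout_s seconds for it.

    Raises concurrent.futures.TimeoutError if the call does not finish in time.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Stop waiting without blocking on the worker. The worker thread is
        # not interrupted; only the caller gives up on it.
        executor.shutdown(wait=False)


def slow_request():
    """Hypothetical stand-in for a slow remote call."""
    time.sleep(0.5)
    return "ok"
```

Calling `call_with_timeout(slow_request, 0.05)` gives up after about 50 ms, while `call_with_timeout(slow_request, 2.0)` returns the result. Note that in this sketch the abandoned call keeps running in the background, which is one reason a timeout alone does not free every resource.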
- retries:
- failures are often one of two kinds:
- partial failures: only some % of requests succeed
- transient failures: requests fail for a short period of time
- in both cases the same request may succeed if sent again, so just retry
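A rough sketch of a plain retry loop, assuming failures worth retrying are signalled by a dedicated exception type. `TransientError` and `retry` are hypothetical names, and the limit of 3 attempts is an arbitrary choice.

```python
class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry(operation, max_attempts=3):
    """Call operation() until it succeeds or max_attempts calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
```

Bounding the number of attempts matters: an unbounded loop would keep hammering a server that is not going to recover soon.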
- backoff:
- but retrying increases load on the server
- this can worsen the problem if the system is failing due to overload
- so we increase the time between subsequent retries
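One common way to increase the wait between retries is to double it each time up to a cap (exponential backoff). The notes only say the delay grows, so the doubling, the 0.1 s base, and the 10 s cap below are all assumptions, and the function names are illustrative.

```python
import time


def backoff_delay(attempt, base_s=0.1, cap_s=10.0):
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap_s, base_s * (2 ** attempt))


def retry_with_backoff(operation, max_attempts=5, sleep=time.sleep):
    """Retry operation(), sleeping longer after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(backoff_delay(attempt))
```

The `sleep` parameter exists only so the waiting can be replaced in tests. A real client would also restrict the `except` clause to errors that are actually worth retrying.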
- jitter:
- to prevent synchronized bursts of load from many clients backing off on the same schedule, we randomize the amount of time before each retry
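As a sketch of the randomization, one option is to pick each delay uniformly between zero and a capped, exponentially growing upper bound (often called "full jitter"). The bound, the defaults, and the name `jittered_delay` are assumptions for illustration, not something the notes prescribe.

```python
import random


def jittered_delay(attempt, base_s=0.1, cap_s=10.0, rng=random):
    """Random delay in [0, min(cap, base * 2^attempt)] for a 0-based attempt."""
    upper = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, upper)
```

Because every client draws its own random delay, clients that failed at the same moment spread their retries across the interval instead of all returning at once. Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests.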
https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/