Fault-Tolerant API Calling

- Timeouts:
  - A timeout is the maximum amount of time a client waits for a request to complete.
  - We don't want to hold client resources (e.g., threads, connections) for too long while waiting.
  - Choosing a timeout value is difficult:
    - Too high: reduced usefulness, because resources stay tied up while the client waits.
    - Too low: too many retries, which add traffic and latency.
- Retries:
  - Failures are often:
    - Partial: some percentage of requests succeed.
    - Transient: a request fails only for a short period of time.
  - So we simply retry the request.
- Backoff:
  - Retrying increases the load on the server.
  - This can make things worse if the system is failing because it is overloaded.
  - So we increase the time between subsequent retries (e.g., exponentially).
- Jitter:
  - If many clients back off on the same schedule, their retries arrive in synchronized bursts that can overload the server again.
  - To prevent this, we randomize the amount of time each client waits before retrying.

Source: https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
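A minimal Python sketch that ties the four ideas together. It assumes the operation being called accepts a per-request timeout in seconds and signals transient failures by raising `TimeoutError` or `ConnectionError`. The names `call_with_retries` and `backoff_delay` and all their parameters are hypothetical. The delay formula is capped exponential backoff with what is often called "full jitter" (a uniform random delay between 0 and the capped exponential value); it is one common variant, not the only option. The sketch only passes the timeout through; enforcing it is left to the operation (for example, `urllib.request.urlopen` accepts a `timeout` argument).

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Capped exponential backoff with "full jitter" (hypothetical helper).

    The un-jittered delay doubles on every attempt (base * 2**attempt) but never
    exceeds `cap`; the actual delay is drawn uniformly from [0, that ceiling],
    so clients that failed at the same moment spread their retries out.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(
    op: Callable[[float], T],
    *,
    timeout: float = 2.0,
    max_attempts: int = 4,
    base: float = 0.1,
    cap: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `op(timeout)`, retrying transient failures with jittered backoff.

    Assumption: `op` takes a per-request timeout in seconds and gives up
    (e.g. raises TimeoutError) once that much time has passed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return op(timeout)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure to the caller
            # Wait longer (on average) after each failure to shed load from the server.
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Demo: an operation that times out twice, then succeeds on the third try.
    attempts = []

    def flaky(timeout: float) -> str:
        attempts.append(timeout)
        if len(attempts) < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    print(call_with_retries(flaky, timeout=1.0, base=0.05))
```

Injecting `sleep` and `rng` lets a test run the retry loop without real waits and with reproducible delays. Only exceptions listed in `retry_on` are retried; anything else (say, a `ValueError` from a bug) propagates immediately, since retrying a non-transient failure only adds load.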