> A collection of [[💽 COMPUTER ENGINEERING|Computers]], aka "nodes", that collaborate to provide a service. The nodes communicate by passing messages over a [[Networking|Network]].

#### Components
----
- [[Microservices]]
- [[Distributed System Technologies]]
- [[Distributed System Models]]
- [[Containers]]
- [[API]]
- [[Data Pipeline]]
- [[Failure Detectors]]

#### Techniques
----
- [[Networking]]
- [[Consistent Hashing]]
- [[Throttling]]

#### Concepts
----
- [[CAP Theorem]]
- [[System Design]]
- [[Compute Scaling]]
- [[Cloud Computing]]
- [[Concurrency]]

#### General
---
- Differences from centralized systems
    - computers are physically separate and connected only by a [[Networking|Network]]; they don't share memory or a common clock
    - nodes can fail independently, and the service should keep running when they do
    - messages can be lost or delayed over the [[Networking|Network]]

![[Pasted image 20250228133501.png|400]]
![[Pasted image 20250228133521.png|400]]

### Challenges
---
COMMUNICATION/[[Networking]]:
- challenge:
    - [[Internet Protocol (IP)]] is responsible for delivering packets between nodes, but only provides a "best effort" service
        - this means packet delivery is not guaranteed
    - packet transmission issues include:
        - packet loss
            - packets dropped because of congestion, hardware failures, etc.
        - packet duplication
            - multiple copies of the same packet arrive at the destination node
        - packet corruption
            - data in the packet is altered in transit
        - out-of-order delivery
            - packets arrive in a different order from the one in which they were sent
- solutions:
    1. [[Transmission Control Protocol (TCP)]]
        - provides reliable, in-order delivery of a byte stream between processes
    2. [[Transport Layer Security (TLS)]]
        - adds encryption, integrity checks, and authentication to data transmission
    3. [[Domain Name System (DNS)]]
        - in a distributed system, nodes need a way to find and communicate with each other
        - lets nodes find and connect to each other using easily memorable names rather than numeric [[Internet Protocol (IP)|IP]] addresses

![[Pasted image 20250228140011.png|300]]

COORDINATION:
- challenge:
    - nodes can fail at any time, which disrupts the flow of information and can lead to inconsistencies
    - coordinating information exchange between nodes is hard because of the [[Networking]] constraints mentioned above
    - each node has its own clock, so it is difficult to keep a consistent notion of time across all nodes
- solutions:
    1. [[Failure Detectors]]
        - help detect whether a process has failed
    2. [[Event Ordering]]
    3. [[Leader Election]]
    4. [[Database Replication]]

https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
https://blog.bytebytego.com/p/a-crash-course-on-distributed-systems
https://roadmap.sh/software-architect
https://aws.amazon.com/what-is/distributed-computing/#:~:text=Computers%20in%20a%20distributed%20system,tolerance%20without%20compromising%20data%20consistency
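
Sketch for the [[Failure Detectors]] item under COORDINATION: one common approach is a heartbeat timeout, where a node is suspected once nothing has been heard from it for a while. The class name `HeartbeatFailureDetector`, its methods, the node names, and the 3-second timeout are all hypothetical and chosen only for illustration; this is a minimal sketch, not a reference implementation.

```python
import time
from typing import Callable, Dict, List


class HeartbeatFailureDetector:
    """Suspects a node once no heartbeat has arrived within `timeout` seconds.

    Hypothetical illustration: names and parameters are assumptions.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        # The clock is injectable so the detector can be exercised without sleeping.
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record that a heartbeat message from `node` just arrived."""
        self.last_seen[node] = self.clock()

    def suspected(self) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


# Demo with a fake clock (a one-element list that is advanced by hand).
fake_now = [0.0]
detector = HeartbeatFailureDetector(timeout=3.0, clock=lambda: fake_now[0])

detector.heartbeat("node-a")
detector.heartbeat("node-b")

fake_now[0] = 2.0
detector.heartbeat("node-a")  # node-a keeps reporting in, node-b goes silent

fake_now[0] = 4.0
print(detector.suspected())  # ['node-b']
```

- a timeout only *suspects* a failure: because messages can be lost or delayed, a slow node looks the same as a crashed one
- it uses only the local monotonic clock, so it does not depend on clocks being synchronized across nodes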