> A collection of [[💽 COMPUTER ENGINEERING|Computers]], aka "nodes", that collaborate to provide a service. The nodes communicate by passing messages over a [[Networking|Network]].
#### Components
----
- [[Microservices]]
- [[Distributed System Technologies]]
- [[Distributed System Models]]
- [[Containers]]
- [[API]]
- [[Data Pipeline]]
- [[Failure Detectors]]
#### Techniques
----
- [[Networking]]
- [[Consistent Hashing]]
- [[Throttling]]
#### Concepts
----
- [[CAP Theorem]]
- [[System Design]]
- [[Compute Scaling]]
- [[Cloud Computing]]
- [[Concurrency]]
#### General
---
- Differences from centralized systems
    - computers are physically separate and connected only by a [[Networking|Network]]; they don't share memory or a common clock
    - nodes can fail independently, and the service should keep running when they do
    - messages can be lost or delayed over the [[Networking|Network]]
![[Pasted image 20250228133501.png|400]]
![[Pasted image 20250228133521.png|400]]
### Challenges
---
COMMUNICATION/[[Networking]]:
- challenge:
    - [[Internet Protocol (IP)|Internet Protocol (IP)]] is responsible for delivering packets between nodes, but only provides a "best effort" service
        - this means packets are not guaranteed to be delivered
    - packet transmission issues include:
        - packet loss
            - packets dropped due to congestion, hardware failures, etc.
        - packet duplication
            - multiple copies of the same packet arrive at the destination node
        - packet corruption
            - data in the packet is altered in transit
        - out-of-order delivery
            - packets arrive in a different order than they were sent
- solutions:
    1. [[Transmission Control Protocol (TCP)]]
        - provides reliable, in-order delivery of a byte stream between processes
    2. [[Transport Layer Security (TLS)]]
        - adds security to data transmission: encryption, integrity checks, and authentication of the peer
    3. [[Domain Name System (DNS)]]
        - in a distributed system, nodes need a way to find and communicate with each other
        - DNS lets nodes find and connect to each other using easily memorable names rather than numeric [[Internet Protocol (IP)|IP]] addresses
![[Pasted image 20250228140011.png|300]]
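To make the four packet issues concrete, here is a minimal Python sketch of how sequence numbers, checksums, and retransmission can turn a best-effort channel into reliable, in-order delivery. It is a toy simulation under stated assumptions, not how TCP is actually implemented (real TCP adds acknowledgements, sliding windows, and congestion control). The names `make_packet`, `unreliable_channel`, `receive`, and `send_reliably` are hypothetical, and the loss/duplication/corruption rates are arbitrary.

```python
import random
import zlib


def make_packet(seq: int, payload: bytes) -> dict:
    """Wrap a non-empty payload with a sequence number and a CRC32 checksum."""
    return {"seq": seq, "payload": payload, "crc": zlib.crc32(payload)}


def unreliable_channel(packets, rng):
    """Hypothetical best-effort network: may drop, duplicate, corrupt, and reorder."""
    delivered = []
    for pkt in packets:
        roll = rng.random()
        if roll < 0.2:
            continue                                  # packet loss
        if roll < 0.4:
            delivered.append(dict(pkt))               # packet duplication
        if roll > 0.9:
            flipped = bytes([pkt["payload"][0] ^ 0xFF]) + pkt["payload"][1:]
            pkt = dict(pkt, payload=flipped)          # packet corruption
        delivered.append(pkt)
    rng.shuffle(delivered)                            # out-of-order delivery
    return delivered


def receive(delivered, buffer):
    """Store intact packets by sequence number; drop corrupted ones."""
    for pkt in delivered:
        if zlib.crc32(pkt["payload"]) != pkt["crc"]:
            continue                                  # checksum mismatch: discard
        buffer[pkt["seq"]] = pkt["payload"]           # a duplicate lands in the same slot


def send_reliably(chunks, rng, max_rounds=50):
    """Retransmit whatever the receiver is still missing, then reassemble in order."""
    packets = [make_packet(i, chunk) for i, chunk in enumerate(chunks)]
    buffer = {}
    for _ in range(max_rounds):
        missing = [p for p in packets if p["seq"] not in buffer]
        if not missing:
            # Sequence numbers restore the original order regardless of arrival order.
            return b"".join(buffer[i] for i in range(len(packets)))
        receive(unreliable_channel(missing, rng), buffer)
    raise TimeoutError("gave up after max_rounds retransmission rounds")


if __name__ == "__main__":
    message = send_reliably([b"dist", b"ribu", b"ted ", b"syst", b"ems"], random.Random(7))
    print(message)
```

Each mechanism maps to one issue from the list above: the checksum catches corruption, keying the buffer by sequence number absorbs duplicates and reordering, and the retransmission loop covers loss.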
COORDINATION:
- challenge:
    - nodes can fail at any time, which disrupts the flow of information and can lead to inconsistencies
    - coordinating information exchange between nodes is hard because of the [[Networking]] constraints described above
    - each node has its own clock, so it is difficult to maintain a consistent notion of time across all nodes
- solutions:
    1. [[Failure Detectors]]
        - help detect whether a process has failed
    2. [[Event Ordering]]
    3. [[Leader Election]]
    4. [[Database Replication]]
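Because nodes have no shared clock, event ordering is usually built on logical time rather than wall-clock time. One common technique is the Lamport logical clock, sketched below in Python. This is an illustration only: the linked [[Event Ordering]] note may cover other approaches, and the class and method names here are hypothetical.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward, no wall-clock time involved."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                   # local event on node a
    stamp = a.send()           # a sends a message to b
    print(b.receive(stamp))    # b's clock jumps ahead of a's send timestamp
```

The guarantee is one-directional: if event X causally precedes event Y, then X's timestamp is smaller than Y's. Two timestamps alone cannot show that one event caused the other.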
https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
https://blog.bytebytego.com/p/a-crash-course-on-distributed-systems
https://roadmap.sh/software-architect
https://aws.amazon.com/what-is/distributed-computing/#:~:text=Computers%20in%20a%20distributed%20system,tolerance%20without%20compromising%20data%20consistency