## Requirements
---
- functional
- given a long URL, create a short URL
- given a short URL, redirect to a long URL
- non functional
- very low latency
- very high availability
### Clarifying Questions
---
- how long is the shortened URL?
- as short as possible
- what is the traffic volume?
- 100 million URLs are generated per day
- what characters are allowed in the shortened URL?
- can be 0-9, a-z, A-Z
## API Design
---
- POST endpoint api/v1/data/create-url
- Params: long-url
- Status code: 201 created
- GET endpoint api/v1/{short-url}
- Status code: 301 Permanent Redirect
### [[Back of the Envelope Estimation]]
---
- write operation
- 100 million URLs per day
- write operation per second
- 100 million / 24 / 3600 = 1160 write ops p/s
- read operation
- assume 50:1 read:write ratio
- 58,000 read ops p/s
- record support
- assume we support for 10 years
- 100 million * 365 * 10 = 365 billion records
- storage requirement
- assume average URL length is 100
- note: each character is a byte
- 365 billion records * 100 bytes = 36.5 TB
## URL Redirecting
---
- we use 301 [[HTTP Status Code|Status Code]] to for reduced latency, or 302 for analytics
![[Pasted image 20250302113700.png|300]]
## Implementation
---
- we use [[Hashing|Hash Function]] to hash the long URL into a hash value we use at the end of our shortened URL
![[Pasted image 20250302113714.png|300]]
- we store results in [[SQL Databases|Relational Database]], like below
![[Pasted image 20250302120337.png|300]]
## Shorten Via Hash
---
- recall we have 365 billion records
- hash values have characters [0-9, a-z, A-Z]
- 10 + 26 + 26 = 62 characters
- what is minimum length of hash to account for 365 billion records?
![[Pasted image 20250302120528.png|400]]
- 7 would be more than enough, 6 is not enough
- so we can use the [[Hashing|Hash Functions]] below:
![[Pasted image 20250302141529.png|300]]
- if we use the above [[Hashing|Hash Functions]], the hash values are too long (more than 7 characters)
- solution is to just take the 1st 7 characters of a hash value
- but this can cause collisions
- so we keep appending a new predefined string to the long URL until there is no collision with existing hash values in the database
- CON: this might make requests expensive, as it is costly to query the database each time to check if a short URL / hash value already exists in the DB
![[Pasted image 20250302141858.png|400]]
## Shorten Via Base 62 Conversion
---
- convert base 10 (decimal) number to base 62
- we can get decimal number from [[Unique ID Generator]] for database ID
![[Pasted image 20250302143433.png|400]]
- The mappings are: 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z, where ‘a’ stands for 10, ‘Z’ stands for 61, etc.
- 1,115,710 = 2TX in base 62
![[Pasted image 20250302143621.png|400]]
## Summary
---
![[Pasted image 20250302140005.png]]
## If Extra Time
---
- [[Rate Limiter]]
- maybe hacker wants to send an insane army of URL shortening requests
- Analytics
- maybe we want to know how many people click on a shortened link etc
- would require us to do 302 instead of 301
https://bytebytego.com/
https://www.youtube.com/watch?v=Cg3XIqs_-4c&ab_channel=TechPrep
https://medium.com/@sandeep4.verma/system-design-scalable-url-shortener-service-like-tinyurl-106f30f23a82