## Requirements --- - functional - given a long URL, create a short URL - given a short URL, redirect to a long URL - non functional - very low latency - very high availability ### Clarifying Questions --- - how long is the shortened URL? - as short as possible - what is the traffic volume? - 100 million URLs are generated per day - what characters are allowed in the shortened URL? - can be 0-9, a-z, A-Z ## API Design --- - POST endpoint api/v1/data/create-url - Params: long-url - Status code: 201 created - GET endpoint api/v1/{short-url} - Status code: 301 Permanent Redirect ### [[Back of the Envelope Estimation]] --- - write operation - 100 million URLs per day - write operation per second - 100 million / 24 / 3600 = 1160 write ops p/s - read operation - assume 50:1 read:write ratio - 58,000 read ops p/s - record support - assume we support for 10 years - 100 million * 365 * 10 = 365 billion records - storage requirement - assume average URL length is 100 - note: each character is a byte - 365 billion records * 100 bytes = 36.5 TB ## URL Redirecting --- - we use 301 [[HTTP Status Code|Status Code]] to for reduced latency, or 302 for analytics ![[Pasted image 20250302113700.png|300]] ## Implementation --- - we use [[Hashing|Hash Function]] to hash the long URL into a hash value we use at the end of our shortened URL ![[Pasted image 20250302113714.png|300]] - we store results in [[SQL Databases|Relational Database]], like below ![[Pasted image 20250302120337.png|300]] ## Shorten Via Hash --- - recall we have 365 billion records - hash values have characters [0-9, a-z, A-Z] - 10 + 26 + 26 = 62 characters - what is minimum length of hash to account for 365 billion records? ![[Pasted image 20250302120528.png|400]] - 7 would be more than enough, 6 is not enough - so we can use the [[Hashing|Hash Functions]] below: ![[Pasted image 20250302141529.png|300]] - if we use the above [[Hashing|Hash Functions]], the hash values are too long (more than 7 characters) - solution is to just take the 1st 7 characters of a hash value - but this can cause collisions - so we keep appending a new predefined string to the long URL until there is no collision with existing hash values in the database - CON: this might make requests expensive, as it is costly to query the database each time to check if a short URL / hash value already exists in the DB ![[Pasted image 20250302141858.png|400]] ## Shorten Via Base 62 Conversion --- - convert base 10 (decimal) number to base 62 - we can get decimal number from [[Unique ID Generator]] for database ID ![[Pasted image 20250302143433.png|400]] - The mappings are: 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z, where ‘a’ stands for 10, ‘Z’ stands for 61, etc. - 1,115,710 = 2TX in base 62 ![[Pasted image 20250302143621.png|400]] ## Summary --- ![[Pasted image 20250302140005.png]] ## If Extra Time --- - [[Rate Limiter]] - maybe hacker wants to send an insane army of URL shortening requests - Analytics - maybe we want to know how many people click on a shortened link etc - would require us to do 302 instead of 301 https://bytebytego.com/ https://www.youtube.com/watch?v=Cg3XIqs_-4c&ab_channel=TechPrep https://medium.com/@sandeep4.verma/system-design-scalable-url-shortener-service-like-tinyurl-106f30f23a82