Design TinyUrl
Materials — open to everyone, no sign-in
Topic: Design TinyUrl
Interviewer: video from 5-5-2022
Interviewee: system
Level: L4 (Experienced Individual Contributor)
Additional Resources:
System Design Interview
Join Us on Wechat
Interview Notes
Start: 7:00
[16:45]
How to store: SQL, noSQL
Design a key-value store, using short URL as the key
Interviewer: how to lookup from long URL to short URL?
Interviewee: hint?
Interviewer: create 2 way table long->short, short->long
Interviewee: one write server, one read server
Interviewer: cannot scale
Interviewee: 1 write server, multiple read servers
Interviewer:
[22:30]
Interviewer: how do you convert long URL into short URL?
Interviewee: use a hash algorithm
Interviewee: updated estimates of size
Interviewer: let’s assume for long URLs we have 64 bits
Interviewee: [made more estimates]
Interviewee: base64 hashing
Interviewer: there may still be collision
Interviewee: add more bytes in the end
Interviewee: add timestamp to the input value
Interviewer: adding timestamp may make the result too long
Interviewee:
hash_function(time_stamp + long_url)
Then take the first 10 bits
Interviewee: we check collision after generating a new URL
[30:00]
Interviewee: we use a sliding window till the short URL value is unique
[ Took too much time for the hash algorithm ]
[ Collision check can lead to race conditions. May want to consider alternative way to generate unique short URLs ]
Interviewee:
Persist data in database
Add cache
Q: is this a browser cache? Is it on client side?
A: yes
[cache probably should be behind LB. Not a browser cache ]
[36:00]
Define API
[ideally define the API sooner]
100M / 10^5 = 1000 QPS
10*8 / 10 ^ 5 = 1000 QPS
SQL 100 QPS
Relation: no need for relational queries
Transaction guarantee? collision detection/prevention using database
No: nosql
Yes: sql
NoSQL:
Table, column
shortURL, longURL
Key
Foreign
Audience:
NoSQL: need 2 tables. Usually no global index. More extensible
MongoDB: can have a secondary index. We can just use 1 table. No need for 2 tables transaction.
===
Option 1
[ User, shortURL ], longURL
Option 2
User, [shortURL], longURL
===
[39:00]
Interviewer: how do you handle large volume of reads? Read = 100x write traffic
Interviewee: have more read replicas. Eventual consistency
[43:34]
Interviewer: how to make the system more robust?
Interviewee: extra layer of cache to check uniqueness of short URL
Interviewer: Cache has limited storage. It cannot save all data
Interviewer: how do you avoid hackers from using lots of reads of fake short URLs?
Interviewee: add rate limiter
[bloom filter?] -> filter out some invalid URLs quickly
Interviewee: sliding window or token buckets for rate limiter
[49:31]
Interviewer: anything to add?
Interviewee: no
===
100M QPS
High qps: cache - is it right solution
===
100M * 5 * 365
2 billion URLs * 100 bytes per URL
200 billion bytes
200 gigabytes
50 machines
Speed:
30,000 QPS
Redis 100,000 - 200,000 QPS
1000 QPS
===
How many read-only server
===
Form:
===
Audience:
NoSQL database require more maintenance
Managed DB (dynamoDB)
Cassandra (self managed)
There will be a team to maintain the database. It’s probably not a big concern
There will be higher cost to store a lot of data in nosql database. Need to consider cost: compute, storage, networking
Small amount of data: cheaper to use cloud
Large amount of data: self-managed is cheaper