Design TinyUrl · commitway

Topic: Design TinyUrl

Interviewer: video from 5-5-2022

Interviewee: system

Level: L4 (Experienced Individual Contributor)

Additional Resources:

Live Stream

System Design Interview

Join Us on Wechat

Interview Notes

Start: 7:00

[16:45]

How to store: SQL, noSQL

Design a key-value store, using short URL as the key

Interviewer: how to lookup from long URL to short URL?

Interviewee: hint?

Interviewer: create 2 way table long->short, short->long

Interviewee: one write server, one read server

Interviewer: cannot scale

Interviewee: 1 write server, multiple read servers

Interviewer:

[22:30]

Interviewer: how do you convert long URL into short URL?

Interviewee: use a hash algorithm

Interviewee: updated estimates of size

Interviewer: let’s assume for long URLs we have 64 bits

Interviewee: [made more estimates]

Interviewee: base64 hashing

Interviewer: there may still be collision

Interviewee: add more bytes in the end

Interviewee: add timestamp to the input value

Interviewer: adding timestamp may make the result too long

Interviewee:

hash_function(time_stamp + long_url)

Then take the first 10 bits

Interviewee: we check collision after generating a new URL

[30:00]

Interviewee: we use a sliding window till the short URL value is unique

[ Took too much time for the hash algorithm ]

[ Collision check can lead to race conditions. May want to consider alternative way to generate unique short URLs ]

Interviewee:

Persist data in database

Add cache

Q: is this a browser cache? Is it on client side?

A: yes

[cache probably should be behind LB. Not a browser cache ]

[36:00]

Define API

[ideally define the API sooner]

100M / 10^5 = 1000 QPS

10*8 / 10 ^ 5 = 1000 QPS

SQL 100 QPS

Relation: no need for relational queries

Transaction guarantee? collision detection/prevention using database

No: nosql

Yes: sql

NoSQL:

Table, column

shortURL, longURL

Key

Foreign

Audience:

NoSQL: need 2 tables. Usually no global index. More extensible

MongoDB: can have a secondary index. We can just use 1 table. No need for 2 tables transaction.

===

Option 1

[ User, shortURL ], longURL

Option 2

User, [shortURL], longURL

===

[39:00]

Interviewer: how do you handle large volume of reads? Read = 100x write traffic

Interviewee: have more read replicas. Eventual consistency

[43:34]

Interviewer: how to make the system more robust?

Interviewee: extra layer of cache to check uniqueness of short URL

Interviewer: Cache has limited storage. It cannot save all data

Interviewer: how do you avoid hackers from using lots of reads of fake short URLs?

Interviewee: add rate limiter

[bloom filter?] -> filter out some invalid URLs quickly

Interviewee: sliding window or token buckets for rate limiter

[49:31]

Interviewer: anything to add?

Interviewee: no

===

100M QPS

High qps: cache - is it right solution

===

100M * 5 * 365

2 billion URLs * 100 bytes per URL

200 billion bytes

200 gigabytes

50 machines

Speed:

30,000 QPS

Redis 100,000 - 200,000 QPS

1000 QPS

===

How many read-only server

===

Form:

https://docs.google.com/forms/d/e/1FAIpQLSdU_Fa_leatkBJuKAV25mw4PXHx0BUxDC_zC67qMbzXPUFmKQ/viewform?usp=sf_link

===

Audience:

NoSQL database require more maintenance

Managed DB (dynamoDB)

Cassandra (self managed)

There will be a team to maintain the database. It’s probably not a big concern

There will be higher cost to store a lot of data in nosql database. Need to consider cost: compute, storage, networking

Small amount of data: cheaper to use cloud

Large amount of data: self-managed is cheaper

Materials — open to everyone, no sign-in