TinyURL (used)
Topic: TinyURL (used)
Interviewer: jiayan
Interviewee: 靖楓
Level: L4 (Experienced Individual Contributor)
Topic
Mock System Design Interview Summary
Interview Overview
Date: 5/5/2022
Target level: L4
Duration: 45 minutes
Topic covered: Tiny URL
Drawing tool used: excalidraw
Key problem vote: ?
Requirements
Functional requirements
Long URL=> short URL
Each long URL should only have one short URL
Length as short as possible
Characters: upper and lower case + numbers
Non functional requirements
Generate 100M per day
100 bytes - long URL
90 bytes - long URL
10 bytes - short URL
100 bytes * 100M * 365 * 5 = 8 TB
100M * 365 * 5 = ~200,000M
Much heavier read than write
System Design
System design
NoSQL:
Key-value pair
Easier to scale, horizontal scale
Easy lookup
Range queries
Hint: Clarifying requirement, if the user types a short URL, the system should return the long URL
Hint: can create 2 tables, long URL maps to short URL, short URL maps to long URL
One write server, multiple read servers
External APIs
Two APIs:
POST API for generating short URL from long URL
GET API for translating short URL into long URL
Hint Assume short URL is 64 bits
Q: Two different long URLs may have the same hash collision (map to the same hash code)
How do you make sure we generate unique URLs?
A: when the hash function returns a value, we check the value in the DB. If the value exists in DB, then attach a random value to the end, and try again
Hash_function(timestamp + long URL)
First 10 characters, sliding window, until the value is unique
Q: Separate service for write and get?
A: yes, we have a lot writes than reads
Add LB to send traffic to read server
Q: is web server same as web browser?
Add DB
Q: let’s worry about the traffic
We can have a cache value to sort the most recently queried short URLs
Q: Is the cache a browser side cache?
A: yes
APIs,
write: POST /api/v1/generate
Original URL
Read: GET /api/v1/get_url/param short_url
Read: GET
https://generateurl.com/{short_url}
How does redirect works?
Hint: should return the right status code 301 for redirect
Q: Read node is 100 times more than the write node
Will you put 100 more read node?
Q: How does database handle read service? Each database have a limitation for handling read requests. How do you handle read requests?
A: we have more than 1 database. Depends on how many database we have, one database will handle the writes (master) and multiple databases will handle the reads (slaves)
We will use eventual consistency to propagate the mapping to the read replicas
Q: what do you want to add to make your system more robust?
A: add cache
Q: if cache space runs out, how do we handle it?
A: we can evict data from cache
Q: what else can you add to make it more robust?
A: _
Q: how do you avoid hackers to generate random short URL to attack the system? Your system will need to go to database to check.
A: we can add rate-limiter on the client
Algorithms: can use sliding window, or use token bucket to drop requests based on limit
Interviewer and Audience Feedback
Interviewer:
Requirement, communication: soft skill is good
Hard skill: understand basic building blocks, but can improve on how to use the building blocks
Interviewee:
Hard skill should be improved
==
Soft skill
Interviewer:
Alex Xu book gives a design pattern for each
Functional
Non functional
API design
High level design
Database design
Deep dive
API design jumped out in the middle
==
Audience:
Requirement gathering needs more improvement
If the user does not provide URL
Parameter may have its own
Malicious/porn URL, how do we handle these situations
API design
Came in too late
Jumped into scalability too early
==
Hard skills
Interviewer:
Load balancer and server
I was not sure if the box represent web browser or web server
URL preprocessing - may not be too important
Key points:
Must fill 1-1 mapping
Database: how to ensure consistency?
Cassandra, if you create 2 tables, how do we know these two tables are in sync?
We may crash between writing 1 table and writing another table
Cassandra can create 1 table + a materialized view.
It’s very important requirement
Cache location is not right
Interviewee: When you write checking cache. This is not ideal
We may need ID generator.
Timestamp + long URL, may still encounter hash collision
Base 64: there are some special characters, ? and = are used. Therefore we should use base 62
Rate limiter: is a bonus.
==
Audience
System design interview is 45 minutes
When we design, should we start with core function, then expand?
Core feature is long->short URL, short->long URL
Then we add scalability
Then we can add DDOS
Interviewer:
It’s very reasonable
===
Audience
System design interview - it feels quite reactive
Can interviewee drive the discussion?
Interviewer:
It’s right for interviewee to drive
I had to drill down because I wanted to figure out the knowledge
Audience
Perhaps experienced people will drive more
Interviewer:
L4: doesn’t need to drive strongly as long as can answer the questions
===
Audience
How to distinguish L4, L5?
Interviewer:
Guarantee database 1-1 mapping:
L4: two tables can work
L5: have to deep dive into guarantee of 1-1 mapping
====
Audience
How to enforce 1-1
Interviewer:
Primary key
Materialized view
2PC can be used to guarantee consistency between the 2 tables
Audience
How to ensure consistency between 2 tables
2PC
First make sure all party are ready to commit
Then commit the changes
==
Step 1: long URL insert to long URL, short URL table (using UUID generator)
Step 2: insert short URL + long URL into the second table
DynamoDB: can use global secondary index
Cassandra: secondary index is not recommended; materialized view is recommended
==
Audience:
Database writing. Should we handle data center going down?
Interviewer:
It will be a bonus if you can handle data center going down.
===
Audience
Cache: Only for reading
Audience
Expiration. TTL
Interviewer:
5 year expiration
===
Redis:
multi - similar to a transaction
Lua - can also implement transaction
Not using queue
===
MySQL requires sharding
100M write
1000 QPS