Cloud Storage Service
Topic: Cloud Storage Service
Interviewer: Ken
Interviewee: 乔磊
Level: L5 (Senior)
Mock System Design Interview Summary
[10-15] requirement gathering
[longer portion] design
[10-15] drill down
Basic requirements
[43:00]
Functional [gathered]
Support directories
Upload files
Download files
File sync - multiple clients
Out of scope
Permission
File sharing
Notification
[41:47]
Scale
File < 1 GB
50M signed-up users, 10M DAU [gathered]
10 GB free space
Upload 2 files per day. Average size 500 KB [not gathered]
1:1 read to write ratio
Average 1000 files per user
My Estimates:
QPS [not computed]
Max storage: 50M users * 10 GB = 500 PB
QPS for upload: 10M DAU * 2 uploads / 10^5 seconds per day (86,400 rounded up) = 200 QPS
Peak QPS = 200 QPS * 5 = 1000 QPS
Metadata DB storage
Files per user at full quota: 10 GB / 500 KB = 20,000 files
1000 files * 10M users = 10 billion file entries
Per entry: file path, S3 path, user, date
10 billion files * 200 bytes of metadata per file = 2 TB
Bandwidth
200 uploads per second * 0.5 MB per upload = 100 MB per second
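The estimates above can be re-checked with a few lines of arithmetic. This is a minimal sketch using only the figures from the notes (decimal units, 10^5 seconds per day, 5x peak factor); the variable names are illustrative.

```python
# Back-of-envelope figures taken from the notes above (decimal units).
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

signed_up_users = 50_000_000
dau = 10_000_000
free_quota_bytes = 10 * GB
uploads_per_user_per_day = 2
avg_file_bytes = 500 * KB
files_per_user = 1_000
metadata_bytes_per_file = 200
seconds_per_day = 10**5   # 86,400 rounded up, as in the notes
peak_factor = 5           # peak-to-average ratio used in the notes

max_storage_pb = signed_up_users * free_quota_bytes / PB
upload_qps = dau * uploads_per_user_per_day / seconds_per_day
peak_upload_qps = upload_qps * peak_factor
file_entries = files_per_user * dau
metadata_tb = file_entries * metadata_bytes_per_file / TB
upload_mb_per_s = upload_qps * avg_file_bytes / MB

print(f"max storage:      {max_storage_pb:.0f} PB")
print(f"upload QPS:       {upload_qps:.0f} (peak {peak_upload_qps:.0f})")
print(f"metadata entries: {file_entries:,} ({metadata_tb:.0f} TB)")
print(f"upload bandwidth: {upload_mb_per_s:.0f} MB/s")
```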
Non functional
Durable [99.99999%]: .01% disk failure rate, 3 data replicas to reach 99.99999%
Availability [99.999%]: about 5 minutes of downtime per year
Sync quickly
Minimize bandwidth
Scalable
Highly available
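As a quick check of the availability target above, the allowed downtime follows directly from the percentage. A minimal sketch; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} -> {downtime_minutes_per_year(target):.1f} min/year")
```

Five nines works out to roughly 5.3 minutes per year, which matches the "5 minutes" figure in the notes.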
[37:30]
Points to cover:
API:
list, upload, download, uploadChunk, downloadChunk
Client notification when version updated on server. Tradeoffs of poll vs push
Database Schema
Architecture choices:
If using S3/Google Cloud Storage: does the traffic go through the application server or not?
Push vs poll for propagating changes
storage choices:
cache
database: SQL (MySQL) vs NoSQL (Cassandra, eventual consistency)
File storage: Amazon S3, Google Cloud Storage, HDFS
Bar raiser:
Familiar with Amazon S3, Google Cloud Storage or HDFS workflow
Tiered storage to save storage cost
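One way to picture tiered storage is a rule that picks a storage class from the time since a file was last accessed. This is a sketch only: the tier names and the 30/180-day thresholds are assumptions, not figures from the interview, and real values would depend on provider pricing and access patterns.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=180)

def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    age = now - last_accessed
    if age >= COLD_AFTER:
        return "cold"
    if age >= WARM_AFTER:
        return "warm"
    return "hot"
```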
Soft Skills
Requirement gathering
Discuss tradeoffs
Clear presentation
Driving interview
Hard skills
Design quality
Compare existing solutions
Fit into larger context of project and product lifecycle
[37:00]
Back of envelope estimation
10M users * 1000 MB of data = 10,000 TB
Per user: 1 GB per year, about 100 MB per month
10 PB of data per year
[self implement]
.01 failure rate for disk (SSD)
Compute nodes, disk farm, leverage cloud service
S3 service - reduce complexity
S3 as my main storage system
To investigate different providers
[33:50]
[32:36 - good timing, starting design]
Starting high level design
[IAM - auth]
Add API gateway, encryption, LB
Normal files
Small, large, medium files
Try to handle large and small files
[104]
[29:03]
May want to define APIs
Adding queue service
[I don’t understand the queue service]
Monitors the queue system
[what protocol does the message queue service use?]
APIs
[ I feel fine with defining API later ]
Create, delete,
/
/folder/object.txt (GET) - download
/folder/object.txt (PUT) - upload
Multipart upload API (PUT) - parts are addressed by part ID, not by object name
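The multipart upload idea above can be sketched as splitting a file into numbered parts, sending them in any order, and assembling them on completion. This is an in-memory stand-in, not the S3 API; the class and function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterator, Tuple

def split_into_parts(data: bytes, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_id, chunk) pairs; part IDs start at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]

class MultipartUpload:
    """In-memory stand-in for a server-side multipart upload session."""

    def __init__(self) -> None:
        self._parts: Dict[int, bytes] = {}

    def put_part(self, part_id: int, chunk: bytes) -> str:
        # Re-sending a part overwrites it, so a retry after a failure is safe.
        self._parts[part_id] = chunk
        return hashlib.sha256(chunk).hexdigest()

    def complete(self, expected_parts: int) -> bytes:
        missing = [i for i in range(1, expected_parts + 1) if i not in self._parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        return b"".join(self._parts[i] for i in range(1, expected_parts + 1))

payload = b"abcdefghij"
parts = list(split_into_parts(payload, part_size=4))
upload = MultipartUpload()
for part_id, chunk in reversed(parts):  # arrival order does not matter
    upload.put_part(part_id, chunk)
assembled = upload.complete(expected_parts=len(parts))
```

Because each part carries its own ID, a failed part can be retried alone instead of restarting a file of up to 1 GB.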
[Question: version control?]
No version control needed
[ may think of chunking ]
[ may want a way ]
[long-running channel: the client can always keep a connection open to the backend service]
Write or change is limited
Most of the time there is no change
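The observation that most of the time nothing changes is the core of the poll vs push tradeoff. A rough sketch of the polling load, assuming a 30-second poll interval (not stated in the notes) and, as an upper bound, that every daily active client is online at once:

```python
dau = 10_000_000       # from the scale estimates in the notes
upload_qps = 200       # average upload rate from the notes
poll_interval_s = 30   # assumption: not stated in the notes

# Upper bound: every daily active client is online and polling.
poll_qps = dau / poll_interval_s
polls_per_change = poll_qps / upload_qps

print(f"polling load:     {poll_qps:,.0f} requests/s")
print(f"polls per change: {polls_per_change:,.0f}")
```

Under these assumptions polling generates over a thousand requests per actual change, which is the argument for a long-running channel or queue-based push.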
On first install of the app, do a full sync
[need a list directory API?]
/fullsync
[could use a system hook on file access and download the file at that time]
[20:24]
[109]
Message content: (user id, file name, delete / new file)
[should optimize with checksum]
API service talks to the message queue
[metadata database missing?]
Client app monitors folder
Detects change
Client app keeps the file list in memory
Client app sends files to API service
API service sends file to storage service
Optimize -> send multiple
API service generates a message to the message queue
Message content: user ID, file name, delete/new file
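The client-side steps above (monitor the folder, keep a file list in memory, detect changes) can be sketched with a checksum snapshot, which also covers the interviewer's checksum suggestion. A minimal sketch with hypothetical function names; it rescans the folder rather than using OS file-watch hooks, and reads whole files for brevity.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

def snapshot(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    result: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file; large files would be hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result

def detect_changes(previous: Dict[str, str],
                   current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (action, path) pairs; action is 'new file' or 'delete'."""
    changes: List[Tuple[str, str]] = []
    for name, digest in current.items():
        if previous.get(name) != digest:  # new file or changed content
            changes.append(("new file", name))
    for name in previous:
        if name not in current:
            changes.append(("delete", name))
    return changes
```

A file whose checksum is unchanged produces no entry, so it is never re-uploaded.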
[What if API service crashes?]
Message content: user id, file name, delete /new file, sender
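The message content listed above can be written down as a small record, with the sender field letting a client skip its own changes. A sketch under assumptions: the field names and the JSON encoding are illustrative, and the notes do not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class ChangeMessage:
    user_id: str
    file_name: str
    action: str  # "new file" or "delete", as in the notes
    sender: str  # hypothetical ID of the client that made the change

    def to_json(self) -> str:
        return json.dumps(asdict(self))

def messages_for_client(messages: Iterable[ChangeMessage],
                        user_id: str, client_id: str) -> List[ChangeMessage]:
    """Keep this user's messages that originated from a different client."""
    return [m for m in messages
            if m.user_id == user_id and m.sender != client_id]
```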
Q: message queue: multiple topics?
A: add API to list files?
Add API to get delta
Can use different attribute to do the filter
Can filter the messages to
[prefer message queue]
Messages to filter by userID
Use API to get delta:
Get all files. Compare with local files
Messages:
Cleaner way to compute the delta
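The "get all files, compare with local files" option can be sketched as a diff of two path-to-checksum maps. This assumes the server listing is the source of truth, which the notes do not state; in a real client, local-only files might be pending uploads rather than deletions. Names are hypothetical.

```python
from typing import Dict, List, NamedTuple

class Delta(NamedTuple):
    to_download: List[str]
    to_delete_locally: List[str]

def compute_delta(server: Dict[str, str], local: Dict[str, str]) -> Delta:
    """Both arguments map file path -> checksum; the server listing wins."""
    to_download = sorted(p for p, c in server.items() if local.get(p) != c)
    to_delete = sorted(p for p in local if p not in server)
    return Delta(to_download, to_delete)
```

With about 1000 files per user this comparison is cheap on the client, but the listing call costs the server far more than a per-change message.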
[8:53]
RabbitMQ
Can handle a large amount of data
Can handle the scenario
S3 as actual storage
May prefer cloud queue system
Filter by ID
[7:04]
Q: walk through 2nd client?
A: listen to changes. delete/new file
Issue request to API service to get the storage
[chunking missing]
[4:15]
Reliable
[106]
API service: stateless service - data is saved in memory - cluster of servers with high compute and memory capacity
We can scale up API service
[2:20]
High CPU/memory
Need a database - stores all metadata, including which file belongs to whom
Virtual folder structure
[should draw the database]
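A possible shape for the metadata table, using the fields mentioned in the estimates (file path, S3 path, user, date). This is a sketch, not the schema discussed in the interview: the table and column names are assumptions, and SQLite stands in for the production database. The virtual folder structure is modeled as a path prefix.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    file_path  TEXT NOT NULL,   -- virtual path, e.g. /folder/object.txt
    s3_path    TEXT NOT NULL,   -- location of the blob in object storage
    updated_at TEXT NOT NULL,   -- ISO 8601 timestamp
    UNIQUE (user_id, file_path)
);
"""

def list_directory(conn: sqlite3.Connection, user_id: str, prefix: str) -> List[str]:
    """List a user's file paths under a virtual folder via a prefix match."""
    rows = conn.execute(
        "SELECT file_path FROM files "
        "WHERE user_id = ? AND substr(file_path, 1, ?) = ? "
        "ORDER BY file_path",
        (user_id, len(prefix), prefix),
    )
    return [row[0] for row in rows]
```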
Read/write split for database
One write database master and several read replicas
[did not complete]
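The read/write split above can be sketched as a router that sends writes to the master and spreads reads across replicas. A minimal sketch; the class name and the round-robin policy are assumptions, and connection names are plain strings for illustration.

```python
import itertools
from typing import List

class ReadWriteRouter:
    """Route SELECTs to read replicas (round-robin), everything else to the master."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self._primary  # writes, or reads when no replica exists

router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM files"))       # replica-1
print(router.route("INSERT INTO files VALUES"))  # primary
```

Replicas lag behind the master, so a read issued right after a write may need to go to the master to see the new row.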