Distributed Key Value Store
Topic: Distributed Key Value Store
Interviewer: 好雨
Interviewee: 仪丽
Level: L4 (Experienced Individual Contributor)
Mock System Design Interview Summary
Interview Overview
Date: 2/13/2022
Target level: 4
Duration: 45 minutes
Topic covered: Key Value Store
Drawing tool used: miro.com
Requirements
Functional requirements
Create, delete, get, no partial update, batch update
Key: string: value: string
Non requirements:
No need for transaction (ACID)
No sorted result needed
Non-functional requirements
Update triggers
10TB / year
Value = 1kb
QPS: 100k QPS
Latency 50ms
Durability
Eventual consistency -> optional strong consistency
Availability? Fault tolerant, no single point of failure, 99.99%
Scalability
Write intensive rather than read intensive
Non functional requirements
System Design
External APIs
System design
Config server:
Metadata, log
JSON serialization format
Request Manager -> config server: who owns the data
Request Manager -> communicate with actual store
JSON vs other?
JSON vs redis(?)
JSON is easier to use for frontend
What protocol to use?
A: HTTP: widely accepted
TCP is for internal use
Request manager -> config server call, is it for every request?
There can be a cache before the config server
Client vs server side cache?
Server side
Every request needs to hit config cache
Keep-alive:
The config server knows primary server is up or not, automatically switch to secondary if primary is down
Single point of failure
Data migration can use instruction from config server
How to handle config server failures?
Can make a backup for config server
How do you make sure there is not 2 active config server
Use sentinel to understand which one is the active config server
How do you support high concurrent writes?
Storage engine levelDB
Some data engine supports write intensive
Why they can support high write volume?
LSM tree as data structure. Supports write-heavy workloads
Why is it fast?
LSM tree: sorted string tables, divide data as segment
Can store key-value in each segment
(audience:
immutable memtable
Then sstable
Then compact
)
How to handle growing data?
Consistent hash
When client request data
Mapping table: node -> information of node
If one node fails, it will put the key value store to other nodes
Config server:
Contains mapping table
2 columns:
Data server info (ip address)
ID of node (e.g. A)
Where to store backup servers?
Also store in config server
Config server (mapping table, user ID, version number)
How is config cache updated?
If new node is added or removed, the config service will regenerate the mapping table
Config service will sync the data to the new cache
Eventual consistency -> strong consistency How to support it?
3 nodes for key-value store. 2 of them changed then we can send success to the client
Primary has the latest data, and secondary has some stale data
RAFT can help
Interviewer: can just route request to primary
When requests update the same key, how do we handle concurrency?
Can store a version number on key-value store
Version 10
Party A: Version 10->version 11
Party B: Version 10->version 11 (rejected)
Optimistic locking
Q: How do we use pessimistic lock?
A: I prefer version based (optimistic lock)
LevelDB:
Write data to memory, flush to disk periodically
Server goes down after writing to memory? How to ensure durability?
Durability is not an issue
Interviewer and Audience Feedback
Interviewer L4:
P2p - more complex
Choose master-slave is the right choice
The solution is workable
Things to improve:
Rushed the technical decision
But we can discuss the tradeoff ahead of the decision
Sometimes interviewee can dive deeper
LSM tree - is the right direction, can go deeper. No very clear during deeper dive
High level is good
Write heavy system?
Storage engine decides high write vs high read
Key value store: B+ tree (favoring read), LSM tree (favoring write)
I asked but deep dive
Audience:
Interviewee should drive requirement gathering
Candidate: encryption. We can say “we don’t need encryption”
If interviewee has not dived deep enough, then the interviewer can drive harder
Audience:
Distributed key value store, what is the emphasis
For example, single node CRUD, read,write intensive, then extend to distributed
The interviewee did not cover the single node
Depends the requirement of the interviewer
Audience
QPS: real data are added every year
1k, write QPS 300 or so. Not write heavy
17 TB
Write and update data
10 TB: how this is defined
100k QPS
On average 1k Byte
100k + key value size
100k write or read QPS?
Interviewer
Read write ratio, QPS will be more clear
Audience
100k QPS 10TB
Interviewee can calculate
Audience
Who should provide the numbers?
Audience
Write heavy, write intensive, distributed system
What database to use?
LevelDB LSM tree
Can system level - write heavy
Audience
Can add a queue, then put data from queue to database
The tradeoff we will get dirty data
Audience
Throughput of levelDB?
Read: B+ tree
LevelDB is a library. LSM tree
RothDB also uses levelDB
innoDB
Audience
Key value store
Default very high scale
First discuss a single node
Surprised that we directly levelDB? Should we design the deeper level
Interviewer
How to cover scalability, durability etc?
Audience
The interviewer asked about single machine design first, then move to scale the system
Key value store, why is it better than traditional database?
Because in a single machine writing is much faster
Audience
Level DB, how to design
Normal interview may not dive so deep
It seems online videos have similar flow (single -> multiple devices)
Audience
Existing solution name -> then internal dive
Audience
First list the requirement
Audience
Overlaps quite a bit with existing solutions
Audience
The key point is if the interviewer knows existing solutions
Audience
Bigtable - much more complex, tear (taobao),
We can choose a simple one, and easy to explain.
Audience
L4: lower layer is better, or application layer is better?
Such as appointment application
Which one is more suitable for L4
Audience
Which points do we want to deep dive into?
Want to ask about non-functional?
Key value p2p vs master slave. How to prevent master single point of failure
Request - client can directly talk to data server. I was hoping config server is moved to the client.
Need a sentinel (哨兵) for monitoring and switch between master and slave
(a) Request does not need to go through master
(b) how to do master failover
I think the config server is the master
Config server is a single point of failure?
Client can cache the configuration, reduces load on config server
Switch over
Config server can be a cluster
Primary DB - etcd . lots of components and assumptions
ADG is sharding. B, E, H is backup
Primary is already sharding
Write volume is big: how to scale: sharding
Metrics, master down, and slave is up, need to alert and monitor
Configuration cache can go to the client side
When we need to add machine, we can predict the growth - can put it to the client side
Config/data server heartbeat
mapping/hashing
Outdated mapping. Go back to config service to get the up-to-date mapping
Audience:
Request manager can be merged with config server?
Request manager is fine. You may not be able to add logic to load balancer
====