Distributed Key-Value Store

System DesignDatabases & StorageDistributed Systems

Materials — open to everyone, no sign-in

Topic: Distributed Key-Value Store

Interviewer: ken

Level: L5 (Senior)

Additional Resources:


System Design Interview - Distributed Key Value Store

8/6/2024

YouTube for the event:

https://commitway.com

| | | 职场提升俱乐部 |

[45]

[42]

[ Functional requirements

Design the next generation of distributed KV store that will support services:

authentication, authorization, messaging, etc

Won’t be used to support payment

Range query

Technical implications: B+ tree more ideal than hashtables

Eventual consistency

Technical implication: no need for transactions

Data type: string ]

[ Scaling requirements

5 PB

K/V Scale linearly

Multi data center after 5 years

Latency: Users should have real time experience, <= .5 seconds

Availability: 99.9%

Robustness: user data should not get lost

Technical implications: how to ensure multiple replicas

read/write ratio = 99/1

Technical implication: B+ tree is better than LSM tree

CAP: Availability > Partition > Consistency

Non-SQL

DNS, TCP load balancer

[36]

Partition - using hash

[34:05]

[partition by hash or by range?]

3 replica

Partition:

Based on a hash method

We may support multi-tenancy

[30:45]

3 partition

Q: What’s supported by the zookeeper?

Socket number, TCP server

Primary selection - some machine is down

Q: How to support range query?

Send it to all partitions

Calculate - if the keys are sequential

[may want to clarify the key type]

[25:25]

Q: Alternative?

Calculate the min and max key

[21:31]

Q: how does one node process the range query

Individual nodes

Cache data

Hash

Bloomfilter if the key exists or not

Rocksdb: load the page

[want to understand B+ tree vs LSM tree]

Rocksdb: B+ tree, write ahead log, a big hashtable/b+ tree table

Read-write ratio: to select which

Read » write : rocksdb , ssd, sequential read and sequential write, not in place-write. Append write, compaction

Write » Read: may not rocksdb. Local store

[14:45]

Q: ZooKeeper?

Generation and sequence concept

Primary switch, new generation

Based on generation/sequence number

Which generation/sequence is the biggest, is most the primary

Subscribe to primary

ZooKeeper storage backup

Self status: secondary/primary, currentstatus, primarystatus

[7:30]

[6:41]

meta service

Metadata ping

All meta to proxy service

====

Interviewee:

High level architecture

Range query - fan out

Detail - primary selection - process - not very well