Distributed Resilient Storage in the Real World


Topic: Distributed Resilient Storage in the Real World

Presenter: Kai



Topic: Distributed resilient data storage

System design: IP blocker

Real world design: highly distributed authentication service

Container isolation: gVisor. Distributed systems built on containers are slow

===

System design: IP Blocker

Senior/staff level

Country X implements a law that forbids certain IPs from accessing Google’s services

Country X provide

3rd party handles 20k QPS

Service, 2M connections per second

Warm start with precomputed result

===

Missing point:

3rd party: only throughput gets asked about, not latency or SLA. The 3rd party is often unavailable.

IPv4 has about 4B addresses, so we can pre-build a cache of answers. At 20k QPS it takes less than 3 days to fill the cache

For IPv6: can use the previous solution

What if we have multiple data centers and still one external service?
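The "less than 3 days" figure for the IPv4 cache can be checked with a few lines of arithmetic. The sketch below also shows one possible cache layout, a bitmap with one bit per address; the class name BlockBitmap and the bitmap layout are assumptions for illustration, not something stated in the talk.

```python
IPV4_SPACE = 2 ** 32        # about 4.29 billion addresses
THIRD_PARTY_QPS = 20_000    # 3rd party throughput taken from the notes


def fill_time_days(addresses: int = IPV4_SPACE, qps: int = THIRD_PARTY_QPS) -> float:
    """Days needed to query every address once at a steady rate."""
    return addresses / qps / 86_400


class BlockBitmap:
    """Hypothetical cache layout: one bit per address, 1 means blocked."""

    def __init__(self, address_bits: int = 32):
        self.size = 1 << address_bits
        # 2**32 bits is 512 MiB; tests can pass a smaller address_bits.
        self.bits = bytearray(self.size // 8)

    def set_blocked(self, ip: int, blocked: bool) -> None:
        byte, mask = ip >> 3, 1 << (ip & 7)
        if blocked:
            self.bits[byte] |= mask
        else:
            self.bits[byte] &= ~mask & 0xFF

    def is_blocked(self, ip: int) -> bool:
        return bool(self.bits[ip >> 3] & (1 << (ip & 7)))


if __name__ == "__main__":
    print(f"fill time: {fill_time_days():.2f} days")
```

With the full 32-bit space the bitmap fits in the memory of a single machine, which is what makes a warm start from a precomputed result practical.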

===

Control plane - configuration

Data plane

Key: cloud resource manager / cloud resource frontend

Similar to Airflow or Uber Cadence

User -> cloud resource manager: User requests a service to be allocated

Services in the private network don’t require authentication

Latency for replication: 5 minutes

Key problems in design:

Data plane authentication: 70k-80k requests per second. How do we reduce latency?

How do we detect data tampering?

How do we improve service resilience?

Key problems yet to be solved:

Big customers starving other customers

Geographic boundaries

Data plane authentication:

High throughput.

Low latency: needs to be < 5 ms

Solution: colocate auth service with (real) data service

Actual issue: an authentication request may need multiple backends

What happens if a permission is revoked?

User -> cloud service manager -> storage -> data change field -> data sync (may merge with data from other services) -> push to cache

Q: when auth data changes, does the change propagate to all cache instances?

A: first filter by project, then push the snapshot
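The push path in the answer above can be sketched roughly as follows. This is a minimal in-memory model, assuming each cache instance declares which projects it serves; the names CacheInstance, SyncLayer, and on_change are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CacheInstance:
    """Cache colocated with a data service; auth checks are local lookups."""
    projects: set
    grants: dict = field(default_factory=dict)  # project -> {(user, permission)}

    def apply_snapshot(self, project: str, snapshot: set) -> None:
        # Replace the project's grants wholesale so revocations vanish too.
        self.grants[project] = set(snapshot)

    def is_allowed(self, project: str, user: str, permission: str) -> bool:
        return (user, permission) in self.grants.get(project, set())


class SyncLayer:
    """Hypothetical data sync step: filter by project, then push a snapshot."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # project -> cache instances
        self.state = defaultdict(set)         # project -> {(user, permission)}

    def register(self, cache: CacheInstance) -> None:
        for project in cache.projects:
            self.subscribers[project].append(cache)
            cache.apply_snapshot(project, self.state[project])

    def on_change(self, project: str, user: str, permission: str, granted: bool) -> None:
        grants = self.state[project]
        if granted:
            grants.add((user, permission))
        else:
            grants.discard((user, permission))
        # Only caches that serve this project receive the new snapshot.
        for cache in self.subscribers[project]:
            cache.apply_snapshot(project, grants)
```

Under this model a revocation is just another change event: the next snapshot for the project no longer contains the grant, so the local check starts failing once the push arrives.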

Data signing

Hacker may attack the storage

Authentication data may cause a lot of damage

Sign with private key

Verify with public key

Ensure all updates happen on secure machines (this protects upstream)

Embed the signature in the record at write time

Public API exposes key metadata so that clients can verify signatures

Does not protect against upstream modifications

Protects against subsequent modifications
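The sign-on-write, verify-on-read flow above can be illustrated with a toy example. The sketch uses textbook RSA over two small Mersenne primes purely so that it runs with the standard library; the key size, the helper names, and the dict-based store are all assumptions, and a real system would use a vetted signature scheme from a crypto library rather than this.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small and
# unpadded for real use; it only demonstrates the sign/verify flow.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def _digest(record: bytes) -> int:
    return int.from_bytes(hashlib.sha256(record).digest(), "big") % N


def sign(record: bytes) -> int:
    """Runs only on the secure writer, which holds the private exponent."""
    return pow(_digest(record), D, N)


def verify(record: bytes, signature: int) -> bool:
    """Any client can run this with the public values N and E."""
    return pow(signature, E, N) == _digest(record)


def write_record(store: dict, key: str, record: bytes) -> None:
    # The signature is embedded next to the record when the write happens.
    store[key] = (record, sign(record))


def read_record(store: dict, key: str) -> bytes:
    record, signature = store[key]
    if not verify(record, signature):
        raise ValueError(f"record {key!r} failed signature verification")
    return record
```

If an attacker edits the stored bytes after the write, the read fails verification. A bad value that was signed by the writer in the first place would still verify, which is the upstream limitation noted above.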

Distributed data store: 5 consistency levels

Above eventual consistency, below session consistency

Distributed resilient storage: multi-tenant

GDPR

Shuffle sharding.

Big or malicious tenant may bring down service of other users

Cell-based architecture / shuffle sharding

Cell-based architecture - 5 users share the same instance.

Cache instance: 2 vCPUs, 0.5 GB, underutilized

If we increase the number of users per instance, reliability decreases
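For comparison with fixed cells, shuffle sharding gives each tenant its own pseudo-random subset of instances, so a big or malicious tenant fully overlaps with only a few others. The sketch below is a minimal illustration; the function name shuffle_shard, the node names, and the shard size are assumptions rather than details from the talk.

```python
import hashlib
import random
from math import comb


def shuffle_shard(tenant_id: str, nodes: list, shard_size: int) -> list:
    """Pick a deterministic pseudo-random subset of nodes for a tenant."""
    # Seed from a stable hash so every router computes the same shard.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    return sorted(random.Random(seed).sample(nodes, shard_size))


if __name__ == "__main__":
    nodes = [f"cache-{i}" for i in range(8)]
    for tenant in ("tenant-a", "tenant-b", "tenant-c"):
        print(tenant, shuffle_shard(tenant, nodes, 2))
    # Number of distinct 2-node shards that 8 nodes can form.
    print("possible shards:", comb(len(nodes), 2))
```

With 8 nodes and shards of 2 there are 28 possible shards, so two tenants land on exactly the same pair far less often than they would share one of 4 fixed two-node cells.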

Container-based isolation: hard to make pay-as-you-go

Notebook service. To ensure isolation between enterprises, we used containers

5-6 seconds to open Jupyter for most other vendors

2-3 minutes for us

Can we use a Kubernetes fleet to support better isolation and faster resource isolation?

gVisor for isolation

Kata Containers

Distributed resilient storage: multi-tenant

Aliyun: runs lambda on a Kubernetes fleet

If a client spans multiple machines, use runc

For small clients: use Kata. Kata is slower