Distributed Cache · commitway

Topic: Distributed Cache

Presenter: Pauline

QRCodes

工作人员 wechatID: dailylearnit 分布式系统讨论群

System Design Summary

[80]

Data are saved in DB

Accelerate lookup

Common scenarios:

Authentication service

Friendship

User tables

Which operation is more common?

Registration

Query

Update user info

Query is most demanding

QPS

MySQL/PostgresSQL SQL: 1k QPS

MongoDB/Cassandra NoSQL: handle 10k QPS

Redis/Memcached etc memory NoSQL: handle 100k ~ 1M QPS

Question:

Which DB is preferred:

Registration, login, modification 300 QPS: mysql

Query user info

Custom system characteristics

Lots of read, fewer write

In such situation we can optimize with cache

What is cache:

Like hashmap, but more than it, such as expiration

Key-value

Memcached does not support persistence

Redis supports persistence

Cache

Not always in memory. E.g. file and CPU have cache

Not always on the server. THere can be client cache

Memcached

Application -> get data from cache

Memcached optimizes DB Query

A-C may generate dirty data

Dirty means inconsistent data between cache and database

Think the second operation crashes

A: db.set(user); cache.set(key, user);

B: db.set(user); cache.delete(key);

C: cache.set(key, user); db.set(user);

How about this case?

D. cache.delete(key); db.set(user)

In a multi-thread/multi-process environment, case D will still create dirty data

How to avoid inconsistency between DB and cache?

Lock? Not practical. May decrease the efficiency

Best solution

First set then delete. db.set(key,customer); cache.delete(key)

Hit rate is usually high. (98%)

The chance of failure at delete is low.

Set TTL, e.g. 7 days

Audience:

Why is cache.delete() -> db.set() worse than db.set() -> cache.delete() ?

If there are lots of writes?

If everytime the database

Caching strategy

Cache aside vs cache through

Cache aside: server communicate with DB and cache separately. DB and cache is not connected directly

Cache through

Server only talks to cache. Cache talks to DB.

Redis uses this way.

Redis supports key-value. Cannot handle complex data structure

Industry prefers cache aside.

===

Authentication service

Session scenarios

session table to store the personal information of the user

Session key, user_id, expired_at

Browse stores the session_key in cookie

Some servers allows single login

Some servers support multi-login. One user can have multiple session_ids

===

Friendship service

One way friendship

Friendship table: from_user_id, to_user_id

Two way friendship

One record for each friendship pair:

Small_user_id, Big_user_id

Search xxx from friendship where smaller_user_id = xxx or big_user_id = xxx

SQL can support multiple index

Two records for each friendship pair:

From_user_id, To_user_id

2 is usually preferred due to speed

===

NoSQL cassandra

Scalable, fault-tolerant and consistent

Column-oriented database

Based on dynamo and google big table

Replica based on dynamo design

Cassandra key = row_key + column_key

Cell-based structure

Row key

It is hash key, partition key, sharding key

Column key

Column key can be ordered

Insert(row_key, column_key, value)

Query(row_key, column_start, column_end)

====

RDBMS vs Cassandra

RDBMS: deals with structured data

Cassandra: deals with unstructured data

=====

Friendship storage in cassandra

How to store tweet in cassandra

SQL vs NoSQL

Which DB is suitable for friendship?

Both are fine

SQL: structured, build index easily

NoSQL: distributed, auto-scale, replica

DB for user table

Most cases choose SQL

Trust worthy, multi-index

DB for friendship table

Most cases choose NoSQL

How to model extended friendship?

One way friendship

Redis: key = user_id, value=set of friend_user_id

Cassandra: row_key = user_id, column_key = friends_id

Two way friendship

Find friend A, key-value

Find friend B, key-value

Find intersection, store in cache/memory

Extended friendship

BFS?

Find friends not via three persons

Users > 100M

Average number of friends: 1000

Expect DB query number [< 20]