Distributed Cache
Topic: Distributed Cache
Presenter: Pauline
Sign Up Form:
QRCodes
工作人员 wechatID: dailylearnit 分布式系统讨论群
System Design Summary
[80]
Data are saved in DB
Accelerate lookup
Common scenarios:
Authentication service
Friendship
User tables
Which operation is more common?
Registration
Login
Query
Update user info
Query is most demanding
QPS
MySQL/PostgresSQL SQL: 1k QPS
MongoDB/Cassandra NoSQL: handle 10k QPS
Redis/Memcached etc memory NoSQL: handle 100k ~ 1M QPS
Question:
Which DB is preferred:
Registration, login, modification 300 QPS: mysql
Query user info
Custom system characteristics
Lots of read, fewer write
In such situation we can optimize with cache
What is cache:
Like hashmap, but more than it, such as expiration
Key-value
Memcached does not support persistence
Redis supports persistence
Cache
Not always in memory. E.g. file and CPU have cache
Not always on the server. THere can be client cache
Memcached
Application -> get data from cache
Memcached optimizes DB Query
A-C may generate dirty data
Dirty means inconsistent data between cache and database
Think the second operation crashes
A: db.set(user); cache.set(key, user);
B: db.set(user); cache.delete(key);
C: cache.set(key, user); db.set(user);
How about this case?
D. cache.delete(key); db.set(user)
In a multi-thread/multi-process environment, case D will still create dirty data
How to avoid inconsistency between DB and cache?
Lock? Not practical. May decrease the efficiency
Best solution
First set then delete. db.set(key,customer); cache.delete(key)
Hit rate is usually high. (98%)
The chance of failure at delete is low.
Set TTL, e.g. 7 days
Audience:
Why is cache.delete() -> db.set() worse than db.set() -> cache.delete() ?
If there are lots of writes?
If everytime the database
Caching strategy
Cache aside vs cache through
Cache aside: server communicate with DB and cache separately. DB and cache is not connected directly
Cache through
Server only talks to cache. Cache talks to DB.
Redis uses this way.
Redis supports key-value. Cannot handle complex data structure
Industry prefers cache aside.
===
Authentication service
Login, session, cookie
Session scenarios
session table to store the personal information of the user
Session key, user_id, expired_at
Browse stores the session_key in cookie
Some servers allows single login
Some servers support multi-login. One user can have multiple session_ids
===
Friendship service
One way friendship
Friendship table: from_user_id, to_user_id
Two way friendship
One record for each friendship pair:
Small_user_id, Big_user_id
Search xxx from friendship where smaller_user_id = xxx or big_user_id = xxx
SQL can support multiple index
Two records for each friendship pair:
From_user_id, To_user_id
2 is usually preferred due to speed
===
NoSQL cassandra
Scalable, fault-tolerant and consistent
Column-oriented database
Based on dynamo and google big table
Replica based on dynamo design
Cassandra key = row_key + column_key
Cell-based structure
Row key
It is hash key, partition key, sharding key
Column key
Column key can be ordered
Insert(row_key, column_key, value)
Query(row_key, column_start, column_end)
====
RDBMS vs Cassandra
RDBMS: deals with structured data
Cassandra: deals with unstructured data
=====
Friendship storage in cassandra
How to store tweet in cassandra
SQL vs NoSQL
Which DB is suitable for friendship?
Both are fine
SQL: structured, build index easily
NoSQL: distributed, auto-scale, replica
DB for user table
Most cases choose SQL
Trust worthy, multi-index
DB for friendship table
Most cases choose NoSQL
==
How to model extended friendship?
One way friendship
Redis: key = user_id, value=set of friend_user_id
Cassandra: row_key = user_id, column_key = friends_id
Two way friendship
Find friend A, key-value
Find friend B, key-value
Find intersection, store in cache/memory
Extended friendship
BFS?
Find friends not via three persons
Users > 100M
Average number of friends: 1000
Expect DB query number [< 20]