Design Realtime Comment for Newsfeed


Topic: Design Realtime Comment for Newsfeed

Interviewer: ken

Interviewee: ying

Level: L4 (Experienced Individual Contributor)

Additional Resources:


System Design Interview - Design Realtime Comments


===

[7:00]

Requirements

Realtime Facebook comments

Comments appear below the post

Single-level comments (no nested replies)

Optional: privileged comments

MAU: 10^9

10^8 posts per minute

6*10^5 comments per minute

[12:40 - 7:00]

API design

Publish_comment(uid, …)

View_comment(post_id, …): get comment on a post

View_comment_by_user (uid, self_uid, auth_token): get comments made by one user
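
A minimal in-memory sketch of the three calls, to make the request and response shapes concrete. The notes elide most parameters, so `post_id` and `text` on `publish_comment`, the `Comment` fields, and the `limit` argument are assumptions for illustration only. `self_uid` and `auth_token` are accepted but not checked here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import time


@dataclass
class Comment:
    # Field names are assumptions; the notes only name uid and post_id.
    uid: str
    post_id: str
    text: str
    created_at: float = field(default_factory=time.time)


class CommentService:
    """In-memory stand-in for the comment store, for illustration only."""

    def __init__(self):
        self._by_post = defaultdict(list)  # post_id -> [Comment]
        self._by_user = defaultdict(list)  # uid -> [Comment]

    def publish_comment(self, uid, post_id, text):
        comment = Comment(uid=uid, post_id=post_id, text=text)
        self._by_post[post_id].append(comment)
        self._by_user[uid].append(comment)
        return comment

    def view_comment(self, post_id, limit=50):
        # Most recent `limit` comments on a post, oldest first.
        comments = self._by_post[post_id]
        return comments[max(0, len(comments) - limit):]

    def view_comment_by_user(self, uid, self_uid, auth_token, limit=50):
        # A real service would validate auth_token and check that self_uid
        # is allowed to see uid's comments; this sketch skips both.
        comments = self._by_user[uid]
        return comments[max(0, len(comments) - limit):]
```

Keeping two indexes (by post and by user) mirrors the two read paths in the API; every publish writes to both.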

[15:00-7:00]

1 high availability 99.999%

2 Resilience once a comment is published

3 Scales well

4 No need for it to be linearizable. No need for transaction. Eventual consistency

Additional comment

  1. Deliver in real time (1 second)

[19:16-7:00]

QPS: 6*10^5/60 = 10^4 QPS

Comments per day: 6*10^5 * 60 * 24 = 8.64*10^8 comments

Q: multimedia?

A: text only

1 KB per comment: 8.64*10^8 KB ≈ 0.86 TB per day, ~1 TB per day

Assume 50% compression rate: ~0.5 TB per day
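
The write-rate and storage estimate can be re-derived from the stated inputs with a few lines of arithmetic. This is a back-of-the-envelope sketch; it assumes 1 KB = 1000 bytes and 1 TB = 10^12 bytes, which the notes do not specify.

```python
COMMENTS_PER_MINUTE = 6 * 10**5   # from the requirements
BYTES_PER_COMMENT = 1_000         # "1 KB per comment", taken as 1000 bytes (assumption)
COMPRESSION_RATIO = 0.5           # 50% compression

write_qps = COMMENTS_PER_MINUTE / 60
comments_per_day = COMMENTS_PER_MINUTE * 60 * 24
raw_tb_per_day = comments_per_day * BYTES_PER_COMMENT / 10**12
compressed_tb_per_day = raw_tb_per_day * COMPRESSION_RATIO

print(f"write QPS:            {write_qps:,.0f}")
print(f"comments per day:     {comments_per_day:,}")
print(f"raw storage per day:  {raw_tb_per_day:.3f} TB")
print(f"compressed per day:   {compressed_tb_per_day:.3f} TB")
```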

[24:00-7:00]

High level design

Realtime DB

Leader-follower db can work

Multi-level db works

Quorum-based: dynamo/cassandra can work

[28:44-7:00]

Q: SQL/NoSQL?

A: When we look at a post, we want to read its comments as a whole. With SQL it is hard to keep them co-located.

Use NoSQL: document store or column-family store

Comment in post table

Key: PID, UID, post content, value: Json: {[comment1, comment2, etc]}

[appending is difficult]

Comment by user: key: uid, value: Json: {[comment1, comment2, etc]}
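
A small sketch of the two tables, modelled as dicts of JSON strings, shows why appending is difficult: every new comment is a read-modify-write of the whole value. Keying the post table by `post_id` alone and storing the author and content inside the value is an assumption, as are the field names.

```python
import json

# Two key-value tables; values are JSON strings, as in a document store.
comments_by_post = {}   # post_id -> {"uid", "content", "comments": [...]}
comments_by_user = {}   # uid -> {"comments": [...]}


def create_post(post_id, uid, content):
    comments_by_post[post_id] = json.dumps(
        {"uid": uid, "content": content, "comments": []}
    )


def append_comment(post_id, uid, text):
    # Appending = read, decode, modify, encode, and write back the whole value.
    post_doc = json.loads(comments_by_post[post_id])
    post_doc["comments"].append({"uid": uid, "text": text})
    comments_by_post[post_id] = json.dumps(post_doc)

    user_doc = json.loads(comments_by_user.get(uid, '{"comments": []}'))
    user_doc["comments"].append({"post_id": post_id, "text": text})
    comments_by_user[uid] = json.dumps(user_doc)
```

Two clients running `append_comment` on the same post at the same time can overwrite each other, which is the conflict the rest of this section deals with.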

What database to use?

Memcached, Redis

1 KB to several megabytes per record (all comments on a post live in one value)

Would like a NoSQL database that supports values of a few megabytes

2 users comment at the same time

Version vector system to resolve conflicts

V1: [comment1, comment2]

V1 -> V2: [comment1, comment2, comment3]

V1 -> V2: [comment1, comment2, comment4]

Merged: comment1, comment2, comment3, comment4

[optimistic lock?]

Do we keep all versions?

Delete old version when versions are combined
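
The merge described above can be sketched as follows. Each version is a `(clock, comments)` pair, where the clock maps a replica name to a counter; if neither clock has seen everything the other has, the writes were concurrent and the comment lists are unioned. The replica names `n1`/`n2` and the helper names are hypothetical, and real stores differ in how they expose siblings.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge_clocks(a, b):
    # Element-wise maximum over all replicas named in either clock.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def merge_comments(left, right):
    # Union that keeps first-seen order and drops duplicates.
    merged, seen = [], set()
    for comment in left + right:
        if comment not in seen:
            seen.add(comment)
            merged.append(comment)
    return merged


def reconcile(version_a, version_b):
    """Return the single version that should survive."""
    clock_a, comments_a = version_a
    clock_b, comments_b = version_b
    if descends(clock_a, clock_b):
        return version_a
    if descends(clock_b, clock_a):
        return version_b
    # Concurrent writes: combine both, after which the old siblings can go.
    return merge_clocks(clock_a, clock_b), merge_comments(comments_a, comments_b)


sibling_a = ({"n1": 2}, ["comment1", "comment2", "comment3"])
sibling_b = ({"n1": 1, "n2": 1}, ["comment1", "comment2", "comment4"])
clock, comments = reconcile(sibling_a, sibling_b)
print(comments)
```

A plain union works here because comments are append-only; deletions or edits would need tombstones or a richer merge rule.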

How do we deliver comment in real time?

Do another quorum read

[39:10-7:00]

User keeps a socket with the server

Put comments into a stream.

Or polling

Kafka stream.

Publish comments into the stream

Reading the stream: 1 write back to comment DB

Track which user is reading which post

Save comment in in-memory datastore

Realtime = 1 second

Set up kafka stream, and server behind it. Guarantees the fastest response
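
A minimal in-memory sketch of the push path: a consumer reads comment events from the stream, persists each one, and forwards it to every user currently viewing that post. A plain Python iterable stands in for the Kafka stream and a `deque` per user stands in for the open socket; the class and method names are hypothetical.

```python
from collections import defaultdict, deque


class FanOut:
    def __init__(self):
        self.readers = defaultdict(set)      # post_id -> {user_id} viewing it now
        self.outboxes = defaultdict(deque)   # user_id -> pending events (socket stand-in)
        self.comment_db = defaultdict(list)  # post_id -> [event], the durable copy

    def open_post(self, user_id, post_id):
        self.readers[post_id].add(user_id)

    def close_post(self, user_id, post_id):
        self.readers[post_id].discard(user_id)

    def consume(self, stream):
        # For each event: persist it, then push it to everyone viewing the post.
        for event in stream:
            post_id = event["post_id"]
            self.comment_db[post_id].append(event)
            for user_id in self.readers[post_id]:
                self.outboxes[user_id].append(event)


fanout = FanOut()
fanout.open_post("alice", "p1")
fanout.open_post("bob", "p2")
fanout.consume([{"post_id": "p1", "uid": "carol", "text": "hi"}])
print(list(fanout.outboxes["alice"]))
```

Only viewers registered for the post receive the event, so the cost of a comment scales with the number of current readers of that post rather than with the total user count.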

NoSQL speed should be fast enough

Polling for user.

Pushing can achieve faster speed

Choose polling not pushing

Prefer polling

[ may not be realtime ]

Q: Each active user reads db every 3 seconds. What is the read throughput we need to support?

A: if client is an app.

Calculate throughput

10^9 users * 0.5 active * 3 posts/user / 3 seconds per poll = 5 * 10^8 reads per second

Scale by partitioning onto 1000-2000 servers

500 million reads per second total

250,000-500,000 reads per second per server

Redis 100,000-200,000 QPS

Use in-memory database to serve at high speed

Polling is easy to implement but the required throughput is very high
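
The polling load can be worked through from the stated inputs. This sketch assumes that 0.5 is the fraction of users active at once, that each active user watches 3 posts, and that every watched post costs one read per 3-second poll; the notes do not spell these interpretations out.

```python
MAU = 10**9
ACTIVE_FRACTION = 0.5         # share of users assumed online (assumption)
POSTS_PER_USER = 3            # posts each active user is watching
POLL_INTERVAL_S = 3           # each client polls every 3 seconds
REDIS_QPS = (100_000, 200_000)

# Reads per second = watched posts / polling interval.
reads_per_second = MAU * ACTIVE_FRACTION * POSTS_PER_USER / POLL_INTERVAL_S
print(f"total: {reads_per_second:,.0f} reads/s")

per_server = {}
for servers in (1_000, 2_000):
    per_server[servers] = reads_per_second / servers
    print(f"{servers} servers: {per_server[servers]:,.0f} reads/s per server")

# Nodes needed if one node sustains the quoted Redis throughput.
nodes_needed = [reads_per_second / qps for qps in REDIS_QPS]
print(f"Redis-class nodes needed: {nodes_needed[1]:,.0f} to {nodes_needed[0]:,.0f}")
```

Even spread over 2000 servers, the per-server rate sits above the quoted Redis range, which is the sense in which polling is simple but expensive.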

====

A: Post content

B: Comment content

C: Who is reading which post

Writer: everybody

Reader: people who submit comments

Write globally, read locally: A, B

Write locally, read globally: C

1B comment readers per day

10M comment writers per day

A -> C (write locally 1 billion users, read globally 10M users) -> B (write globally 10M users, read locally)

===