Real time message system

System DesignReal-time & ChatMessaging & Streaming

Materials — open to everyone, no sign-in

Topic: Real time message system

Interviewer: 刘君洁(liujunjie034661)

Interviewee: LW

Level: L4 (Experienced Individual Contributor)


QR code

Sign up for future events. Presenters for system design appreciated.

Sign up for 1-1 coaching:

https://commitway.com/store

WGROUP_08G 八月底前获取30%优惠

Mock System Design Interview Summary

Interview Overview

Date: 8/14/2022

Target level: L4 (experienced IC)

Duration: 45 minutes

Topic covered: Messaging system for users

Drawing tool used: Excalidraw

Requirements

Functional requirements

1-1 chat

Need to maintain history

Interviewee: 200 vs 200,000 concurrent connections?

Interviewer: what’s the difference?

Interviewee: 10M-100M will make

Excalidraw

Non functional requirements

System Design

External APIs

send(userId, content)

receive(userId, content)

System design

Choosing NoSQL for scalability

Chat sent and saved in database

User gets online, the message is received from NoSQL

Protocol used:

Polling. Ask for message periodically. Expensive if there are no messages.

Long polling: wait for the message when there is no response.

Q: why is there high overhead for long polling?

A: each round trip requires HTTP handshake

Websocket: bidirectional communication, low overhead.

Q: How does a user subscribe to the messages that it needs to receive?

A: each user has its own message queue topic

Q: which message queue do you like to use?

A: kafka: can create a topic for each user. robust fail over and retry

Q: how do you scale up message queue?

A: for kafka, scalability is simple. Each chat server will be a publisher, and a receiver. We can add publisher and subscriber.

Q: chat server or user subscribing to the message queue?

A: chat server. THis can help us handle additional security requirements

Q: how many websockets each user needs to maintain with the server

A: one websocket per server. Bi-directional connections

Q: user A sends a message to user B

A: first it goes to database, then it goes to the message queue user B is subscribing to

Q: how does chat server know to poll the message

A: use long polling

Q: how many chat servers do you want to maintain? What is the constraint for the chat server to handle many users?

A: the bottleneck is in the middleware. Kafka can scale up easily.

Q: let’s say there are 100 chat servers. How do you make sure the server pulls the message is connected to the right user?

A: when user opens websocket to chat server, the chat server subscribes the topic in the queue

Q: how to design the system to send the right message from the queue to the right user

A: there is a local mapping between message topics and the websocket connection

Q: talk about the database

A: no relations. NoSQL is more scalable.

messageID, content, sender, receiver, timestamp

Q: what database to use?

A: we can use cassandra

Q: what is the row ID?

A: we don’t need a row ID.

Q: it’s a column based database. Do you need a row ID? What are the column names

A: column names: messageID, content, sender, receiver, timestamp

Q: should we maintain a history?

A: messageID should be incrementing. The partition key is senderID. New messages should have a larger message ID value than older messages.

Q: if I want to read the historical messages from one friend, do I need to query a different partition?

A: sender ID or receiver ID can be the partition key

Online users rely on message queue. So database is write heavy

Q: how to maintain the message order?

A: can use incrementing message ID

Q: what if the messages are too close?

A: just use the receival time sequence

Q: if there is some network issue, how do we maintain order?

A: if there are 2 messages that are really close, we still use the timestamp of the receival.

Scalability / availability

Q: how to avoid single point of failure?

A: chat servers are replicated.

Message queue: kafka provides

NoSQL: different replicas in different data centers

Chat servers: we have different instances. Can bring down using zookeeper. User can connect to a different server

Q: zookeeper for availability of chat server?

A: we can use it for balancing loads between different chat servers

Q: what load balancing algorithms?

A: consistent hashing of the user ID. If the user ID is random, there should not be a hotspot.

Q: why use consistent hashing?

A: this maps servers to users. We just want to make sure the mapping is random

Q: what complexity for consistent hashing?

A: when new server joins, we need to balance the consistent hashing.

Q: how to handle chat server leaving and joining the ring? A: heartbeat.

Q: when will it rejoin to leave the cluster?

A: when missing heartbeat for 30 seconds

Q: will it cause problem in consistent hashing?

A: randomize. If one server drops, other servers will take the load evenly

Q: message queue between chat server and noSQL database.

Ordering

Interviewer and Audience Feedback

Audience scores.

Interviewer:

Websocket: is rather difficult. Harder than one socket. Didn’t get solid answer to find which server has the websocket connection

I wonder if there are real experience using

Kafka is not very good. Consumer is by partition. Kafka is more suitable for offline, not online.

RabbitMQ is very good. Kafka streaming is good. Each partition is only consumed by one consumer.

Consumer: RabbitMQ支持的 TOPIC 比 KAFKA多。。其他好像差不多。。 九章说的。

Interviewee

Seems to be fine

I didn’t get

===

Which message queue

Not the main point - RabbitMQ

Interviewer:

Websocket server

Client connects to a specific server. There are multiple websocket servers. Even if each websocket can handle 1M connections, we need 2.

Using Redis to record which client is on which webserver

Audience: Why do we need to require mapping?

Interviewer: how do we find the right websocket server

Interviewer: Do you create a topic for each user?

Alternative design:

Audience: can we not use message queue

Interviewer: MQ: buffering, decoupling

Audience: if the receiving user is not available, we may need to deliver

Interview: Cassandra schema?

Audience: is Cassandra column based database

Interview: row key, column key

Column key is flexible

rowKey: threadID

audience:

userID, sender or receiver

userID, sender or receiver

A composite key userID1, userID2

Should duplicate the message twice

Interviewer:

Primary key = userID1, userID2 sorted order

Audience:

ThreadID is fine

Receiver is the groupID

Interviewer:

How do we order the messages?

Everybody may have a different location

Audience: Cassandra can use transaction of 2 records

Scott Shi interview

Audience: Cassandra can record the last time the user read the messages.

Interviewer: How do you maintain ordering?

Audience: we can just use the receival time. Wrong order at edge case.

Interviewer: version stamp

Vector version timestamp.

Logical version stamp

Kafka

===

Can kafka dynamically add topics?

Usually use 1 topic per kafka

We can use redis pub/sub to create many topics

Easy to create and subscribe topic

RabbitMQ can dynamically create topics

===

Audience: use NoSQL. No need for delete. Is scalable.

===

Recommend reading: https://youtu.be/6w6E_B55p0E