Multi-user Chat


Topic: Multi-user Chat

Interviewer: Yi

Interviewee: Anna

Level: L4 (Experienced Individual Contributor)



Mock System Design Interview Summary

Interview Overview

Date: 11/7/2021

Target level: High L4 - L5

Duration: 1 hour

Topic covered: Multi-user Chat

Drawing tool used: excalidraw.com

Start time: 7:13

Requirements

Functional requirements

Multi-user chat system similar to WeChat

Group and 1-1 chat

Adding / removing members

Public group notice: shown on a group profile page (separate from the chat)

Group chat: show the messages posted in the chat

Maximum group size: 500 members

Supported content: text (up to 100k characters), photos, files

Logged-in users see notifications regardless of whether the app is running in the foreground or background

Can one user log in on two devices? No, a user can only use one device at a time; logging in on a second device kicks out the first

All text messages are kept: if a user switches devices, they should still get older messages
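The one-device rule can be enforced with a registry that maps each user to a single active device. This is a minimal sketch, assuming device IDs are reported at login. `SessionRegistry` is a hypothetical name, and closing the old device's connection appears only as a comment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SessionRegistry:
    """Hypothetical registry enforcing one active device per user."""

    active: Dict[str, str] = field(default_factory=dict)  # user_id -> device_id
    kicked: List[Tuple[str, str]] = field(default_factory=list)  # (user_id, device_id) logged out

    def login(self, user_id: str, device_id: str) -> Optional[str]:
        """Make device_id the user's only device; return the device that was kicked out, if any."""
        previous = self.active.get(user_id)
        self.active[user_id] = device_id
        if previous is not None and previous != device_id:
            # A real system would also close the old connection and revoke its token here.
            self.kicked.append((user_id, previous))
            return previous
        return None

    def is_active(self, user_id: str, device_id: str) -> bool:
        return self.active.get(user_id) == device_id


registry = SessionRegistry()
registry.login("alice", "phone")
kicked = registry.login("alice", "laptop")
print(kicked)                                # phone
print(registry.is_active("alice", "phone"))  # False
```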

Non-functional requirements

Scalability: 10M daily active users, 50M total users

Latency: low latency (10 ms)

Fault tolerance: can the same message be delivered more than once? No, duplicates are not allowed.
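A common way to meet the no-duplicates requirement when clients retry is an idempotency key. The client generates a unique ID for each message once and reuses it on every retry, and the server stores a given ID at most once. This is a minimal in-memory sketch. `MessageStore` is hypothetical, and a real DB would enforce the same rule with a primary-key or unique constraint on the message ID.

```python
import uuid
from typing import Dict, List


class MessageStore:
    """Hypothetical in-memory stand-in for the message DB that ignores duplicate sends."""

    def __init__(self) -> None:
        self.by_id: Dict[str, str] = {}  # message_id -> text
        self.log: List[str] = []         # message_ids in arrival order

    def save(self, message_id: str, text: str) -> bool:
        """Store the message once; return False if this ID was already stored."""
        if message_id in self.by_id:
            return False  # a retry of an already-stored message: drop it
        self.by_id[message_id] = text
        self.log.append(message_id)
        return True


store = MessageStore()
msg_id = str(uuid.uuid4())           # generated once by the client, reused on retries
first = store.save(msg_id, "hello")
retry = store.save(msg_id, "hello")  # e.g. the client timed out and resent
print(first, retry, len(store.log))  # True False 1
```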

System Design

System design diagram

Send message:

Initial design

The interviewee added a message queue component in the middle:

Kafka can be used as the storage

Alternatively, the messages can be stored in a separate database

Message service:

Receive the message

Store the message in the DB

Send it to the right topic
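The three message-service steps can be sketched as one handler. This is a minimal illustration, not the design drawn in the interview: a dict stands in for the SQL message table and a dict of lists stands in for Kafka topics. The topic name `group-<groupId>` is an assumption (the audience later suggests using the group ID as the topic ID).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List


@dataclass(frozen=True)
class Message:
    message_id: str
    sender_id: str
    group_id: str
    text: str


class MessageService:
    """Hypothetical message service: receive -> store in DB -> publish to the group's topic."""

    def __init__(self) -> None:
        self.db: Dict[str, Message] = {}  # stand-in for the message table
        self.topics: DefaultDict[str, List[Message]] = defaultdict(list)  # stand-in for the MQ

    def handle(self, msg: Message) -> None:
        # 1. Receive: this method is the entry point for an incoming message.
        # 2. Store first, so the message is persisted even if publishing fails and is retried.
        self.db[msg.message_id] = msg
        # 3. Publish to the right topic (one topic per group, named after the group ID).
        self.topics[f"group-{msg.group_id}"].append(msg)


svc = MessageService()
svc.handle(Message("m1", "alice", "g42", "hi all"))
print(len(svc.db), [m.text for m in svc.topics["group-g42"]])  # 1 ['hi all']
```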

Messages are structured data, so a SQL database can work

Attachments can be saved to S3 or other object storage, and the SQL database can hold a pointer to the stored object

Q from interviewer: how do we notify the user when there is an incoming message?

Add a “receive service”

Each receiver (receive service instance) has a list of message queue consumers

Each MQ consumer consumes one topic

(Each receiving user has a consumer in the receive service)

Assume an existing push notification framework delivers the messages.

[Note taker question: consumer 2 pushes to U1, U2, U3, but what if U2 is offline, or if U4 joins in the future?]
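One way to answer the note-taker's question is to look up the group's membership when each message is consumed, and to park messages for offline users until they reconnect. The sketch assumes one consumer per group, as in the "consumer 2 pushes to U1, U2, U3" example. `ReceiveService`, the in-memory `pending` lists, and `reconnect` are hypothetical stand-ins for a real consumer, a persistent inbox, and a client sync call.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class ReceiveService:
    """Hypothetical group consumer: fans a message out to the group's current members."""

    def __init__(self, members: Dict[str, Set[str]], online: Set[str]) -> None:
        self.members = members  # group_id -> user_ids (owned by the group manager)
        self.online = online    # user_ids with a live connection
        self.pushed: DefaultDict[str, List[str]] = defaultdict(list)   # delivered immediately
        self.pending: DefaultDict[str, List[str]] = defaultdict(list)  # fetched on reconnect

    def consume(self, group_id: str, sender: str, text: str) -> None:
        # Membership is read per message, so a member added later (U4)
        # receives every message consumed after they join.
        for user in sorted(self.members.get(group_id, set()) - {sender}):
            if user in self.online:
                self.pushed[user].append(text)   # hand off to the push framework
            else:
                self.pending[user].append(text)  # offline (U2): keep for later sync

    def reconnect(self, user: str) -> List[str]:
        """Mark the user online and return the messages they missed."""
        self.online.add(user)
        return self.pending.pop(user, [])


svc = ReceiveService({"g1": {"U1", "U2", "U3"}}, online={"U1", "U3"})
svc.consume("g1", "U1", "hello")
svc.members["g1"].add("U4")
svc.online.add("U4")
svc.consume("g1", "U1", "welcome U4")
missed = svc.reconnect("U2")
print(svc.pushed["U3"], svc.pushed["U4"])  # ['hello', 'welcome U4'] ['welcome U4']
print(missed)                              # ['hello', 'welcome U4']
```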

Q from interviewer: how to handle group membership?

Add a “group manager service”
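Here is a minimal sketch of what the group manager service could own: the member set for each group, with the 500-member cap from the requirements. `GroupManager` is hypothetical. The rule that the host cannot be removed is an assumption added for illustration.

```python
from typing import Dict, Set

MAX_GROUP_MEMBERS = 500  # from the functional requirements


class GroupManager:
    """Hypothetical group manager service: owns membership for each group."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}  # group_id -> member user_ids
        self.hosts: Dict[str, str] = {}        # group_id -> host user_id

    def create_group(self, group_id: str, host_id: str) -> None:
        self.groups[group_id] = {host_id}
        self.hosts[group_id] = host_id

    def add_member(self, group_id: str, user_id: str) -> None:
        members = self.groups[group_id]
        if user_id not in members and len(members) >= MAX_GROUP_MEMBERS:
            raise ValueError(f"group {group_id} is full ({MAX_GROUP_MEMBERS} members)")
        members.add(user_id)

    def remove_member(self, group_id: str, user_id: str) -> None:
        if user_id == self.hosts[group_id]:
            raise ValueError("the group host cannot be removed")  # assumed policy
        self.groups[group_id].discard(user_id)


gm = GroupManager()
gm.create_group("g1", host_id="u0")
for i in range(1, MAX_GROUP_MEMBERS):
    gm.add_member("g1", f"u{i}")
print(len(gm.groups["g1"]))  # 500
try:
    gm.add_member("g1", "u500")
except ValueError as err:
    print(err)  # group g1 is full (500 members)
```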

Database schema:

User: userID, userMeta

Group: member list, groupHost(UserId), groupNotice

Message: sender(UserId), groupId, receiver(UserId), MessageDetail, attachment, timestamp
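The schema above maps onto SQL tables roughly as follows. This sketch uses Python's built-in `sqlite3` and makes three assumptions: the Group member list is normalized into its own table, `attachment_key` holds the object-storage pointer rather than the file itself, and exactly one of `group_id` or `receiver_id` is set (group chat vs 1-1 chat). Table and column names are illustrative, and timestamps are assumed to be Unix seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id   TEXT PRIMARY KEY,
        user_meta TEXT                      -- e.g. JSON profile data
    );
    CREATE TABLE chat_groups (              -- named to avoid clashing with the GROUPS keyword
        group_id     TEXT PRIMARY KEY,
        group_host   TEXT NOT NULL REFERENCES users (user_id),
        group_notice TEXT
    );
    CREATE TABLE group_members (            -- the Group member list, one row per member
        group_id TEXT NOT NULL REFERENCES chat_groups (group_id),
        user_id  TEXT NOT NULL REFERENCES users (user_id),
        PRIMARY KEY (group_id, user_id)
    );
    CREATE TABLE messages (
        message_id     TEXT PRIMARY KEY,
        sender_id      TEXT NOT NULL REFERENCES users (user_id),
        group_id       TEXT REFERENCES chat_groups (group_id),  -- set for group chat
        receiver_id    TEXT REFERENCES users (user_id),         -- set for 1-1 chat
        message_detail TEXT,
        attachment_key TEXT,                -- pointer into object storage, not the file
        created_at     INTEGER NOT NULL,    -- Unix seconds
        CHECK ((group_id IS NULL) <> (receiver_id IS NULL))
    );
    CREATE INDEX messages_by_group ON messages (group_id, created_at);
    """
)

conn.execute("INSERT INTO users VALUES ('alice', '{}')")
conn.execute("INSERT INTO chat_groups VALUES ('g1', 'alice', 'Welcome to g1!')")
conn.execute("INSERT INTO group_members VALUES ('g1', 'alice')")
conn.execute(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("m1", "alice", "g1", None, "see attached", "attachments/m1/photo.jpg", 1636300000),
)
rows = conn.execute(
    "SELECT message_id, attachment_key FROM messages WHERE group_id = ? ORDER BY created_at",
    ("g1",),
).fetchall()
print(rows)  # [('m1', 'attachments/m1/photo.jpg')]
```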

Additional design

Discussions during the Interview

Interviewer and Audience Feedback after the Interview

Interviewer

Requirements: very detailed

Design diagram: rough

Suggestions:

Design the connections between components in more detail

Look at the high level first

From user -> chat service -> DB

How do we maintain it?

Authentication

What are the protocols?

WebSocket, subscription

Most requests use HTTP

However, if we need to maintain a subscription, we need a persistent connection, e.g. a WebSocket

Polling is brute force; maintaining a WebSocket keeps one open connection, which is more efficient

DB: which DB to choose?

NoSQL may be more suitable

Reason: messages are simple key-value pairs, and NoSQL scales well

If we need more features such as emoji or voice, it will be easier on NoSQL
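To make the key-value argument concrete, here is a minimal sketch of one common NoSQL layout for chat history. It assumes a wide-column store with a partition key and a sort key (as in Cassandra or DynamoDB): the group ID picks the partition, and a per-group sequence number orders messages inside it. `KVMessageStore` and `page_before` are hypothetical, and a sorted in-memory list stands in for each partition.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class KVMessageStore:
    """Hypothetical layout: partition key = group_id, sort key = per-group sequence number."""

    def __init__(self) -> None:
        # group_id -> list of (seq, text) kept sorted by seq; stands in for one partition
        self.partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def put(self, group_id: str, seq: int, text: str) -> None:
        bisect.insort(self.partitions[group_id], (seq, text))

    def page_before(self, group_id: str, before_seq: int, limit: int) -> List[Tuple[int, str]]:
        """Up to `limit` messages with seq < before_seq, oldest first (history paging)."""
        rows = self.partitions[group_id]
        end = bisect.bisect_left(rows, (before_seq,))  # first row with seq >= before_seq
        return rows[max(0, end - limit):end]


kv = KVMessageStore()
for seq in range(1, 8):
    kv.put("g1", seq, f"msg {seq}")
page = kv.page_before("g1", before_seq=6, limit=3)
print(page)  # [(3, 'msg 3'), (4, 'msg 4'), (5, 'msg 5')]
```

Paging by sort key like this is what lets a client that switches devices pull older messages in batches without scanning the whole group.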

Push notification:

I wanted to ask about pushing, authentication

User ID and push notification ID

How does the message

The push notification system may provide a token that the user's device uses to receive push notifications.

The user ID and the token need to be mapped to each other.
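Here is a minimal sketch of the user ID to token mapping. It assumes one token per user (matching the one-device rule), and it moves a token to the new user if a device is reused. `PushTokenRegistry` is hypothetical. The reverse map is there so the server can find the user when, for example, the push provider reports a token as invalid.

```python
from typing import Dict, Optional


class PushTokenRegistry:
    """Hypothetical two-way mapping between user IDs and push notification tokens."""

    def __init__(self) -> None:
        self.token_by_user: Dict[str, str] = {}
        self.user_by_token: Dict[str, str] = {}

    def register(self, user_id: str, token: str) -> None:
        # A token belongs to one user: unmap whoever held it before (device reused).
        prev_user = self.user_by_token.get(token)
        if prev_user is not None and prev_user != user_id:
            del self.token_by_user[prev_user]
        # One device per user: the new token replaces the user's old one.
        old_token = self.token_by_user.get(user_id)
        if old_token is not None and old_token != token:
            del self.user_by_token[old_token]
        self.token_by_user[user_id] = token
        self.user_by_token[token] = user_id

    def token_for(self, user_id: str) -> Optional[str]:
        return self.token_by_user.get(user_id)

    def user_for(self, token: str) -> Optional[str]:
        return self.user_by_token.get(token)


tokens = PushTokenRegistry()
tokens.register("alice", "token-phone")
tokens.register("alice", "token-laptop")  # alice switched devices
tokens.register("bob", "token-phone")     # the old phone now belongs to bob
print(tokens.token_for("alice"), tokens.token_for("bob"))  # token-laptop token-phone
```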

Audience

SQL database is better than NoSQL

In the user table, add a list of the groups each user belongs to

The data is relational

Every group has a topic

Group ID is topic ID

Attachments: can use object storage

Client side: keep some historical messages on the client

How long do we need to keep history?

Messages can be saved on the device, with on-disk storage optimizations

One approach: save everything locally

Alternatively (Slack's approach): everything is saved on the server
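A middle ground between these two approaches is to keep only the newest messages on the device and page older ones from the server. This is a minimal sketch. `LocalHistory`, its `capacity`, and the `fetch_from_server` callback are hypothetical, a deque stands in for on-device storage, and the sketch assumes the local copy always holds the newest messages.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# (chat_id, skip_newest, count) -> `count` messages older than the newest `skip_newest`
FetchFn = Callable[[str, int, int], List[str]]


class LocalHistory:
    """Hypothetical client cache: keeps only the newest `capacity` messages per chat."""

    def __init__(self, capacity: int, fetch_from_server: FetchFn) -> None:
        self.capacity = capacity
        self.fetch_from_server = fetch_from_server
        self.chats: Dict[str, Deque[str]] = {}

    def append(self, chat_id: str, text: str) -> None:
        # deque(maxlen=...) drops the oldest message once the chat is full.
        self.chats.setdefault(chat_id, deque(maxlen=self.capacity)).append(text)

    def recent(self, chat_id: str, count: int) -> List[str]:
        """Newest `count` messages, oldest first, going to the server only if needed."""
        local = list(self.chats.get(chat_id, ()))
        if count <= len(local):
            return local[len(local) - count:]
        older = self.fetch_from_server(chat_id, len(local), count - len(local))
        return older + local


server_log = [f"m{i}" for i in range(10)]  # stand-in for the server-side history


def fake_fetch(chat_id: str, skip_newest: int, count: int) -> List[str]:
    end = len(server_log) - skip_newest
    return server_log[max(0, end - count):end]


history = LocalHistory(capacity=3, fetch_from_server=fake_fetch)
for msg in server_log:
    history.append("c1", msg)
print(history.recent("c1", 2))  # ['m8', 'm9'] (served locally)
print(history.recent("c1", 5))  # ['m5', 'm6', 'm7', 'm8', 'm9'] (2 fetched from the server)
```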

Interviewer

User/group-related data: SQL

Firebase storage

Audience:

Need a membership table

User, Group

Membership (user ID to group ID, timestamp)
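The timestamp on a membership row can be read as the join time. This addresses the earlier question about members who join later: they see only messages sent after they joined. Here is a minimal `sqlite3` sketch under that assumption. The visibility rule is a product choice and was not fixed in the interview, and table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE membership (
        user_id   TEXT NOT NULL,
        group_id  TEXT NOT NULL,
        joined_at INTEGER NOT NULL,         -- join time, Unix seconds (assumed meaning)
        PRIMARY KEY (user_id, group_id)     -- also answers "which groups is this user in?"
    );
    CREATE INDEX membership_by_group ON membership (group_id);  -- "who is in this group?"
    CREATE TABLE messages (
        message_id TEXT PRIMARY KEY,
        group_id   TEXT NOT NULL,
        body       TEXT NOT NULL,
        created_at INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [("m1", "g1", "sent before U4 joined", 100), ("m2", "g1", "sent after U4 joined", 300)],
)
conn.execute("INSERT INTO membership VALUES ('U4', 'g1', 200)")

# Messages a user may see: in their groups and sent at or after their join time.
visible = conn.execute(
    """
    SELECT m.message_id
    FROM messages AS m
    JOIN membership AS ms ON ms.group_id = m.group_id
    WHERE ms.user_id = ? AND m.created_at >= ms.joined_at
    ORDER BY m.created_at
    """,
    ("U4",),
).fetchall()
print(visible)  # [('m2',)]
```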

Interviewee:

Kafka: practically unlimited topics, but the number of partitions has a limit

Each topic may have 1-3 partitions

The total number of partitions has an upper limit

Example: 10 topics, 50 partitions

Total partitions = number of topics × partitions per topic

Audience:

WebSocket

Options: SSE, WebSocket, long polling
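Of the three options listed, long polling is the easiest to show without a web framework. This is a minimal asyncio sketch built around a hypothetical per-user `Inbox`. The client's request waits in `poll` until a message arrives or the timeout expires, and the client then polls again with its new cursor. The HTTP layer is omitted. SSE and WebSocket would replace the repeated requests with one long-lived connection.

```python
import asyncio
from typing import List


class Inbox:
    """Hypothetical per-user inbox that answers long-poll requests."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.cond = asyncio.Condition()

    async def deliver(self, text: str) -> None:
        async with self.cond:
            self.messages.append(text)
            self.cond.notify_all()  # wake any request waiting in poll()

    async def poll(self, cursor: int, timeout: float) -> List[str]:
        """Return messages after `cursor`, waiting up to `timeout` seconds for one to arrive."""
        async with self.cond:
            try:
                await asyncio.wait_for(
                    self.cond.wait_for(lambda: len(self.messages) > cursor), timeout
                )
            except asyncio.TimeoutError:
                return []  # empty response; the client re-polls right away
            return self.messages[cursor:]


async def main() -> List[List[str]]:
    inbox = Inbox()  # created inside the running event loop
    waiting = asyncio.create_task(inbox.poll(cursor=0, timeout=1.0))
    await asyncio.sleep(0.01)  # let the request start waiting on the server
    await inbox.deliver("hi")  # a message arrives while the request is open
    first = await waiting
    second = await inbox.poll(cursor=1, timeout=0.05)  # nothing new: times out
    return [first, second]


results = asyncio.run(main())
print(results)  # [['hi'], []]
```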

Audience

Not easy to query

If it’s only for chat room, that’s good

Redis message queue

Firebase Cloud Messaging

Audience

Firebase

Message queue

Caching

Audience

Message queue: requires ordering, to avoid an answer showing up before the question it answers

Every group requires a message queue: need 3 seconds

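Distributed message ID generation

Partitioned message ID generator

Each partition can guarantee monotonically increasing IDs

Kafka: use the group ID as the message key, so all messages of a group stay within the same partition

Kafka does not scale well with lots of topics (e.g. one topic per group ID)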
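Keying by group ID avoids creating one topic per group. Every message goes to a single shared topic with a fixed number of partitions, and the key decides the partition, so each group's messages stay in order relative to each other. The sketch assumes 50 partitions (the figure from the earlier example) and uses `zlib.crc32` as a stand-in hash. Kafka's default partitioner hashes keyed records with murmur2, so real partition numbers would differ.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NUM_PARTITIONS = 50  # partitions of one shared message topic (assumed)


def partition_for(group_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (crc32 here; Kafka's default partitioner uses murmur2)."""
    return zlib.crc32(group_id.encode("utf-8")) % num_partitions


# Simulate producers sending interleaved messages from three groups to one topic.
partitions: DefaultDict[int, List[Tuple[str, int]]] = defaultdict(list)
for seq in range(1, 4):
    for group in ("g1", "g2", "g3"):
        partitions[partition_for(group)].append((group, seq))


def group_order(group_id: str) -> List[int]:
    """Sequence numbers of one group's messages, in the order a consumer would read them."""
    return [s for g, s in partitions[partition_for(group_id)] if g == group_id]


print(group_order("g2"))  # [1, 2, 3]
```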

Retention period

Timestamps assigned at the DB

There may be clock skew between the different machines that back the DB

Shard by group ID: each group ID maps to one ID generator
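Sharding the ID generator by group ID sidesteps the clock-skew problem. Order within a group comes from one counter on one shard, not from DB timestamps taken on different machines. This is a minimal in-memory sketch. `SequenceShard` and `ShardedIdGenerator` are hypothetical, and a production version would need to persist each counter so that a failed-over shard never reissues an ID.

```python
import threading
import zlib
from collections import defaultdict
from typing import DefaultDict, List


class SequenceShard:
    """One ID-generator shard: hands out strictly increasing IDs per group it owns."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.next_seq: DefaultDict[str, int] = defaultdict(int)  # in memory only

    def next_id(self, group_id: str) -> int:
        with self.lock:
            self.next_seq[group_id] += 1
            return self.next_seq[group_id]


class ShardedIdGenerator:
    """Routes each group ID to exactly one shard, so no cross-machine clocks are involved."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[SequenceShard] = [SequenceShard() for _ in range(num_shards)]

    def next_id(self, group_id: str) -> int:
        shard = self.shards[zlib.crc32(group_id.encode("utf-8")) % len(self.shards)]
        return shard.next_id(group_id)


gen = ShardedIdGenerator(num_shards=4)
ids = [gen.next_id("g1") for _ in range(3)] + [gen.next_id("g2")]
print(ids)  # [1, 2, 3, 1]
```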

Audience

Messages 1, 2, 3 may occasionally arrive out of order

Can WebSocket guarantee ordering?

Messages 1, 2, 3 are in the same group, e.g.:

1: question

2: answer
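A single WebSocket connection delivers frames in order because it runs over TCP. That does not help if messages were reordered before they reached the server holding the connection. So the client can hold back a message until every earlier ID has arrived, which keeps answer 2 from showing before question 1. This is a minimal sketch that assumes sequence IDs start at 1 for each group. `ReorderBuffer` is hypothetical, and a real client would also need a timeout that fetches a missing ID from the server instead of waiting forever.

```python
from typing import Dict, List


class ReorderBuffer:
    """Client-side buffer: releases one group's messages strictly in sequence-ID order."""

    def __init__(self) -> None:
        self.next_expected = 1
        self.held: Dict[int, str] = {}  # out-of-order messages waiting for a gap to fill

    def receive(self, seq: int, text: str) -> List[str]:
        """Accept one message; return the messages that can now be shown, in order."""
        if seq < self.next_expected or seq in self.held:
            return []  # duplicate of something already shown or buffered
        self.held[seq] = text
        ready: List[str] = []
        while self.next_expected in self.held:
            ready.append(self.held.pop(self.next_expected))
            self.next_expected += 1
        return ready


buf = ReorderBuffer()
first = buf.receive(2, "A: Tuesday")             # arrives early
second = buf.receive(1, "Q: when is the demo?")
print(first)   # []
print(second)  # ['Q: when is the demo?', 'A: Tuesday']
```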

Try to finish within 45 minutes