Newsfeed
Topic: Newsfeed
Interviewer: Li Ning
Interviewee: Fei
Level: L5 (Senior)
Mock System Design Interview Summary
Interview Overview
Date: 5/22/2022
Target level: L5
Duration: 45 minutes
Topic covered: Design Newsfeed
Drawing tool used: excalidraw
Requirements
Functional requirements
See newsfeed
Post to newsfeed: text / image / video. Focus on text and image first
Non-requirements
login/logout
Follow users
Non-functional requirements
Assume 100,000 total users
2 tweets per active user per day => 0.2 tweets per user per day averaged over all users
DAU: 10,000 users
Stability / Reliability: 24/7
10k * 2 = 20k tweets/day
10 MB per tweet (image, text, video) => 20k * 10 MB = 200 GB/day
200 GB / 86,400 s ≈ 2.4 MB/s
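The write-side estimate above can be re-derived in a few lines. This is only a sketch of the arithmetic: the constant names are made up for illustration, and 1 GB is taken as 1000 MB, which gives about 2.3 MB/s (treating 200 GB as 200 * 1024 MB gives the rounded 2.4 MB/s quoted in the notes).

```python
# Back-of-envelope write estimate using the figures from the notes.
TOTAL_USERS = 100_000
DAILY_ACTIVE_USERS = 10_000
TWEETS_PER_ACTIVE_USER = 2
AVG_TWEET_SIZE_MB = 10        # generous average covering text, image, video
SECONDS_PER_DAY = 86_400

tweets_per_day = DAILY_ACTIVE_USERS * TWEETS_PER_ACTIVE_USER      # 20,000
tweets_per_user_per_day = tweets_per_day / TOTAL_USERS            # 0.2
storage_mb_per_day = tweets_per_day * AVG_TWEET_SIZE_MB           # 200,000 MB
storage_gb_per_day = storage_mb_per_day / 1000                    # 200 GB
write_mb_per_s = storage_mb_per_day / SECONDS_PER_DAY             # about 2.3

print(f"{tweets_per_day} tweets/day, {storage_gb_per_day:.0f} GB/day, "
      f"{write_mb_per_s:.1f} MB/s")
```

At this scale a single database instance is comfortably enough, which is the point of starting with small numbers.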
System Design
MySQL + NoSQL. NoSQL: image, text, video
MySQL: feed info, people’s posts, follower relationship, timeline
Write / read ratio: 20% / 80%
Reads: 2.4 MB/s * 4 ≈ 10 MB/s
QPS: 100k
Each user reads the newsfeed about 5 times a day
Each user follows 100 users: 100k * 100 * 2 / 86,400 ≈ 230 QPS
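The read-side numbers can be checked the same way. This is a rough sketch: it assumes the 20% / 80% split is writes vs. reads, and that the QPS formula multiplies users by accounts followed by tweets per account per day. The constant names are illustrative only.

```python
# Back-of-envelope read estimate using the figures from the notes.
SECONDS_PER_DAY = 86_400
WRITE_MB_PER_S = 2.4              # write throughput quoted in the notes
READ_TO_WRITE_RATIO = 80 / 20     # assumed meaning of "20% / 80%"

TOTAL_USERS = 100_000
FOLLOWED_PER_USER = 100
TWEETS_PER_FOLLOWED_USER = 2

read_mb_per_s = WRITE_MB_PER_S * READ_TO_WRITE_RATIO              # about 10
feed_items_per_day = TOTAL_USERS * FOLLOWED_PER_USER * TWEETS_PER_FOLLOWED_USER
feed_item_qps = feed_items_per_day / SECONDS_PER_DAY              # a few hundred

print(f"reads: {read_mb_per_s:.1f} MB/s, feed items: {feed_item_qps:.0f}/s")
```

The feed-item rate comes out a little above 230 per second, in the same ballpark as the rounded figure in the notes.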
13:00
Q: let’s focus on text services
Choosing MongoDB
JSON-native; text content can be converted to JSON, which helps us support a REST API
For media files, we can use another database
Q: let’s focus on text messages first
When tweeting: the tweet service saves content to MongoDB; it also talks to the user service for auth
Read feed:
Use the friendship service to find the users being followed, then fetch their new tweets
Q: what does friendship service do?
A: find out followers of users
Q: where do we save the follower relations?
A: MySQL DB.
Add MySQL DB
Q: Table design for friendship service
A:
Users: userID, userName, email, phone, password
Following: from_userID, to_userID
Tweets: saved in MongoDB
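The two relational tables above could look roughly like this. The sketch uses Python's built-in sqlite3 as an in-memory stand-in for MySQL; the snake_case column names, the index, and the helper functions followers_of / followees_of are illustrative assumptions, not part of the interview.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database from the notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    email     TEXT,
    phone     TEXT,
    password  TEXT   -- in practice store a salted hash, not plaintext
);
CREATE TABLE following (
    from_user_id INTEGER NOT NULL REFERENCES users(user_id),
    to_user_id   INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (from_user_id, to_user_id)
);
-- Extra index so "who follows X" does not scan the whole table.
CREATE INDEX idx_following_to ON following(to_user_id);
""")

conn.executemany("INSERT INTO users (user_id, user_name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
# (from_user_id, to_user_id) means "from follows to".
conn.executemany("INSERT INTO following VALUES (?, ?)",
                 [(2, 1), (3, 1), (1, 3)])

def followers_of(user_id):
    """Users who follow user_id (needed for push / fan-out)."""
    rows = conn.execute(
        "SELECT from_user_id FROM following WHERE to_user_id = ? "
        "ORDER BY from_user_id", (user_id,))
    return [r[0] for r in rows]

def followees_of(user_id):
    """Users that user_id follows (needed for pull / read-time aggregation)."""
    rows = conn.execute(
        "SELECT to_user_id FROM following WHERE from_user_id = ? "
        "ORDER BY to_user_id", (user_id,))
    return [r[0] for r in rows]

print(followers_of(1), followees_of(1))
```

The composite primary key serves the "whom do I follow" lookup, and the second index serves the reverse direction.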
De-couple services
Q: how to send a tweet? newsfeed service, friendship service
A: The user service is for creating users. The friendship service is for relations
Q: why do we need to talk to user service and newsfeed service when sending a tweet?
A: The poster (followee) pushes data into the database. Followers pull the information. The newsfeed service can act as a broker.
Q: back to basic requirements. Which services will be involved when a user publishes a tweet?
A: Newsfeed service -> tweet service
Q: why not talk to tweet service directly?
A: We don’t want to expose the tweet service publicly. It’s a private service, not exposed externally
Q: how does a friend see the new tweet?
A: brute-force solution: first check the friendship service to get all the users this user follows. Then talk to the tweet service to retrieve the tweets of those users. The newsfeed service can aggregate them in timeline order.
Q: How can newsfeed service get all newsfeed?
A: We can use the friendship service to look up relations; the tweet service then returns the tweets of each followed user, and the newsfeed service merges them
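The brute-force pull described in the two answers above can be sketched as follows. The dictionaries stand in for the friendship service and the tweet service, and every name here (FOLLOWING, TWEETS, build_feed) is hypothetical.

```python
# Stand-in for the friendship service: user -> users they follow.
FOLLOWING = {1: [2, 3]}

# Stand-in for the tweet service: author -> list of (timestamp, text).
TWEETS = {
    2: [(100, "hello from 2"), (300, "again from 2")],
    3: [(200, "hello from 3")],
}

def get_followees(user_id):
    return FOLLOWING.get(user_id, [])

def get_tweets(author_id):
    return TWEETS.get(author_id, [])

def build_feed(user_id, limit=10):
    """Pull model: gather tweets of every followed user at read time,
    then sort newest first. Cost grows with the number of followees."""
    candidates = []
    for author_id in get_followees(user_id):
        for timestamp, text in get_tweets(author_id):
            candidates.append((timestamp, author_id, text))
    candidates.sort(reverse=True)
    return candidates[:limit]

for item in build_feed(1):
    print(item)
```

Nothing is precomputed, so writes are cheap and every read pays for the aggregation. That is acceptable at the low QPS assumed at the start of the interview.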
API design:
getTweets/Id=123/pageNo=1
postTweets/getTweets/page/No=1
{userId: [123, 124, 125]}
100 tweets per user * 3 users = 300 tweets
Q: will you return all tweets? Which service will do pagination?
A: The newsfeed service. It passes the pagination parameters to the tweet service, which retrieves the tweets page by page
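A minimal sketch of page-number pagination, assuming pageNo is 1-based as in the API examples and a page size of 100. The function name and the default page size are illustrative assumptions.

```python
def paginate(items, page_no, page_size=100):
    """Return the page_no-th page (1-based) of an already sorted list."""
    if page_no < 1 or page_size < 1:
        raise ValueError("page_no and page_size must be >= 1")
    start = (page_no - 1) * page_size
    return items[start:start + page_size]

# 3 followed users with 100 tweets each gives 300 candidate tweets.
all_tweet_ids = list(range(300))
first_page = paginate(all_tweet_ids, page_no=1)
last_page = paginate(all_tweet_ids, page_no=3)
print(len(first_page), first_page[0], last_page[-1])
```

Page numbers are simple, but they can shift when new tweets arrive between requests. A cursor based on the last seen timestamp or tweet ID is a common alternative.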
Q: what happens when we increase from 100,000 users to 100M users?
A: there may be bottlenecks in different services
We can segment users by sparse vs. dense distribution and by inactive vs. active, and use lazy pull
We can fan out on write: when a user creates a new post, it is fanned out to their followers
A: We can notify followers when a new post is created
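Fan-out on write can be sketched like this. The in-memory structures stand in for the friendship service and a per-user feed cache; FOLLOWERS, FEED_LIMIT, publish and read_feed are made-up names for illustration.

```python
from collections import defaultdict, deque

FEED_LIMIT = 3   # keep only the newest N entries per user (illustrative)

# Stand-in for the friendship service: author -> followers.
FOLLOWERS = {10: [1, 2, 3], 11: [1]}

# Per-user precomputed feed, newest first, bounded in size.
FEED_CACHE = defaultdict(lambda: deque(maxlen=FEED_LIMIT))

def publish(author_id, tweet_id):
    """Push model: copy the new tweet ID into every follower's feed."""
    for follower_id in FOLLOWERS.get(author_id, []):
        FEED_CACHE[follower_id].appendleft(tweet_id)

def read_feed(user_id):
    """Reads become a cheap lookup because the work was done at write time."""
    return list(FEED_CACHE[user_id])

publish(10, "t1")
publish(11, "t2")
publish(10, "t3")
publish(10, "t4")
print(read_feed(1), read_feed(2))
```

The write cost is proportional to the follower count, which is why accounts with very large followings need separate handling.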
Q: for 100M users, are you going to use the tweet service to retrieve the tweets?
A: We can use sharding. For celebrities, we can add a cache and serve their tweets directly from it
We can use an async service to persist the tweets in MongoDB; we can use cache management to handle celebrities.
Q: what does notification service do?
A: newsfeed -> tweet service -> notify all followers
Notification service monitors the people you are following
Interviewer and Audience Feedback
Interviewer:
The interviewee did not ask for the complete functional requirements and should have clarified them: users can post, users can see their own posts, users can see the newsfeed
Too much time spent on estimation given the low QPS. I wasn’t expecting a lot of time spent here.
High level design: the diagram could be clearer. Database design should include table names.
The most important part is discussing the tradeoff of push vs. pull.
User service and validation: I didn’t care about this area. The discussion should be more focused
Scaling the system can wait until we scale the load 1000 times
Fanout: we didn’t discuss it
Interviewee:
Architecture: I was interrupted
Newsfeed service: I wanted it to be a multi-tier service and to use it as the coordinator
QPS, DAU design: I didn’t get the interviewer’s point. I thought the numbers were too low and it was a trick question, different from the usual
Push vs. pull: we didn’t discuss it deeply. At the beginning the QPS was low, so we didn’t add a cache or queue. I tried to keep it simple at the start, so I was not set up to discuss the cache.
If the amount is small: we can just use database
Increasing QPS: we can add cache
Increase more: can add queue and cache
Should prepare for multiple tiers of load
Interviewer:
I hoped for basic architecture at the beginning. Then we can add a message queue and cache.
I provided a low QPS to make it easy to start with the basic architecture
Then adding QPS can lead to sharding, message queue and cache
Wanted to see a progression
===
What’s the best way to drive?
If we notice the QPS is very low, we can quickly calculate the order of magnitude
If we think the calculation is important, we can do it, but keep it short and refer back to it later.
We can design with a single machine.
High QPS => Distributed system
If we have very low QPS, we can just retrieve from database
If we have high QPS, then we improve the design
Push vs. pull depends on the use case: a business-driven technical solution.
What are the key dimensions?
Normal user vs celebrity
Post service to create post
ViewPost: view my own post
Low number of users: we can reactively query the followed users, then retrieve the feed from the database
High number of users:
We can proactively compute the feed when a post is created.
Celebrity: too many followers. Add a flag in the post cache marking whether the author is a celebrity. No fanout for celebrities
Combine pre-created feed and celebrity posts
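The hybrid approach (push for normal users, pull for celebrities, merge at read time) might be sketched as below. The follower-count threshold, the data structures and all function names are assumptions made for the example; posts are assumed to arrive in increasing timestamp order.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 2   # hypothetical cutoff on follower count

followers = {"alice": ["carol"], "star": ["carol", "dave", "erin"]}
following = {"carol": ["alice", "star"]}

precomputed_feed = {}   # user -> posts pushed at write time, newest first
celebrity_posts = {}    # celebrity -> own posts, newest first

def is_celebrity(author):
    return len(followers.get(author, [])) > CELEBRITY_THRESHOLD

def publish(author, timestamp, text):
    post = (timestamp, author, text)
    if is_celebrity(author):
        # No fan-out: store once, followers pull it when they read.
        celebrity_posts.setdefault(author, []).insert(0, post)
    else:
        for follower in followers.get(author, []):
            precomputed_feed.setdefault(follower, []).insert(0, post)

def read_feed(user, limit=10):
    """Merge the pre-created feed with posts of followed celebrities."""
    sources = [precomputed_feed.get(user, [])]
    sources += [celebrity_posts.get(a, [])
                for a in following.get(user, []) if is_celebrity(a)]
    # Every source is sorted newest first, so a k-way merge is enough.
    return list(islice(heapq.merge(*sources, reverse=True), limit))

publish("alice", 1, "morning")
publish("star", 2, "big news")
publish("alice", 3, "evening")
carol_feed = read_feed("carol")
print(carol_feed)
```

The celebrity post is stored once instead of being copied into every follower's feed, at the cost of a small merge on each read.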
Notification service:
can push new posts to offline users
When a celebrity sends a post, the user receives a notification
Post service -> queue -> newsfeed service
Batch job to fan out
Fan out only to active users. We need to detect which users are active. We can build a user cache that marks whether each user is active.
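A sketch of the queue-based pipeline with a batch fan-out restricted to active users. Python's in-process queue.Queue stands in for a real message queue, a plain set stands in for the active-user cache, and all names are hypothetical.

```python
import queue

post_queue = queue.Queue()       # stand-in for the message queue

FOLLOWERS = {10: [1, 2, 3]}      # author -> followers
ACTIVE_USERS = {1, 3}            # stand-in for the active-user cache
feeds = {}                       # user -> tweet IDs, newest first

def post_service_publish(author_id, tweet_id):
    """Post service: enqueue the event and return immediately."""
    post_queue.put((author_id, tweet_id))

def fanout_batch(max_items=100):
    """Batch job on the newsfeed side: drain up to max_items events and
    push each one only to followers currently marked active."""
    processed = 0
    while processed < max_items:
        try:
            author_id, tweet_id = post_queue.get_nowait()
        except queue.Empty:
            break
        for follower_id in FOLLOWERS.get(author_id, []):
            if follower_id in ACTIVE_USERS:
                feeds.setdefault(follower_id, []).insert(0, tweet_id)
        processed += 1
    return processed

post_service_publish(10, "t1")
post_service_publish(10, "t2")
handled = fanout_batch()
print(handled, feeds)
```

Inactive users get nothing pushed; their feed would be built by pulling when they next come online.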
Major event: many celebrities post at the same time
Everyone tries to poll at the same time, so we need rate limiting
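One common way to rate limit is a token bucket. The notes do not say which algorithm was intended, so this is just one option, sketched with illustrative parameter names; the clock is injectable so the behavior can be checked without sleeping.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Demo with a fake clock: a burst of 2 passes, the third poll is rejected.
fake_now = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0                 # one second later, one token is back
after_wait = [bucket.allow(), bucket.allow()]
print(burst, after_wait)
```

In a real deployment there would be one bucket per user or per client, typically kept in a shared store so every newsfeed instance enforces the same limit.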
Newsfeed cache: can be regenerated if it’s down.
We may need to reconstruct the cache from database
Or we can persist the result into database
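Rebuilding the newsfeed cache from the database on a miss can be sketched as a cache-aside read. The dictionaries stand in for the cache and the database, and all names are hypothetical.

```python
# Stand-ins for the persistent stores.
FOLLOWING_DB = {1: [2, 3]}
TWEETS_DB = {2: [(100, "hello")], 3: [(200, "world"), (50, "old")]}

feed_cache = {}   # user -> cached feed; may be wiped at any time
db_reads = 0      # counts rebuilds, only to make the demo observable

def rebuild_feed_from_db(user_id, limit):
    """Slow path: recompute the feed from the source of truth."""
    global db_reads
    db_reads += 1
    posts = [(ts, author, text)
             for author in FOLLOWING_DB.get(user_id, [])
             for ts, text in TWEETS_DB.get(author, [])]
    posts.sort(reverse=True)
    return posts[:limit]

def get_feed(user_id, limit=10):
    """Cache-aside: serve from cache, rebuild and repopulate on a miss."""
    feed = feed_cache.get(user_id)
    if feed is None:
        feed = rebuild_feed_from_db(user_id, limit)
        feed_cache[user_id] = feed
    return feed

first = get_feed(1)
feed_cache.clear()          # simulate the cache going down
second = get_feed(1)
print(first == second, db_reads)
```

Because the cache holds only derived data, losing it costs latency but not correctness. Persisting the computed feed instead trades extra storage for a faster recovery.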
Should we cover pros and cons of different DB and different cache? Vendor, SLA
Newsfeed ranking service:
Add some ads, or add weight to some people.