Machine learning for Facebook Newsfeed
Topic: Machine learning for Facebook Newsfeed
Interviewer: 张小南
Interviewee: 恰恰
Level: L5 (Senior)
Mock System Design Interview Summary
Interview Overview
Date: 4/17/2022
Target level: L5
Duration: 45 minutes
Topic covered: ML for FB newsfeed
Drawing tool used: excalidraw.com
Requirements
Functional requirements
FB newsfeed
1. No queries
2. No ads
Problem formulation
Improve relevance by recommending different content: images, videos, texts, live streams
Candidate generation: return relevant items
Data and metrics
Labels: clicks, likes, comments, shares, reposts
De-noising of labels
Offline metrics: MAR@k, AUC, MAP@k, F1, MRR
Online metrics: # of clicks/comments/shares/reposts per user session
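To make two of the offline metrics above concrete, here is a minimal sketch of AP@k (averaged over users to get MAP@k) and reciprocal rank (averaged to get MRR). The function names are hypothetical, and the AP@k normalization used here (dividing by min(number of relevant items, k)) is one of several conventions in use.

```python
def average_precision_at_k(ranked_items, relevant, k):
    """AP@k for one user: sum of precision@i at each rank i that holds a
    relevant item, divided by min(len(relevant), k) (an assumed convention)."""
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


def reciprocal_rank(ranked_items, relevant):
    """1 / rank of the first relevant item, or 0.0 if none appears."""
    for i, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def mean_over_users(values):
    """MAP@k or MRR is the plain mean of the per-user values."""
    return sum(values) / len(values) if values else 0.0
```

Usage: compute `average_precision_at_k` per user on the held-out data, then pass the list of per-user values to `mean_over_users`.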
Feature engineering + pre_processing + multi-modal database
User feature: user ID, preference/tags, user embedding
Author feature: author ID, preference/tags, embedding
Newsfeed feature: newsfeed ID, content of newsfeed (text, image, video), engagement of newsfeed over the past 1 hour, 24 hours, 7 days
Context feature: time of the day, day of week, device, holidays
User newsfeed cross feature: similarity between user and newsfeed
User author cross feature
Two-tower model: a dense model that creates embeddings. The loss is minimized so that a positive user-item pair is ranked higher than unrelated pairs
Q: why use ID as feature?
A: They are treated as categorical features. A user ID carries implicit information; once it is embedded, the model can learn a lot of semantic meaning from it.
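The idea of embedding IDs and scoring user-item pairs can be sketched as follows. This is a toy illustration under stated assumptions, not the model from the interview: each "tower" is reduced to a bare ID-to-vector lookup table (a real tower would pass the ID embedding and other features through dense layers), and `make_embedding_table`, `softmax_loss`, and `DIM` are hypothetical names. The loss is a softmax cross-entropy over one positive and several negatives, which is low when the positive pair scores higher than the unrelated pairs.

```python
import math
import random

DIM = 4                    # assumed embedding size, for illustration only
rng = random.Random(0)     # fixed seed so the sketch is reproducible


def make_embedding_table(ids, dim=DIM):
    """Treat each ID as a categorical value with its own (learnable) vector."""
    return {i: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for i in ids}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_loss(user_vec, pos_item_vec, neg_item_vecs):
    """Cross-entropy with the positive item as the correct class.
    Smaller when the positive pair outscores the negative pairs."""
    logits = [dot(user_vec, pos_item_vec)]
    logits += [dot(user_vec, n) for n in neg_item_vecs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]
```

In training, the gradient of this loss would update the vectors in both tables, which is how an otherwise meaningless ID ends up with a useful embedding.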
Model architecture possibilities
Candidate generation: two-tower model
Ranking:
Logistic regression: cannot process categorical features well.
GBDT/neural net for preprocessing, with its output fed into logistic regression
Wide and deep model: loses sequence information
Deep interest model: can improve on wide and deep, and preserves sequence information
Caveats: positional bias, diversity, cold start problem
Positional bias. The probability of clicking depends on both the relevance of the item and the position at which it is ranked
We can include position as variable + interaction with position + device. During inference, we can set it to zero.
Or through inverse propensity weighting.
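One common re-weighting approach to position bias is inverse propensity weighting: a click at a rarely examined position counts for more than a click at the top. A minimal sketch, assuming a hypothetical `propensity` table that maps each position to the probability that a user examines it (in practice this would have to be estimated, for example from randomized experiments):

```python
import math


def ipw_click_loss(clicks, propensity):
    """Propensity-weighted log loss over clicked impressions.

    clicks: list of (predicted_relevance_prob, position) for clicked items.
    propensity: dict position -> examination probability (assumed known).
    Each click's loss is divided by the propensity of its position.
    """
    total = sum(-math.log(p) / propensity[pos] for p, pos in clicks)
    return total / len(clicks)
```

With `propensity = {1: 1.0, 2: 0.5}`, a click at position 2 contributes twice as much to the loss as an identical click at position 1.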
Diversity. When there are multiple items from the same author, we can downrank some of them.
Maximum marginal relevance: if a candidate is highly relevant to the user, we give it a boost; if it is very similar to an already-displayed candidate, we give it a penalty
Q: Why is diversity important?
A: It avoids the problem of popular items always being shown while unpopular items are never shown.
Q: Relevant items should come close to the top due to similarity calculation.
A: Popular items get more popular. Diversity also helps with the cold-start problem.
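The maximum marginal relevance idea described above can be sketched as a greedy re-ranker. The function name, the trade-off weight `lam`, and the similarity function are assumptions for illustration; here similarity is simply "same author", which matches the downranking example in the notes.

```python
def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Greedy maximum marginal relevance.

    relevance: dict item -> relevance score for the user.
    similarity: function (item_a, item_b) -> value in [0, 1].
    lam: weight on relevance; (1 - lam) weights the redundancy penalty.
    """
    selected = []
    remaining = set(relevance)

    def mmr_score(item):
        # Penalize by similarity to the closest already-selected item.
        max_sim = max((similarity(item, s) for s in selected), default=0.0)
        return lam * relevance[item] - (1 - lam) * max_sim

    while remaining and len(selected) < k:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


def same_author(a, b):
    """Toy similarity: item IDs start with an author letter, e.g. 'a1'."""
    return 1.0 if a[0] == b[0] else 0.0
```

For example, with relevance `{"a1": 0.9, "a2": 0.85, "b1": 0.6}` and `same_author`, the second post from author `a` is pushed below `b1` even though its raw relevance is higher.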
Train/test split, model evaluation
Training: one month of data. Use the first 3 weeks. Cross-validation.
Random sampling as easy negative samples. 100-500 items from the recommended list as negative cases. This can improve precision
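A minimal sketch of the two ideas above: a chronological split that trains on the first 3 weeks, and random sampling of easy negatives. The field name `day`, the cutoff of 21, and both function names are assumptions for illustration; the notes do not say how the remaining week is used, so the sketch simply returns it as the held-out set.

```python
import random


def chronological_split(events, cutoff_day=21):
    """Split by time so the model is never evaluated on data older than
    what it trained on. events: list of dicts with a 'day' field (1-based)."""
    train = [e for e in events if e["day"] <= cutoff_day]
    held_out = [e for e in events if e["day"] > cutoff_day]
    return train, held_out


def sample_random_negatives(all_items, interacted, n, seed=0):
    """Easy negatives: items drawn uniformly from those the user did not
    interact with. Returns fewer than n if the pool is too small."""
    rng = random.Random(seed)
    pool = sorted(set(all_items) - set(interacted))
    return rng.sample(pool, min(n, len(pool)))
```

Harder negatives (for example, recommended items that were shown but not clicked) would be drawn from the recommended list instead of from the whole catalog.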
A/B testing
Randomly assign users to groups
Partition strategy to divide the social network.
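Random assignment is often implemented with a deterministic hash so that the same user always lands in the same group. A minimal sketch, where `assign_bucket` and the experiment name are hypothetical; for a partition-based design on a social network, one would hash a cluster ID instead of a user ID so that connected users share a group.

```python
import hashlib


def assign_bucket(unit_id, experiment, treatment_fraction=0.5):
    """Map a unit (user ID or network-cluster ID) to a stable group.
    Salting with the experiment name keeps experiments independent."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32) -> fraction in [0, 1).
    fraction = int(digest[:8], 16) / 0x100000000
    return "treatment" if fraction < treatment_fraction else "control"
```

Because the assignment depends only on the ID and the experiment name, it needs no stored lookup table and stays stable across sessions.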
Monitoring and retraining
Online, offline, system metrics. How often to retrain model depends on requirement and capacities.
Q: How do we deal with cold start? You don’t have the label.
A: Multi-armed bandit. Give the user some items to start with; we can reward the items that are used more.
Or train a model using the features that are shared between mature items and cold start items.
Distillation may help too, but we need to verify through measurement.
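The bandit idea in the answer above can be sketched with epsilon-greedy, one of the simplest multi-armed bandit strategies (the notes do not say which strategy was meant). The class name, `epsilon`, and the reward definition (for example, 1.0 for a click) are assumptions for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Mostly show the item with the best observed reward so far, but with
    probability epsilon show a random item so new items get exposure."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in items}
        self.rewards = {item: 0.0 for item in items}

    def mean_reward(self, item):
        shown = self.shows[item]
        return self.rewards[item] / shown if shown else 0.0

    def select(self):
        items = sorted(self.shows)  # sorted for deterministic tie-breaking
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)        # explore
        return max(items, key=self.mean_reward)  # exploit

    def update(self, item, reward):
        self.shows[item] += 1
        self.rewards[item] += reward
```

Usage: call `select()` to pick an item to show, observe the engagement, then call `update(item, reward)`.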
Q: What can be the bottleneck of the system in the real world?
A:
1. Latency: return the content as quickly as possible. To achieve low latency, we use the two-tower model. We can compute the embeddings of all items offline and store them. When a user comes online, we look up the stored embeddings. This trades storage for time.
2. Order of magnitude of items: billions of pieces of content, and millions of users creating content. How do we store the content? The general idea is to store it by partition, run the model on the different partitions, then blend the results and rank the final list.
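The two points above (precomputed item embeddings, then per-partition scoring followed by blending) can be sketched as below. The partition layout and function names are hypothetical, and a production system would typically use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_k_in_partition(user_vec, partition, k):
    """partition: dict item_id -> item embedding computed offline.
    Returns the k best (score, item_id) pairs within this partition."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in partition.items())
    return heapq.nlargest(k, scored)


def retrieve(user_vec, partitions, k):
    """Score each partition independently, then blend the partial results
    and keep the overall top k."""
    candidates = []
    for partition in partitions:
        candidates.extend(top_k_in_partition(user_vec, partition, k))
    return [item_id for _, item_id in heapq.nlargest(k, candidates)]
```

Only the user embedding has to be computed at request time; each partition can be scored in parallel because the final blend only needs its local top k.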
Non functional requirements
System Design
Key algorithms
Collaborative filtering
Semantic based filtering
How to measure relevance:
Likes, comments, shares, reposts
We know about the metadata
Two-tower model to see which items among all candidates are most relevant to recommend
External APIs
System design
Interviewer and Audience Feedback
Interviewer:
Clear outline
Asked for feedback. It’s an open question, lots of solutions. As long as you present multiple options and discuss tradeoffs, it will be fine.
Checked whether the solution is applicable. It’s fine to ask whether it is OK to continue.
Improvement:
Some solutions are overly complex
Usually ML questions are phrased vaguely. Some ML questions need clarification
I started with vague, high level solution
Interviewee dived too deep into complex solutions.
Interviewer wants to hear a high level system design.
Need more research/homework on the product. There are some often used products.
Facebook newsfeed, what are their specialty?
Sometimes it’s not as complex.
Interviewee:
I asked interviewer to mock interview me.
I need more practice.
===
Soft skill
Interviewer
Try to understand a simple scenario before listing the 1-7 outline
Try to discuss what a newsfeed is, and which steps can leverage machine learning
Newsfeed, search: sometimes there are 2 steps.
How to gather relevant content - may not need ML
Ranking - needs ML
Hundreds of millions of users
Billions of users
So ML probably cannot be used on step 1.
We may just start with friends, and friends’ friends.
Hot topic: lots of text. Can pre-cluster text.
We can narrow down the candidate very quickly.
You don’t need to embed user, content or cross.
Should discuss first which steps require ML.
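The non-ML gathering step suggested above (start with friends and friends' friends) can be sketched as a two-hop walk over the friend graph. The adjacency-dict representation and function names are assumptions for illustration.

```python
def two_hop_authors(graph, user):
    """graph: dict user -> iterable of friends.
    Returns friends plus friends of friends, excluding the user."""
    friends = set(graph.get(user, ()))
    friends_of_friends = set()
    for friend in friends:
        friends_of_friends.update(graph.get(friend, ()))
    return (friends | friends_of_friends) - {user}


def gather_candidates(graph, posts_by_author, user):
    """Collect posts written by anyone within two hops; no model involved."""
    authors = two_hop_authors(graph, user)
    return sorted(post for a in authors for post in posts_by_author.get(a, ()))
```

This narrows billions of posts down to a small candidate set that the ML ranking step can then order.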
Some interviewers want to test you on reading papers
Some interviewers want to test you on practical knowledge.
Audience:
Different specialties have very different knowledge.
The interviewee may or may not have worked on relevance.
If the interviewer's background is NLP (but not relevance), then today's performance is good.
But if the interviewee specializes in relevance, then today the interviewee has not picked the most relevant content.
Today is about ML system design. Usually it’s not very deep.
Audience
Don’t pile too many models.
Audience
Cold-start problem
Social or shopping.
Cold start is not just at the level of algorithm
Online serving - new item, new news
How do we do retrieval? How do we combine cold start versus mature.
Cold start is a very big problem.
Interviewer
There is no perfect solution
Look for something that makes sense
“Let me think about it for a minute”
Today: some problems were raised. Users with no history: recommend based on popularity
Item: multi-armed bandit experiment.
Solution 1: wait till you have enough data
Solution 2: label the data manually. Topic centric.
How to scale up. Depends on the background of interviewer.
Audience:
New item. Related to levels of friends. Can we use this?
Interviewer: yes
Search and ranking
Each product has different solution
There is no template
I probably will draw some block diagram:
Gather data, generate feature, generate candidate, ranking. Each step we may discuss if we need ML, what model we need.
Some models are too sparse.
Research scientist - don’t worry about online/offline
No standard solution: numbered points or a block diagram are both fine, as long as there is a good outline.
Audience:
ML system - usually solving part of a larger problem
45 minutes - how do we connect this system with larger goals (e.g. active users, click through rate)
If I am an interviewer:
How to develop model offline
Then how it goes online
In the end 5 more minutes.
Then what are the useful online metrics: this requires business sense.
Precision/recall may be a tradeoff. If we point this out we may get bonus.
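The precision/recall tradeoff mentioned above can be shown by sweeping the decision threshold on a scored list. A minimal sketch with made-up scores and labels; the convention of returning 0.0 when a denominator is zero is an assumption.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when items with score >= threshold are shown.
    labels: 1 for relevant, 0 for not relevant."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.6, 0.4, 0.2]  # illustrative model scores
labels = [1, 1, 0, 1, 0]            # illustrative ground truth
strict = precision_recall(scores, labels, 0.7)  # fewer items shown
loose = precision_recall(scores, labels, 0.3)   # more items shown
```

Here the strict threshold gives higher precision and lower recall than the loose one, which is the tradeoff worth pointing out in the interview.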
Interviewer expectation:
Which models are commonly used in the industry?
What happens if big issues arise?
social recommendation
Data pipeline, model training, online, logging, AB testing
Then I asked about focus
Do I need to dive deeper? Or can I move on? Drive toward the aspect you are strong at.
Communicate more.
General ML design, what are the big categories? What reading material is easy/fast.
YouTube search
Recommendation system (ranking, search)
Anomaly analysis. Read InfoQ: Stripe, eBay
A list of 10 topics. Draw a diagram
Feature -> model -> result -> evaluation with benchmark
Most important is to set up a system.
I have not narrowed down the questions.
ID
Diversity. Was it reasonable?
Relevancy: did not make sense
Hot topic: hard to keep diversity. Related to cold start
Always seeing the same content may bore the user
Cold start
Should be able to pass L4
L5: needs to be familiar with the work
Not as experienced in this part.
Over complex: shows inexperience