Like/Unlike System

System DesignRecommendation & Feed

Topic: Like/Unlike System

Interviewer: 苏鹏 Peng

Interviewee:

Level: L5 (Senior)


Topic

Mock System Design Interview Summary

Interview Overview

Date: 2/27/2022

Target level: L5

Duration: 45 minutes

Topic covered: like service

Drawing tool used: https://app.tryeraser.com/

Requirements

Functional requirements

Our website has a marketplace with millions of items for sale. For each item, users can choose to like it. An item which is liked can be unliked.

  • As a user, I want to like/unlike an item

  • As a user, I want to know if an item has been liked by me

  • As a user, I want to see how many likes an item has

Please design a system to support this like/unlike feature

Non functional requirements

Start with a small number of users

How accurate does it need to be?

Not high priority

Actual count not need to be most up to date

But should give feedback for increase and decrease (like/unlike)

System Design

External APIs

API:

like(user, itemId)

unlike(user, itemId)

getLikeTotalCount(itemId)

islikedby(user, itemId)

Db:

user, itemID

itemId, totlaLikes

100Mb scale

System design

Choice of database

PostgreSQL. Native support of change feed

Usually favor relational DB, easy to manage

Supports consistency. Relational db much easier

2DBs: They are different logical storage

What happens when user like and then unlike?

Remove the row, or mark removal with a flag.

Tradeoff:

Doesn’t impact like, unlike, islikedby as much

DB:

User, itemId, liked

PK: user, itemId

Not a huge difference (if user clicks like/not like very often)

If you delete a lot of things from database, does it affect DB?

like/unlike history, is it useful?

Database engine,

MySQL: B+ tree

(LMS tree is the other popular data organization).

Creation/deletion,

Intuitively add/remove is more heavy weight

Now let’s scale it up

like/unlike: 100,000 QPS

getLikeTotalCount: 1M QPS

isLikedBy: 1M QPS

Does API need to be strongly consistent?

If I like someone or some item, the button turns red.

Read after write consistency

Total likes: doesn’t need to be very accurate

DB size is small enough

OneDB or just shard based on item

Real browsing traffic, no hot user

We can shard by userID, can result in the right partition

4 or 5 shard if we shard by user

Userlike: shard by user

totalLikes: shard by itemID

Active 100k

Total users ~1M

20 TB

Storage is not a bottleneck

Which part is likely to be a bottleneck

If we have 1M users, 100 shards

The likeService

Want this to be resilient to avoid single point of failure

Add load balancer

Interviewer hint: can change DB

If we don’t care about read/write consistency, we can have other options

Compute hash in like service, compute which shard the request hits

Fixed number of shards to get started

Doesn’t do auto scaling

If we need to spin up new instance, it disrupts the hash algorithm

Ideally it should automatically scale up

Static sharding: maybe one shard has too much data

Walk through data flow

Frontend: load balancer. Like service is stateless

Each instance of like service, need to compute which shard to go to based on userID (hashcode)

Write to the userLike DB instance based on the hashcode

Storage itself is not high

We just need more read replica for totalLike table

Add likeCount service

UserLike databases uses master slave

Which DB can fit the need for userLike table?

If someone liked something, and open a different database

mongo DB -> within one client session it guarantees read / write consistency

We can still use relational DB due to strong consistency requirement

Number of sharding doesn’t need to change that often

Increase or decrease # of replicas for each shard

For strong consistency, isLikedBy(user, itemId)

Like,unlike 100k QPS, peak may be 3 times.

100 databases in cluster

10 databases is still too much

Partition out of box, mongoDB, provides sharding out of box

Can cache help?

The cache can still fail. If we can compromise in consistency, then cache will help

Add cache

Cache crashes, we may lose some write

Losing write

Based on QPS, cache may not be necessary

Distributed cache, local cache

Sticky like service, can cache locally

Strong consistency

Very difficult to use cache

Add partition

Interviewer and Audience Feedback

Some challenges

Mock interview, somewhat nervous

Requirement clarification is good

MVP - can work

Scalability and resiliency, not too satisfied

Bottleneck to optimize

重要考点

Read heavy - need to use a lot of cache

Using sharding and replica - cache may be better

Try to hint to use cache

Candidate x

Strong consistency

Can use local cache for strong consistency

Like service can have a cache, like and unlike, just write to local cache

Then it can send to queue, then update the db

If local cache - fail from DB

Local - session stickiness

Like - and like is successful

If the server didn’t send success

Not write to DB

Write to cache and write to a highly available queue

==

Interviewer does not use regular ways to give question

Talk too fast, can pause 5 seconds how to solve

Oncall

How to onboard new developer

First ask about the situation clearly

Don’t need to jumpt to answer

Interviewer split into smaller scale question

Then a large scale question

First high level design

The biggest is scalability

==

Is there sharding in your

Like service - should scale up and down based on

Stateful vs stateless

100k QPS, a big write

Batch write - backend service

Change data capture from userLikeDatabase, this one is good

On top of totalLike, we can add a cache

Like, unlike, only for myself

keyValue pair

High QPS - 100k write cannot be handled

NoSQL, cassandra or dynamoDB

Why is mysql difficult?

A single mysql

Manual sharding, 10 different database, static sharding

Split the database

Backup, snapshot, upgrade, maintenance

10->20 need to manual split

Cache, message queue sharding

Metas tool?

Write speed -> 100k hard to reach for traditional database

Load balance, can put in a lot of like service

We can handle high QPS

Local cache

When you write, write to cache

When you return, you must have successfully write to both cache and queue

User session sticky to like service

What happens to like service?

Change DB to key-value db

Total like needs to go to cache