Rate Limiter

System DesignReliability & Scaling

Materials — open to everyone, no sign-in

Topic: Rate Limiter

Interviewer: Pinglu

Interviewee: 09817167d

Level: L4 (Experienced Individual Contributor)


Rate limiter

Mock System Design Interview Summary

Interview Overview

Date:

Target level:

Duration: 1 hour

Topic covered: Rate limiter

Drawing tool used

Requirements

(side note: candidate is Ph.D. new grad, but already joined system design interviews with real companies)

Functional requirements

Limit the rate of requests

By request type or by user

Frequency per minute depends on users

Services are deployed on multiple servers

Return boolean if false, response of http status of 400

Limit based on per minute, per hour

Non functional requirements

System Design

External APIs

For each request

Input parameter: user, request type

Limit based on per minute, per hour

Input parameters: int api_id, int userId

Output:

boolean

true: process request, give a response

False: return HTTP status

Database schema

Data model:

APITable

Key: api_id

Value: rate_limit

UserTable

Key: userId

Values: rate_limit, email, phone

Low latency:

cache rate count of each api in time intervals, cache user info; cache API info

Multiple clusters around the world; clusters around

System design diagram

Type of storage: cache, redis, memcached

For each API, create a hashTable

hashTable, key is userID, value is count of frequency in current time interval

Type of storage: hashTable, key

API_ID 2Bytes + userID 8Bytes + overhead of hashTable 20B = 30B per count

Assuming 1000 requests per time interval

30KB per user

1 million users, 30GB storage

IP address and user?

IP cannot distinguish good/bad users

Effectively limit the total rate

Users: pros: distinguish good/bad users

Cons: short period of time of surges

Cons: login

Algorithm:

For API working on single server, sliding window will be a simple and effective algorithm

API on multiple server: token method.

For each server we can allocate a few tokens

If a request comes, the server reduces from the token locally

Servers communicate with each other

Discussions during the Interview

Interviewer:

customer sends request

Customer goes to the rate limiter, then go to the actual server

Interviewee:

Move the rate limit to the LB

For server with high computing capability, we can allocate more tokens -> more job allocated

Eventual consistency

It is possible to compute more jobs in a time interval than limit -> allow negative count of token

https://docs.google.com/document/d/182rZ1om9Tj1RNCNPRRm5DSfWmXJn1Gm9Y4gLk6vjQIM/edit

Interviewer and Audience Feedback after the Interview

Interviewer feedback

Did well:

Gather requirement well

Design API ahead of time

Right cache design, key and value

Pros and cons of IP and users

Explain clearly

To improve

Cluster vs single machine. Does not affect the rate limiter. Rate limiter happens before requests hit the server

Explain the algorithm more clearly. Sliding and fixed window. Pros and cons.

Rate limiter: does not fit into template. Type of storage. It took a lot of time. We don’t need to persist data in the database. We can remove

Limiter:

Per server, per API

It can be separate from

QPS is very high for limiter

Therefore we don’t need

Audience:

Directly decorate on the API server

Stickiness on the load balancer

Audience:

Need to switch algorithm

Audience

Is there a traffic spike?

If there is no traffic spike. There may need to be a queue

If a spike goes to 1000 QPS per second from usual 100 QPS

Can support lots of request

Should be able to handle a lot of requests

Can queue up and still receive the response

Audience

If capacity is enough, can add burst handling

Audience

Rate limit should reject?

Overall rate limit at load balancer

Server can have its own limit

Audience

Rate limit: 多个customer

There is some flexibility to go over the limit

burst , bucket

Audience

Why per user rate limit

Protection of the server: may not need to limit by user

Audience

Some customers may automate the requests using script

Interviewer

Security may be a consideration

Audience

Some web servers allow anonymous

User story

Based on work experience, Restful api, where can we get the user ID?

Distributed system: may be for a backend server

Anonymous token?

Use IP

Load balancer: IP hash. Fixed ip->fixed server

Backend:

Load balancer may put IP into X-IP forward

Audience

Client side rate limiting

E.g. chrome browser. Javascript may limit it from the client side

Client side can help. Server side is required

Audience

Can we use DB?

DB, MemCacheD

All servers can read from the same DB

Pure memory: it’s possible that one user’s requests are sent to multiple

Sticky server

Or use MemCacheD

Based on diskDB is not feasible.

If use DB, must use memory based DB

Can also use memory

10 LBs -> each LB just use 10% of the rate limit

Same domain name can map to multiple IPs, then it may randomly pick one of the IPs

If it’s for different services

Load balancer needs all rules for rate limiting

Too much business logic in the LB

Rate limiter can be put on each

IP hash may not be balanced.

Using IP based

What’s the topology?

LB -> Rate limiter

Rate limiter can be put on LB, can also be put on the server behind LB

Traffic usually reaches LB first, then rate limiter

Token server

Are we testing the rate limiter algorithm as well?

Per user, per API

Same user may get different quota

API is heavy or light

Can load different algorithm based on

Rate limiter: VIP, IP address, can get a few more token

VIP: queue1. Regular: queue2

Preempted: VIP not preempted. IP preempted

Can sliding window handle distributed

Distributed: should use token

Single machine: can use sliding window

Regardless whether its token or sliding window, whether it’s token, it has pros and cons

Token vs sliding window

Distributed vs single machine

Load balancer put at LB vs backend

Let’s say 3 regions

500QPS for asian

200QPS for other regions

Reason for multiple regions

DNS can handle first level of LB, split to beijing vs shanghai

Second level is within one region

We can have 2 layers of rate limiter

First one is at LB

Second one is at service level

L4 vs L7 rate limiter

Key

UserID:API

Value

Number of time hit

Expiration

1 second

Easy to maintain

Not as accurate as sliding window

We can add buffer to make the system behavior more smooth

3 choices

Rate limiter at LB

Rate limiter at API server

Rate limiter can be separate from API

DNS load

Rate limit can be for IP

Rate limit can be for user

IP