Ticketmaster · commitway

Topic: Ticketmaster

Interviewer: xiang

Interviewee: becky

Level: L5 (Senior)

Topic

Mock System Design Interview Summary

Interview Overview

Date: 11/14/2021

Target level: L4 or low L5

Duration: 1 hour

Topic covered:

Drawing tool used: https://whimsical.com/LMjh729zgs43BaxYWoKM4T

Requirements

Functional requirements

Ticketmaster

A large ticket sales event will start at a pre-set time

All tickets are the same (no difference in seating or content)

Non-functional requirements

100k tickets to sell

10M people want to buy them

P0: buy tickets, capped at 2 tickets per user; all tickets are the same

P0: buyer can see order status

P0: 1-hour session timeout for an order

P3: view tickets

Audience comment in Zoom:

Non-functional requirements missing

System Design

API

POST: /v1/tickets?q=number_items

Resp: {status: "", callback: url}

(fail, pending)

POST: /v1/tickets/payment

Resp: {status: }

{Accepted, rejected, timeout}
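A minimal Python sketch of the ticket-purchase endpoint, assuming an in-memory count in place of the real services. The names `TicketApi`, `place_order`, `OrderStatus`, `PaymentStatus`, `MAX_TICKETS_PER_USER` and the callback path format are hypothetical; the status values and the 2-ticket cap come from the notes above. It is single-threaded and only illustrates the request/response contract.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional

MAX_TICKETS_PER_USER = 2  # P0 cap: at most 2 tickets per user


class OrderStatus(Enum):
    PENDING = "pending"
    FAIL = "fail"


class PaymentStatus(Enum):
    # Possible results of POST /v1/tickets/payment
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    TIMEOUT = "timeout"


@dataclass
class OrderResponse:
    status: OrderStatus
    callback: Optional[str]  # where the client can poll the order status


class TicketApi:
    """Hypothetical in-memory stand-in for POST /v1/tickets?q=number_items."""

    def __init__(self, available: int) -> None:
        self.available = available
        self.bought: Dict[str, int] = {}  # user_id -> tickets already ordered

    def place_order(self, user_id: str, q: int) -> OrderResponse:
        already = self.bought.get(user_id, 0)
        # Reject bad quantities, requests over the per-user cap, or sold-out requests.
        if q <= 0 or already + q > MAX_TICKETS_PER_USER or q > self.available:
            return OrderResponse(OrderStatus.FAIL, None)
        self.available -= q
        self.bought[user_id] = already + q
        order_id = uuid.uuid4().hex
        # Hypothetical callback: the order-status endpoint for this order
        return OrderResponse(OrderStatus.PENDING, f"/v1/orders/{order_id}")


api = TicketApi(available=3)
print(api.place_order("alice", 2).status)  # OrderStatus.PENDING
print(api.place_order("alice", 1).status)  # OrderStatus.FAIL (cap reached)
```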

Interviewer:

view tickets

Interviewee:

Interviewer:

P0: get order status

Interviewee:

GET: /v1/orders

GET: /v1/orders/orderID

Database schema:

Ticket:

Uid: pk

Item: varchar

Total_quantity: int

Available_quantity: int

Price: int

Order:

Uid

User_id

Payment_ref

Status

Creation_ts (can be used to cancel expired orders)

Quantity

System design diagram

Discussion

Use MySQL or PostgreSQL

Reason: relational data; strong consistency guarantees are needed

Discussion:

Interviewer:

10M users will visit the website

Will this cause a problem?

Interviewee:

Not many rows in the ticket DB

Only one row in the ticket DB to indicate

The ticket table will be very small, so the DB can handle it

However, if there are many types of tickets to sell, we can add a cache layer between the service and the DB, e.g. Redis or another in-memory database

For placing an order, we need to go to MySQL

1. User visits the website
2. API gateway sends the request to the ticket service

3.1 Ticket service checks the available ticket count in Redis

3.2 Ticket service places the order with the order service

3.3 Order service updates the ticket count in the DB

4. Order service responds with success/failure

API gateway sends request to

Discussion

Interviewer:

500k users want to buy tickets within 1 second

Interviewee

Multiple replicas of the API gateway, each handling 50k requests per second
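The numbers above allow a quick back-of-envelope check. A small Python sketch; the function names are hypothetical, and the 50k requests/second per replica figure is the interviewee's assumption:

```python
import math


def replicas_needed(peak_rps: int, per_replica_rps: int) -> int:
    # Round up: a fraction of a replica still needs a whole instance.
    return math.ceil(peak_rps / per_replica_rps)


def max_success_ratio(tickets: int, interested_users: int) -> float:
    # Upper bound: assumes every successful buyer takes only 1 ticket.
    return tickets / interested_users


print(replicas_needed(500_000, 50_000))        # 10
print(max_success_ratio(100_000, 10_000_000))  # 0.01
```

So roughly 10 gateway replicas absorb the 1-second burst, and at most about 1% of the 10M interested users can get a ticket (fewer if most buy 2).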

Interviewer:

What is in the cache?

Interviewee:

The remaining ticket count, updated by a batch job

Interviewer:

How often do you update the cache?

Interviewee:

10 to 30 seconds

Interviewer:

The cache may show tickets as available when they are actually sold out
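The interviewer's concern can be reproduced with a tiny simulation of a count that a batch job copies from the DB every `refresh_interval` seconds. This is a sketch with hypothetical names; it refreshes lazily on read instead of running a real background job (same staleness window), and it takes an injectable clock so the behavior is deterministic.

```python
from typing import Callable


class BatchRefreshedCount:
    """Remaining-ticket count copied from the DB every `refresh_interval` seconds."""

    def __init__(self, read_db: Callable[[], int], refresh_interval: float,
                 clock: Callable[[], float]) -> None:
        self.read_db = read_db
        self.refresh_interval = refresh_interval
        self.clock = clock
        self.value = read_db()
        self.last_refresh = clock()

    def get(self) -> int:
        # Re-read the DB only once the interval has elapsed; otherwise serve the copy.
        if self.clock() - self.last_refresh >= self.refresh_interval:
            self.value = self.read_db()
            self.last_refresh = self.clock()
        return self.value


db = {"available": 5}
now = [0.0]
cache = BatchRefreshedCount(lambda: db["available"], 10.0, lambda: now[0])
db["available"] = 0   # tickets sell out in the DB at t = 0
now[0] = 1.0
print(cache.get())    # 5: the cache still says tickets are available
now[0] = 10.0
print(cache.get())    # 0: refreshed
```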

Interviewee:

Reason for the batch job

After 1 hour, some orders expire and we should update the ticket database to replenish the available count

Interviewer:

You update at a 10-second frequency. The first second the

Interviewer:

Who updates the available quantity?

Interviewee:

When a user places an order, the quantity is updated

Interviewee:

Session management service is needed

All tables are in MySQL

Redis table:

ticketID: available_q

Probably don’t need cache service

Interviewer:

Why copy the data to Redis? Why not just use MySQL?

Interviewee:

After the event is sold out, we don't need to keep the table in Redis

Interviewer:

Want to maintain the order of the requests

Want to make sure requests are handled in the order they arrive

Interviewee:

We can do something very close to first-in/first-out

It cannot be strict due to uncertainty in network delay

For strict first-in/first-out, we need a single thread to process the requests

We can put a queuing service in front of the database
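A queue drained by exactly one consumer thread gives the strict first-in/first-out processing described above. A minimal Python sketch using the standard-library `queue.Queue` and `threading`; the request shape and `run_fifo_seller` are hypothetical. Ordering is by arrival at the queue, not by when the user clicked, because of network delay.

```python
import queue
import threading
from typing import List, Optional, Tuple

Request = Tuple[str, int]  # (user_id, quantity)


def run_fifo_seller(requests: "queue.Queue[Optional[Request]]", available: int,
                    results: List[Tuple[str, bool]]) -> None:
    """Single consumer: handles requests strictly in the order they were enqueued."""
    while True:
        req = requests.get()
        if req is None:  # sentinel: no more requests
            break
        user_id, qty = req
        ok = qty <= available
        if ok:
            available -= qty
        results.append((user_id, ok))


q: "queue.Queue[Optional[Request]]" = queue.Queue()
results: List[Tuple[str, bool]] = []
worker = threading.Thread(target=run_fifo_seller, args=(q, 3, results))
worker.start()
for req in [("alice", 2), ("bob", 2), ("carol", 1)]:
    q.put(req)  # enqueue order is processing order
q.put(None)
worker.join()
print(results)  # [('alice', True), ('bob', False), ('carol', True)]
```

Because only one thread touches `available`, no lock is needed; the trade-off is that throughput is capped by that single consumer.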

Discussions during the Interview

Requirements:

Are tickets the same?

Interviewer and Audience Feedback after the Interview

Soft skill

312

321

Hard skill

Interviewer:

Non-functional requirements: which part is the most difficult?

CAP: which property should we emphasize the most?

At every checkpoint (every 3-5 minutes), ask for feedback

Hard skill:

I am a bit confused about the design

When going into detail, what is the high-level design?

E.g. "I want to put all data into MySQL." It may not be ideal to hit MySQL directly.

Using Redis is correct. The cache design: not good

Updating the cache every 10 seconds may not be good since there are too many changes for

Audience

Redis: if we persist data from Redis to disk, it will slow down performance

Audience

Redis is distributed. Don't store a single count

Every ticket is an entry in Redis

Audience

Key is sequential.

100 tickets: 100 entries

How does the service know which ticket is where?

Ticket table:

Every type of ticket

Interviewer: all tickets are the same

Audience:

Every ticket is one entry

Interviewee:

You need to scan 5M entries

Some tickets are reserved, available or bought

From order table

Audience:

Do we need ticket ID or are they always the same?

Interviewee:

In the initial design, every ticket is different

Interviewer:

How to handle the huge request volume?

How to ensure the quantity is accurate?

50k - how to distribute to 5M people?

Handle 50k requests

Audience:

You may want to throttle at the API

We may just limit the count.

There may be abuse from people using ticket-grabbing software

Not everyone may buy?

If they buy and there is no abuse, then we can throttle at the API
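Throttling at the API gateway is often done with a token bucket; this is a swapped-in technique, not one specified in the discussion. A minimal Python sketch with a hypothetical `TokenBucket` class, where time is passed in explicitly so the behavior is easy to test:

```python
class TokenBucket:
    """Token-bucket rate limiter, as an API gateway might apply per client (sketch)."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True: one token refilled
```

The capacity sets how large a burst a single client may send; the rate sets the sustained limit. Requests that are not allowed can be rejected immediately instead of reaching the ticket service.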

Interviewer:

Accurate number?

Money ($) is stored in the SQL database

However the throughput is too high

Need to find some method to handle high throughput

Audience

When reading, you can read dirty data

However, when you place an order, you need to be more sequential

Interviewer:

3-5M requests within 10 seconds

Placing an order needs to succeed

Interviewee:

Redis: update quantity

Order service: insert order row

Audience

3.2: put request in the queue

Audience

What happens if Redis crashes?

Audience

MySQL usually crashes before Redis

Can use cluster

Interviewee

The source of truth is MySQL

Sum of orders with status paid

Redis is just a cache
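Since MySQL is the source of truth, the Redis count can always be rebuilt from the order table, for example after a Redis crash. A minimal sketch with `sqlite3` standing in for MySQL and a dict standing in for Redis; the function name and the status values `'paid'`, `'reserved'` and `'expired'` are assumptions (the notes only mention summing paid orders; unexpired reservations are counted here too because they also hold tickets).

```python
import sqlite3


def recompute_available(conn: sqlite3.Connection, total: int) -> int:
    # Orders that still hold tickets: paid ones plus unexpired reservations.
    (held,) = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM orders "
        "WHERE status IN ('paid', 'reserved')"
    ).fetchone()
    return total - held


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (uid TEXT PRIMARY KEY, status TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("o1", "paid", 2), ("o2", "reserved", 1), ("o3", "expired", 2)],
)
redis_stand_in = {"t1": 9}  # stale value left behind by a crash
redis_stand_in["t1"] = recompute_available(conn, total=10)
print(redis_stand_in["t1"])  # 7
```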

Audience

Ticket and order are different systems

You need 2PC (two-phase commit) between Redis and the DB

Interviewee:

Adding a lock for one row; same for multiple rows

Audience

First read Redis

Update the DB, then immediately update Redis

Audience

Data entry is small

Can we cache count in API gateway?

Interviewee:

Initial design uses ticket as cache

Audience

Redis: atomic write

Redis: server cluster

For distributed for

There may be bursts of writes

Concurrency issues during bursts

Reading: you may read dirty data

However, when you write you can
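The "atomic write" point can be illustrated with a dict-backed stand-in for the Redis ticket-count key, where the check and the decrement happen in one critical section. This is a sketch under assumptions: the class, method, and key names are hypothetical, and a lock plays the role that Redis's atomic command execution plays in production (with real Redis, the same check-and-decrement would typically be a Lua script run via `EVAL`, which Redis executes atomically).

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class AtomicCounterStore:
    """Dict-backed stand-in for Redis counters (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._lock = threading.Lock()  # plays the role of Redis's atomic execution

    def set(self, key: str, value: int) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str) -> int:
        with self._lock:
            return self._data.get(key, 0)

    def decr_if_enough(self, key: str, qty: int) -> bool:
        # Check and decrement inside one critical section, so two buyers
        # can never both take the last ticket.
        with self._lock:
            current = self._data.get(key, 0)
            if current < qty:
                return False
            self._data[key] = current - qty
            return True


store = AtomicCounterStore()
store.set("ticket:1:available_q", 100)
with ThreadPoolExecutor(max_workers=32) as pool:
    outcomes = list(pool.map(
        lambda _: store.decr_if_enough("ticket:1:available_q", 1), range(1000)))
print(sum(outcomes), store.get("ticket:1:available_q"))  # 100 0
```

A separate read followed by a separate write (GET, then SET) would let two concurrent requests read the same count and oversell; doing both in one atomic step is what prevents that.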

Audience

3.1 for reservation

3.2 is for placing order

Audience:

Ticket service may crash after update of redis

Then the MQ may not have the request to buy the ticket

Ticket service crashes, we

Interviewee

Need 2PC

Audience

We should update the DB before updating the cache

So we should not update the cache first

Audience

Whoever pays first will get the ticket

After buying the ticket, users need to compete to pay, which is a bad experience

Payment must succeed, if reservation is

Need to reserve for one hour

How do we reserve for one hour

Reduce the available

Interviewee:

Session management

Scan and unlock reservations after 1 hour
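The one-hour hold can be sketched as a scan over reservations that returns unpaid tickets to the pool, using the `Creation_ts` column from the schema. A minimal Python sketch; the `Reservation` dataclass, its status values, and `expire_reservations` are hypothetical names, and orders live in a plain list instead of MySQL:

```python
from dataclasses import dataclass
from typing import Dict, List

HOLD_SECONDS = 3600  # 1-hour session timeout from the P0 requirements


@dataclass
class Reservation:
    order_id: str
    quantity: int
    creation_ts: float
    status: str = "reserved"  # assumed states: reserved -> paid | expired


def expire_reservations(orders: List[Reservation], inventory: Dict[str, int],
                        now: float) -> int:
    """Mark unpaid reservations older than one hour as expired and release their tickets."""
    released = 0
    for order in orders:
        if order.status == "reserved" and now - order.creation_ts >= HOLD_SECONDS:
            order.status = "expired"
            released += order.quantity
    inventory["available"] += released
    return released


orders = [
    Reservation("o1", 2, creation_ts=0.0),
    Reservation("o2", 1, creation_ts=0.0, status="paid"),
    Reservation("o3", 2, creation_ts=3000.0),
]
inventory = {"available": 0}
print(expire_reservations(orders, inventory, now=3600.0))  # 2 (only o1 expires)
```

Against the real order table, the same scan would be an `UPDATE` that flips old `reserved` rows to `expired` plus an increment of `Available_quantity`, ideally in one transaction so the two cannot diverge.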

Audience:

What happens if we update MySQL and then crash before updating Redis?

Redis will not have the most up-to-date count

Protect MySQL with a queue: on 3.2 and on the unlabeled arrow into the order service

Key point is to have a cache for high throughput read

Redis is a buffer

Interviewee:

Direct buy button, small scope in the flow

===

Audience

Life cycle

Where do we get the tickets?

Seller

Monitoring, reporting, lifecycle

1 billion requests: we can throttle at the API gateway

1 million within one minute

Separate the hot tickets to a different set of servers

You may have a separate SQL database table

How do we increase

Audience

It’s very specific

After you finish the design

At the estimation stage

Sometimes the interview starts at a small scale

It may be hard to extend it to scale up

Try to design an MVP first. If there is time, extend it

Interviewer