Design Google Calendar
Materials — open to everyone, no sign-in
Topic: Design Google Calendar
Interviewer: system
Level: L5 (Senior)
Additional Resources:
System Design Interview
Join Us on Wechat
Interview Notes
[7]
Requirements
500M total users
100M DAU
QPS: 1000 QPS for write
10k QPS for reading
[peak traffic?]
[11]
API:
Booking event
Request:
Sender, invitee, title, time, location, test
GET /calendar-event
[how to handle anonymous user?]
[13]
Relational database
[throughput calculation?]
Events
eventID, sender, invitee(string), title, time, location, text
[database normalization? Some information may be repeated]
[17]
eventID, sender, invitee(string), title, location, from_ts, to_ts
Calculate data volume
1.5TB per year
15TB over 10 years
[ volume is (too) big for single instance ]
[how to support multiple invitees? Will there be multiple rows]
[invitee is email or ?]
[read write throughput for one machine? Is it sufficient?]
[how to update the meeting time?]
[21]
[23]
How to support large number of invitees
Range query for meetings within an interval
Select * from meetings where time > from_ts and time < to_ts
May have 2 indices but still need to merge the result
[ in relational DB, can use composite index]
Optimize the query with slot ID
[27]
Optimizing for range query
[ spending too much time on range query optimization. Like to see other components in the design ]
[30]
Defining primary key
PK: sender + eventID
Index: dayID
Range query:
Continue on index
Index: participant + day ID
Select *
from meetings
where dayID in (day1, day2) and participant = “alf”
3k meeting per month
40k meetings per person
Need pagination
[38 - 54]
Like to clarify these questions:
[how to support multiple invitees? Will there be multiple rows]
[invitee is email or ID?]
[read write throughput for one machine? Is it sufficient?]
[how to update the meeting time?]
Invitees by email?
[database normalization. Some fields are repeated across rows]
Q: Primary key?
A: normalize the table. Factor out event detail table
[41]
Change event time requires change of multiple rows
[44]
[ system design is not complete yet]
Q: Handling users outside of the system?
A: add email notification system
Add meeting processor for changes of the events
[asynchronous]
[ discuss distributed system ]
Q: how to send email with high throughput
A: add message queue
[48]
[relational vs non-relational database: throughput and volume can be too high. For relational database, we may shard by time range. Throughput is still high ]
Q: can DB handle 1000 QPS?
A: small writes. Could be writing 3000 rows per second due to multi-invitees for some meetings
Backup longer term data
Prefer to use same database technology for old and new data
===
Discussion after Interview
Shard by time range may not be good
The QPS is high. May add cache for today’s calendar
Can create 2 databases:
User
Events - a very big table.
It’s difficult to support event table using traditional database.
Indices to maintain: user, events, time range. Difficult to maintain index on large data volume
Recommend to use non-relational database
How to solve conflict?
If another user has a meeting at the same spot.
User can receive multiple invitations at the same time
Should gather the requirements
If we need to support the meeting room.
Event - location, start-time, end-time.
Rest API can also solve this problem.
No-sql database, may require atomic support across different machines
How to shard?
Can use user ID to shard / partition key
Heavier read, lighter write
Usually use invitee ID as partition key
Possible points for drill down
Database schema
How to sync data to client
How to save to database, and how to index the database
Micro services
1:24:49:10
1:25:47:09