SQL Database
Topic: SQL Database
Presenter: Pauline
System Design Presentation
SQL database
We are looking for presenters !
QRCode:
Diagrams:
https://app.excalidraw.com/l/7yDNW4C8AjE/9OCFEE4WJdL
SQL
MySQL
ISAM, vs InnoDB
Index: ordered storage of data, or pointers to data
Sequential search, binary search, B+ tree search
Binary search: Tree height determines the IO speed
B+ tree:
Internal node
MyISAM is a non-clustered index.
InnoDB is a clustered index. It must have primary key
MyISAM: primary and secondary key is similar. B+ tree leaf nodes point to the data address. A
In InnoDB: secondary key B+ tree points to the primary key. Then it needs to go to B+ tree of primary key to get to data.
Before 2009 default to MyISAM
After 2009 default InnoDB
Composite index: There is order in columns. Saves space.
Multiple indexes: there is an index for every column. Bigger memory consumption
Unique index
Unique constraint
Best practice
Don’t use long string as primary key
Choose column to build index:
Make sure the column can help eliminate choices quickly (e.g. don’t use female/male as index)
Transaction model
An example of why crash may cause problems in the database records
Transaction: multiple changes are done together.
Atomic, consistent, isolation, durability
Consistency is the most important.
Biggest problem is concurrent access
Log recovery - keep atomic and durability
Detect wrong transactions and rolls it back
Concurrency issues
Read only no problem
Write - read: may read incorrectly
Write - write: may lose data updated
Different isolation levels:
Read uncommitted
Read committed
Repeatable read
Serializable
Read uncommitted’s problems: Dirty write, dirty read.
Read commit’s problems: there may be unrepeatable commit
Read repeatable: Phantom reads
Transaction isolation and isolation levels.
MySQL - repeatable read
Oracle - read committed
Optimistic concurrency control
Read->compute->validate->Write
Pessimistic concurrency control
Validate -> read -> compute -> write
Concurrency-control protocols - lock based protocol
Concurrrency-control protocols - lock based
Shared lock (S)
Exclusive lock (X)
Deadlock, Hungry
Concurrency-control protocols - timestamp based
W_TS(X) write timestamp
R_TS(X) read timestamp
Rollback conditions:
Write item (X): R_TS(X) > TS(T) or W_TS(X) > TS(T)
Read item (X): W_TS(X) > TS(T).
Time-based control is pessimistic
Write immediately but does not commit
Only validate when committing
===
Write based log (WAL)
Add, delete, update records in WAL
Intermediate update and delay update
Undo WAL: record old value, when system rollback, it takes original state
Redo WAL: record new value, when system crashes, it will redo to get new value