An Intimidating List of Database Concepts

#databases

Relational algebra (primitives for processing queries) vs relational calculus
- Selection clause (WHERE)
- Projection clause (SELECT)
- Unions (UNION ALL)
- Intersections (INTERSECT)
- Difference (EXCEPT)
- Product (CROSS JOIN)
CRUD: basic operations
- Create (INSERT INTO ... VALUES)
- Read (SELECT ... FROM)
- Update (UPDATE ... SET)
- Delete (DELETE FROM)
Window function: aggregate operation on a line by line basis
- RANK()
- POSITION()
- OVER (PARTITION... ORDER BY...)
Joins
- Inner joins
- Outer joins
- Lateral joins: connect tables that are used in subqueries (see correlated subqueries)
- Cross joins: Cartesian product
- Left/right joins
- Full joins
- Natural joins
- “Anti-joins” and “semi-joins”
- Physical joins (lookup join, hash join, merge join, zigzag join)
WCOJ (Worst-case optimal join)
CTEs (WITH clauses)
- Recursive CTEs: Useful for collecting graphs of data
Indexes
- Foreign keys
- Forward indexes
- Inverted indexes: take a container
- Covering index/“stored column”: Improves third-column lookups by an indexed column that’s not the primary key
- Partial indexes: created with predicate
Pagination
- LIMIT and OFFSET
- Keyset Pagination
- Cursors
Array aggregates (array_agg()): collect values into arrays by a common sibling, can be ordered inside of the aggregation
Computed columns: Can be STORED and VIRTUAL
Data Modelling
- Normalization and normal forms
- Denormalization (for performance)
Materialized views
Connection pools
ORMS: Are they good or bad? Do they create bad queries?
DUAL table: Not supported in PostgreSQL (Oracle and some other DBs)
Stored procedures
Plan hints: FROM a@a_pkey
MVCC garbage collection/vacuuming: Multi-version concurrency control, every piece of data has a new version, occasionally needs cleanup
COUNT(*), COUNT(1), COUNT(col)
ACID
Transactions
- Transaction contention, locks, deadlocks
- SELECT FOR UPDATE vs SKIP LOCKED
- Single- and multi-phase commits
Isolation
- Isolation levels
  - Read uncommitted
  - Read committed
  - Repeatable
  - Snapshot
  - Serializable
  - Linearizable (see consistency models)
- Phantom read
- Write skew
Sharding, data partitioning
Triggers
- Foreign key cascade
- Custom triggers
MERGE
INSERT ON CONFLICT
EXPLAIN(opt, memo, verbose, distsql, vec)
Grouping sets, cubes, rollups: slice and aggregate data across different dimensions
HAVING (link)
“Event sourcing”
Ascending key problem: causes hotspots, resolvable by using a UUID primary key or a hash-sharded index to split hotspots across shards
Sargability: “search arg”
Star schema
Character sets and encoding
JSON(B) types
Deferrability: constraint triggers act after statement or after transaction
MATCH FULL, MATCH SIMPLE, MATCH PARTIAL
TPCC benchmarking
Causal reverse: anomaly possible when ordered events lose proper order in database
Vectorization: DB engine operating on a column-by-column basis
Stats errors
Volcano model
Automatic optimization
- Learned indexes
- Database cracking
XTID exhaustion: on Postgres if you don’t vacuum
“The Halloween Problem”: updating a table causes rows to appear further down the scan, causing unpredictable repeat updates
fsync, “fsyncgate” (talk)
Sequences
Coalescing
Unions and intersections

Resources

Database Iceberg (Image) by Jordan Lewis
Hermitage
c2 wiki
Consistency Models