An Intimidating List of Database Concepts
- Relational algebra (primitives for processing queries) vs relational calculus
- Selection clause (
WHERE
)
- Projection clause (
SELECT
)
- Unions (
UNION ALL
)
- Intersections (
INTERSECT
)
- Difference (
EXCEPT
)
- Product (
CROSS JOIN
)
- CRUD: basic operations
- Create (
INSERT INTO ... VALUES
)
- Read (
SELECT ... FROM
)
- Update (
UPDATE ... SET
)
- Delete (
DELETE FROM
)
- Window function: aggregate operation on a line by line basis
RANK()
POSITION()
OVER (PARTITION... ORDER BY...)
- Joins
- Inner joins
- Outer joins
- Lateral joins: connect tables that are used in subqueries (see correlated subqueries)
- Cross joins: Cartesian product
- Left/right joins
- Full joins
- Natural joins
- “Anti-joins” and “semi-joins”
- Physical joins (lookup join, hash join, merge join, zigzag join)
- WCOJ (Worst-case optimal join)
- CTEs (
WITH
clauses)
- Recursive CTEs: Useful for collecting graphs of data
- Indexes
- Foreign keys
- Forward indexes
- Inverted indexes: take a container
- Covering index/“stored column”: Improves third-column lookups by an indexed column that’s not the primary key
- Partial indexes: created with predicate
- Pagination
- LIMIT and OFFSET
- Keyset Pagination
- Cursors
- Array aggregates (
array_agg()
): collect values into arrays by a common sibling, can be ordered inside of the aggregation
- Computed columns: Can be
STORED
and VIRTUAL
- Data Modelling
- Normalization and normal forms
- Denormalization (for performance)
- Materialized views
- Connection pools
- ORMS: Are they good or bad? Do they create bad queries?
DUAL
table: Not supported in PostgreSQL (Oracle and some other DBs)
- Stored procedures
- Plan hints:
FROM a@a_pkey
- MVCC garbage collection/vacuuming: Multi-version concurrency control, every piece of data has a new version, occasionally needs cleanup
COUNT(*)
, COUNT(1)
, COUNT(col)
- ACID
- Transactions
- Transaction contention, locks, deadlocks
SELECT FOR UPDATE
vs SKIP LOCKED
- Single- and multi-phase commits
- Isolation
- Sharding, data partitioning
- Triggers
- Foreign key cascade
- Custom triggers
- MERGE
- INSERT ON CONFLICT
EXPLAIN(opt, memo, verbose, distsql, vec)
- Grouping sets, cubes, rollups: slice and aggregate data across different dimensions
HAVING
(link)
- “Event sourcing”
- Ascending key problem: causes hotspots, resolvable by using a UUID primary key or a hash-sharded index to split hotspots across shards
- Sargability: “search arg”
- Star schema
- Character sets and encoding
- JSON(B) types
- Deferrability: constraint triggers act after statement or after transaction
MATCH FULL
, MATCH SIMPLE
, MATCH PARTIAL
- TPCC benchmarking
- Causal reverse: anomaly possible when ordered events lose proper order in database
- Vectorization: DB engine operating on a column-by-column basis
- Stats errors
- Volcano model
- Automatic optimization
- Learned indexes
- Database cracking
- XTID exhaustion: on Postgres if you don’t vacuum
- “The Halloween Problem”: updating a table causes rows to appear further down the scan, causing unpredictable repeat updates
fsync
, “fsyncgate” (talk)
- Sequences
- Coalescing
- Unions and intersections
Resources