Key Concepts Glossary
A
ACID - Atomicity, Consistency, Isolation, Durability. Properties guaranteeing database transaction correctness.
Anti-entropy - Process of comparing replicas and synchronizing data to resolve inconsistencies.
Atomicity - Property that all operations in a transaction succeed or none do.
Availability - System's ability to respond to requests, even when some nodes fail.
B
B-tree - Balanced tree data structure for indexing, maintaining sorted key-value pairs. Standard in most relational databases.
Backpressure - Mechanism where downstream system signals upstream to slow down.
BASE - Basically Available, Soft state, Eventual consistency. Contrasts with ACID.
Bloom filter - Probabilistic data structure to test set membership, with false positives but no false negatives.
Broadcast hash join - Join strategy where small table is broadcast to all nodes.
Byzantine fault - Fault in which a node behaves arbitrarily, for example by sending incorrect or malicious data.
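To make the Bloom filter entry above concrete, here is a minimal sketch. The class name, the default sizes, and the use of salted SHA-256 to derive bit positions are illustrative assumptions, not a reference implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: each item sets num_hashes positions in a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Assumption: derive each position by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

Every added item is always reported as possibly present (no false negatives). An item that was never added can still return True if its positions happen to collide with bits set by other items.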
C
CAP theorem - Consistency, Availability, Partition tolerance. When a network partition occurs, a system must choose between consistency and availability; it cannot guarantee both.
Causal consistency - Ordering guarantee where causally related operations appear in order.
Change Data Capture (CDC) - Technique to capture and stream database changes.
Clustered index - Index where row data is stored directly in the index.
Column-oriented storage - Storage format where each column is stored separately, optimized for analytics.
Consensus - Process of getting multiple nodes to agree on a value.
Consistent hashing - Partitioning scheme that assigns keys to nodes so that only a small fraction of keys move when nodes are added or removed.
Consistent prefix reads - Guarantee that if a sequence of writes happens in a certain order, anyone reading those writes sees them in that same order.
CRDT - Conflict-free Replicated Data Type. Data structure that automatically resolves conflicts.
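As one small illustration of the CRDT entry above, the sketch below shows a grow-only counter (often called a G-Counter). The class and method names are chosen here for illustration; this is one simple CRDT, not a general-purpose library.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())
```

Two replicas that increment independently and then merge in either direction end up with the same value, and no conflict-resolution step is needed.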
D
Data-intensive - Application where challenges are data volume, complexity, and change rate (not CPU).
Distributed transaction - Transaction spanning multiple nodes, requiring coordination.
Durability - Property that committed data survives system failures.
E
Event sourcing - Storing all events that led to current state, not just current state.
Eventual consistency - Guarantee that replicas converge to the same value if no new updates are made, with no bound on how long convergence takes.
F
Fault - One component deviating from spec (vs failure: system as a whole stops working).
Fault tolerance - System's ability to continue operating despite faults.
Fencing token - Monotonically increasing token to reject stale operations.
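The fencing token entry above can be sketched as a storage service that remembers the highest token it has seen. `FencedStore` and its method names are hypothetical, and the sketch assumes tokens are issued elsewhere (for example by a lock service) in strictly increasing order.

```python
class FencedStore:
    """Toy storage service that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = -1  # highest token accepted so far
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        # A token lower than one already seen belongs to a client whose
        # lock or lease has since been granted to someone else.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True
```

If a client holding token 33 pauses, and another client obtains token 34 and writes, the first client's delayed write with token 33 is rejected instead of overwriting newer data.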
G
Gossip protocol - Protocol for nodes to share information about cluster state.
H
Happens-before - Relationship defining causal ordering of events.
Hash index - Index using a hash table for average-case O(1) key lookups; does not support efficient range queries.
Head-of-line blocking - When slow requests hold up subsequent fast requests.
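For the hash index entry above, one common arrangement keeps an in-memory hash map from each key to the byte offset of its latest record in an append-only log. The sketch below assumes an in-memory `io.BytesIO` buffer in place of a real file, keys without commas, and values without newlines. All names are illustrative.

```python
import io


class HashIndexedLog:
    """Append-only log plus an in-memory hash map of key -> byte offset."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for an append-only file on disk
        self.index = {}          # key -> offset of the most recent record

    def put(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode()
        offset = self.log.seek(0, io.SEEK_END)  # always append at the end
        self.log.write(record)
        self.index[key] = offset  # older records for this key become garbage

    def get(self, key: str):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)
        line = self.log.readline().decode().rstrip("\n")
        _, value = line.split(",", 1)
        return value
```

A lookup costs one hash-map probe and one seek. Overwritten records stay in the log until something like compaction removes them, which this sketch leaves out.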
I
Immutability - Property that data cannot be changed after creation.
Isolation - Property that concurrent transactions don't interfere with each other.
J
Join - Combining rows from two or more tables based on related columns.
L
Latency - Time a request waits to be handled (excluding service time).
Linearizability - Strong consistency model in which the system behaves as if there were a single copy of the data and every operation takes effect atomically at one instant.
Log-structured storage - Storage approach using append-only logs and compaction.
LSM-tree - Log-Structured Merge-tree. Storage engine using sorted runs and background compaction.
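The log-structured storage and LSM-tree entries above can be sketched together. This toy version is a deliberate simplification: it uses a plain dict as the memtable and sorts it on flush, keeps segments in memory rather than on disk, and omits deletes and the write-ahead log. The `TinyLSM` name and the flush threshold are assumptions.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: a memtable flushed into sorted, immutable segments."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.segments = []  # sorted lists of (key, value), oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted run, then start a fresh one.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search newest segment first so later writes shadow earlier ones.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all segments, keeping only the newest value for each key.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

Reads check the memtable and then the segments from newest to oldest, which is why compaction matters: it bounds the number of segments a read may have to search.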
M
Materialized view - Pre-computed query result stored as a table.
Memtable - In-memory sorted data structure in LSM-trees.
Multi-Version Concurrency Control (MVCC) - Concurrency control using multiple versions of data.
N
Normalization - Database design to reduce data duplication through relationships.
O
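OLAP - Online Analytical Processing. Analytic queries that scan and aggregate over large numbers of records.
OLTP - Online Transaction Processing. Low-latency reads and writes that each touch a small number of records, typically looked up by key.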
Optimistic concurrency control - Allow conflicts, detect at commit time, retry.
P
Partition tolerance - System's ability to continue operating despite network partitions.
Percentile - Statistical measure (p50, p95, p99) describing distribution of values.
Pessimistic concurrency control - Lock before accessing to prevent conflicts.
Primary key - Unique identifier for a row/document/vertex.
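As a worked example for the percentile entry above, the function below uses the nearest-rank definition. This is only one of several definitions in use, and libraries that interpolate between samples can return slightly different values. The function name is illustrative.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to limit floating-point rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

With response times of 1 ms through 100 ms, one sample each, `percentile(times, 50)` gives the median and `percentile(times, 99)` gives the value that 99% of requests were at or below.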
Q
Quorum - Minimum number of nodes that must respond for operation to succeed.
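In leaderless replication, the quorum entry above is usually made concrete with the condition w + r > n, where n is the number of replicas, w the number of write acknowledgments required, and r the number of read responses required. The helper names below are hypothetical; the sketch only checks the arithmetic.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum shares at least one node with every write quorum."""
    return w + r > n


def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1
```

With n = 3, choosing w = 2 and r = 2 satisfies the condition, so a read contacts at least one replica that acknowledged the latest successful write. Choosing w = 1 and r = 1 does not.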
R
Read skew - Reading different objects at different times, getting inconsistent view.
Replication - Copying data across multiple nodes for redundancy.
Replication lag - Delay between leader write and follower replication.
S
Schema-on-read - Data structure interpreted when read (vs schema-on-write enforced at write).
Secondary index - Index on columns other than primary key.
Snapshot isolation - Transaction sees consistent snapshot of data at start time.
SSTable - Sorted String Table. Immutable, sorted file format for LSM-trees.
Star schema - Schema with central fact table and dimension tables.
Synchronous replication - Leader waits for follower acknowledgment before confirming write.
T
Tail latency - High percentiles of response time (p95, p99, p99.9).
Three-phase commit (3PC) - Attempt to improve 2PC with reduced blocking; relies on bounded network delay and response times, which most practical systems cannot guarantee.
Total order broadcast - Guarantee that all nodes receive messages in same order.
Transaction - Group of reads and writes as single logical operation.
Two-phase commit (2PC) - Coordinator-based atomic commit protocol.
V
Vector clock - Data structure for capturing causality in distributed systems.
Vertical scaling - Using more powerful machine (vs horizontal scaling).
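For the vector clock entry above, the sketch below represents a clock as a dict from node name to counter and shows how two clocks are compared. The function names and the dict representation are illustrative assumptions.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the clock that has seen everything a and b have seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b as 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a happened before b
    if b_le_a:
        return "after"   # b happened before a
    return "concurrent"  # neither causally precedes the other
```

Two clocks are concurrent when each has a counter that exceeds the other's. That is the case a single timestamp cannot express, and the one where conflict resolution is needed.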
W
Write-ahead log (WAL) - Append-only file to which every modification is written before it is applied to the main data structures; used for crash recovery and replication.
Write amplification - Multiple disk writes for single logical write.
Write skew - Two transactions read the same objects, then each updates a different object based on what it read, together violating a constraint that either would have preserved alone.