Design a Unique ID Generator in Distributed Systems

Traditional auto_increment doesn't work in distributed environments — a single DB server can't handle the scale, and generating unique IDs across multiple DBs with minimal delay is challenging.

Step 1 — Understand the problem and establish design scope

Requirements:

IDs must be unique
IDs must be sortable (increment by time, not necessarily by 1)
IDs are numerical values only
IDs fit into 64 bits
IDs ordered by date (evening IDs > morning IDs from same day)
Generate over 10,000 unique IDs per second

Step 2 — Propose high-level design and get buy-in

Multi-master replication

Uses database auto_increment but increments by k (number of DB servers) instead of 1. Server 1: 1, 3, 5, ...; Server 2: 2, 4, 6, ...

Pros: Some scalability, since ID capacity grows with the number of DB servers. Cons: Hard to scale across multiple data centers. IDs don't increase with time across servers. Doesn't scale well when servers are added or removed, because the step k changes.
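
The stepping scheme above can be sketched in a few lines. This is only a simulation of what each server's auto_increment would emit, not database code; the helper name server_ids and the choice of k = 2 are assumptions made for illustration.

```python
from itertools import count, islice

def server_ids(server_index: int, k: int):
    """Yield the IDs one server hands out: start at its index, step by k."""
    return count(start=server_index, step=k)

k = 2  # number of DB servers (assumed for this example)
server1 = list(islice(server_ids(1, k), 3))  # odd IDs
server2 = list(islice(server_ids(2, k), 3))  # even IDs
print(server1, server2)
```

The two sequences never overlap, but nothing ties an ID's value to the time it was issued: a busy server 1 can hand out ID 5 before server 2 hands out ID 2.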

UUID

128-bit number, collision probability extremely low (1 billion UUIDs/sec for ~100 years → 50% chance of single duplicate). Example: 09c93e62-50b4-468d-bf8a-c07e1040bfb2

Each web server generates IDs independently — no coordination needed.

Pros: Simple, no synchronization issues, easy to scale. Cons: 128 bits (requirement is 64 bits). IDs don't go up with time. IDs could be non-numeric.
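
As a quick illustration of the 128-bit problem, the snippet below generates a random (version 4) UUID with Python's standard uuid module and checks its size. It assumes version 4 UUIDs; other UUID versions are laid out differently.

```python
import uuid

u = uuid.uuid4()                  # random UUID, generated with no coordination
as_text = str(u)                  # hex-and-hyphen form, 36 characters
as_int = u.int                    # the same value as a 128-bit integer

print(as_text)
print(as_int.bit_length() <= 128)  # True
print(as_int < 2**64)              # False: the value does not fit in 64 bits
```

The textual form contains hex letters, which is why the notes list "could be non-numeric" as a drawback, and the integer form is too wide for the 64-bit requirement.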

Ticket Server

Centralized auto_increment in a single DB server (Flickr approach).

Pros: Numeric IDs, easy to implement, works for small/medium-scale. Cons: Single point of failure. Multiple ticket servers introduce data synchronization challenges.
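
A minimal in-memory sketch of the idea follows. The class TicketServer is hypothetical and stands in for the single auto_increment database; a real ticket server would persist its counter and serve requests over the network.

```python
import threading

class TicketServer:
    """In-memory stand-in for one centralized auto_increment counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self) -> int:
        # Every caller goes through this one lock, which is what makes the
        # IDs unique and also what makes this a single point of failure.
        with self._lock:
            self._last_id += 1
            return self._last_id

ticket_server = TicketServer()
ids = [ticket_server.next_id() for _ in range(3)]
print(ids)
```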

Twitter snowflake approach

Divide a 64-bit ID into sections:

Section	Bits	Details
Sign bit	1	Always 0 (reserved for future)
Timestamp	41	Milliseconds since custom epoch (Nov 04, 2010, 01:42:54 UTC)
Datacenter ID	5	2^5 = 32 datacenters
Machine ID	5	2^5 = 32 machines per datacenter
Sequence number	12	Incremented per ID on same machine, resets to 0 each millisecond
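
The layout in the table can be expressed with shifts and masks. The sketch below assumes the fields are packed most-significant first in the order listed (timestamp, datacenter, machine, sequence); the function names pack_id and unpack_id are illustrative, not taken from Twitter's code.

```python
# Field widths from the table; the sign bit stays 0 and needs no code.
TIMESTAMP_BITS, DATACENTER_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 5, 5, 12

MACHINE_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = MACHINE_SHIFT + MACHINE_BITS        # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS   # 22

def pack_id(timestamp_ms: int, datacenter_id: int, machine_id: int, sequence: int) -> int:
    """Combine the four fields into one integer that fits in 63 bits."""
    fields = [
        (timestamp_ms, TIMESTAMP_BITS),
        (datacenter_id, DATACENTER_BITS),
        (machine_id, MACHINE_BITS),
        (sequence, SEQUENCE_BITS),
    ]
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )

def unpack_id(snowflake_id: int):
    """Split an ID back into (timestamp_ms, datacenter_id, machine_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )

example = pack_id(timestamp_ms=1, datacenter_id=2, machine_id=3, sequence=4)
print(example, unpack_id(example))
```

Because the timestamp occupies the highest bits, comparing two IDs as plain integers compares their timestamps first, which is what makes the IDs sortable by time.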

Step 3 — Design deep dive

Timestamp: 41 bits → max timestamp = 2^41 − 1 = 2,199,023,255,551 ms ≈ 69 years from custom epoch. After 69 years, new epoch time or migration needed.
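
The 69-year figure can be checked with a short calculation. The sketch assumes 365-day years for the conversion and uses the custom epoch from the table, rounded to whole seconds.

```python
from datetime import datetime, timedelta, timezone

MAX_TIMESTAMP_MS = 2**41 - 1                 # 2,199,023,255,551
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365      # assumes 365-day years

years = MAX_TIMESTAMP_MS / MS_PER_YEAR       # a little under 70

# Custom epoch from the table above (Nov 04, 2010, 01:42:54 UTC).
CUSTOM_EPOCH = datetime(2010, 11, 4, 1, 42, 54, tzinfo=timezone.utc)
rollover = CUSTOM_EPOCH + timedelta(milliseconds=MAX_TIMESTAMP_MS)

print(round(years, 1), rollover.year)
```

With this epoch the timestamp field runs out during 2080, which is the point at which a new epoch or a migration would be needed.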

Sequence number: 12 bits = 2^12 = 4,096 combinations. Max 4,096 new IDs per millisecond per machine. Field is 0 unless >1 ID generated in same millisecond on same server.
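
The reset-per-millisecond behaviour can be sketched as below. The class SequenceCounter and its injectable now_ms clock are assumptions made so the logic can be exercised deterministically; waiting for the next millisecond when the sequence is exhausted, and raising when the clock moves backwards, are design choices of this sketch rather than behaviour stated in the notes.

```python
import time
from typing import Callable, Tuple

SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

def _system_ms() -> int:
    return time.time_ns() // 1_000_000

class SequenceCounter:
    """Track the per-millisecond sequence number for one machine."""

    def __init__(self, now_ms: Callable[[], int] = _system_ms) -> None:
        self._now_ms = now_ms
        self._last_ms = -1
        self._sequence = 0

    def next(self) -> Tuple[int, int]:
        """Return (timestamp_ms, sequence) for the next ID."""
        now = self._now_ms()
        if now < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            # Same millisecond: bump the sequence, wrapping at 4096.
            self._sequence = (self._sequence + 1) & MAX_SEQUENCE
            if self._sequence == 0:
                # All 4,096 values used; spin until the clock advances.
                while now <= self._last_ms:
                    now = self._now_ms()
        else:
            # New millisecond: the sequence starts again at 0.
            self._sequence = 0
        self._last_ms = now
        return now, self._sequence

counter = SequenceCounter()
print(counter.next())
```

Feeding the returned pair into the timestamp and sequence fields of the 64-bit layout gives unique IDs per machine as long as the clock never repeats a millisecond.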

Datacenter IDs and machine IDs: chosen at startup, fixed while running. Changes require careful review to avoid ID conflicts.

Step 4 — Wrap up

Approaches evaluated: Multi-master replication, UUID, ticket server, Twitter snowflake. Snowflake chosen — supports all requirements and is scalable in distributed environments.

Additional talking points:

Clock synchronization: ID generators on multiple cores/machines may have different clocks. Network Time Protocol (NTP) is the most popular solution.
Section length tuning: Fewer sequence bits + more timestamp bits for low-concurrency, long-term applications.
High availability: ID generator is mission-critical — must be highly available.
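
To make the section length trade-off concrete, the sketch below compares the default 41/12 split with a hypothetical 42/11 split. The function layout_capacity and the alternative split are illustrative assumptions, and years are taken as 365 days.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365  # assumes 365-day years

def layout_capacity(timestamp_bits: int, sequence_bits: int):
    """Return (lifetime in years, max IDs per millisecond per machine)."""
    years = (2**timestamp_bits - 1) / MS_PER_YEAR
    ids_per_ms = 2**sequence_bits
    return years, ids_per_ms

default = layout_capacity(41, 12)  # the layout used in these notes
tuned = layout_capacity(42, 11)    # one bit moved from sequence to timestamp

print(default, tuned)
```

Moving a single bit roughly doubles the lifetime and halves the per-millisecond capacity. Even 2,048 IDs per millisecond per machine is far above the stated requirement of 10,000 IDs per second.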

Reference materials

[1] Universally unique identifier: https://en.wikipedia.org/wiki/Universally_unique_identifier
[2] Ticket Servers: Distributed Unique Primary Keys on the Cheap: https://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
[3] Announcing Snowflake: https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html
[4] Network time protocol: https://en.wikipedia.org/wiki/Network_Time_Protocol
