Design a Rate Limiter


A rate limiter controls the rate of traffic sent by a client or service. In HTTP, it limits client requests allowed over a specified period; excess calls are blocked. Examples:

Max 2 posts per second per user
Max 10 accounts per day from same IP
Max 5 reward claims per week from same device

Benefits:

Prevent resource starvation from DoS attacks. Most large public APIs enforce limits for this reason (Twitter: 300 tweets per 3 hours; Google Docs API: 300 read requests per user per 60 seconds)
Reduce cost (fewer servers, limiting paid 3rd-party API calls)
Prevent server overload from bots/user misbehavior

Step 1 — Understand the problem and establish design scope

Requirements:

Server-side API rate limiter
Flexible throttle rules (by IP, user ID, etc.)
Handles large request volume
Works in distributed environment
Informs users when throttled

System requirements:

Accurately limit excessive requests
Low latency (the limiter must not slow down HTTP response time)
Use as little memory as possible
Distributed rate limiting (across multiple servers/processes)
Return clear exceptions to throttled users
High fault tolerance (cache server offline shouldn't break entire system)

Step 2 — Propose high-level design and get buy-in

Where to put the rate limiter?

Client-side: Unreliable — requests easily forged by malicious actors, no control over client implementation.

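Server-side: The rate limiter runs inside the API servers.

Middleware approach (API gateway): The rate limiter runs as a separate middleware in front of the API servers and throttles requests before they reach them.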

Example: API allows 2 req/s. Client sends 3 requests in a second → first 2 routed to API servers, 3rd throttled with HTTP 429.

Decision factors:

Current technology stack (language, cache service)
Algorithm control (full control server-side vs limited with 3rd-party gateway)
Existing microservice architecture + API gateway → add rate limiter to gateway
Engineering resources → build vs buy commercial API gateway

Algorithms for rate limiting

Token bucket algorithm

Used by Amazon and Stripe.

How it works: A bucket has a predefined capacity. Tokens are added at a preset rate; once the bucket is full, extra tokens overflow and are discarded. Each request consumes one token. If no tokens remain, the request is dropped.

Parameters: Bucket size, refill rate.

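Buckets needed: Depends on the rules. Typically one bucket per API endpoint for per-endpoint limits, one bucket per IP address for IP-based throttling, or a single global bucket for a system-wide cap (e.g., 10,000 req/s).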

Pros: Easy to implement, memory efficient, allows burst traffic. Cons: Two parameters challenging to tune properly.
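
The token bucket described above can be sketched in a few lines. This is a minimal single-process illustration, not a production implementation: the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are assumptions made for the example, and tokens are refilled lazily from elapsed time on each call rather than by a periodic background job.

```python
import time


class TokenBucket:
    """Token bucket sketch: each request spends one token; tokens refill over time."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # bucket size (largest burst allowed)
        self.refill_rate = refill_rate    # tokens added per second
        self.clock = clock                # injectable time source (assumption, eases testing)
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last_refill
        # Credit tokens for the elapsed time; anything above capacity overflows.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # the request consumes one token
            return True
        return False                      # no token left: drop the request
```

With `capacity=4` and `refill_rate=2`, a burst of four requests passes immediately, the fifth is dropped, and two more tokens become available one second later.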

Leaking bucket algorithm

Used by Shopify. Implemented with FIFO queue.

How it works: Request arrives → check if queue full → if not, add to queue; otherwise drop. Requests pulled from queue at fixed rate.

Parameters: Bucket size (queue size), outflow rate.

Pros: Memory efficient, fixed rate suitable for stable outflow use cases. Cons: Burst traffic fills queue with old requests, starving recent ones. Two parameters hard to tune.
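
A small sketch of the queue behavior described above. The names `LeakyBucket`, `offer`, and `leak` are hypothetical, and the sketch assumes some external scheduler calls `leak()` once per fixed interval ("tick"); the outflow rate is expressed as requests per tick.

```python
from collections import deque


class LeakyBucket:
    """Leaking bucket sketch: a bounded FIFO queue drained at a fixed rate."""

    def __init__(self, queue_size, outflow_rate):
        self.queue_size = queue_size        # bucket size
        self.outflow_rate = outflow_rate    # requests processed per tick (assumed unit)
        self.queue = deque()

    def offer(self, request):
        """Enqueue the request if there is room; otherwise drop it."""
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Called once per tick: pull up to outflow_rate requests, oldest first."""
        n = min(self.outflow_rate, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

Because the queue is drained oldest first, a burst that fills it delays or displaces newer requests, which is the drawback noted in the cons.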

Fixed window counter algorithm

How it works: Divide timeline into fixed windows, each with a counter. Each request increments counter. Counter reaches threshold → new requests dropped until new window.

Critical flaw: Traffic spikes at window edges let double the allowed requests through.

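Example: 5 req/min allowed, with the quota resetting on the minute. Five requests arrive late in the 2:00 window and five more early in the 2:01 window, so in the rolling minute from 2:00:30 to 2:01:30, 10 requests pass: twice the limit.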

Pros: Memory efficient, easy to understand, quota reset fits certain use cases. Cons: Spike at window edges allows excess requests.
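
The edge problem can be reproduced with a short sketch. The class name `FixedWindowCounter` and the use of plain numbers (seconds) as timestamps are assumptions for illustration; a real limiter would also expire counters for old windows instead of keeping them forever.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Fixed window sketch: one counter per window index."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)   # window index -> request count

    def allow(self, timestamp):
        window = int(timestamp // self.window_seconds)
        if self.counters[window] >= self.limit:
            return False                   # threshold reached until the next window
        self.counters[window] += 1
        return True


# Edge burst: 5 requests late in one minute, 5 more early in the next.
limiter = FixedWindowCounter(limit=5, window_seconds=60)
late = [limiter.allow(45 + i) for i in range(5)]     # seconds 45..49
early = [limiter.allow(65 + i) for i in range(5)]    # seconds 65..69
passed = sum(late) + sum(early)   # every request falls inside the rolling minute 30..90
```

All ten requests are accepted even though they fall within a single rolling minute, because each fixed window only ever sees five.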

Sliding window log algorithm

How it works: Track request timestamps (e.g., Redis sorted sets). On new request: remove outdated timestamps (older than window start), add new timestamp, check if log size ≤ allowed count.

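Example: 2 req/min allowed. Requests at 1:00:01 ✓, 1:00:30 ✓, 1:00:50 ✗ (log=3 > 2; the rejected timestamp stays in the log). At 1:01:40 the window is [1:00:40, 1:01:40), so 1:00:01 and 1:00:30 are outdated and removed; the log holds 1:00:50 and 1:01:40 (log=2), and the request is accepted.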

Pros: Very accurate — no request will exceed rate limit in any rolling window. Cons: High memory usage (rejected request timestamps still stored).
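
The steps above can be sketched in memory with a deque standing in for the Redis sorted set. `SlidingWindowLog` and `allow` are hypothetical names, and timestamps are plain seconds. As in the description, a rejected request's timestamp is still logged.

```python
from collections import deque


class SlidingWindowLog:
    """Sliding window log sketch: keep one timestamp per request."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()   # timestamps, oldest first

    def allow(self, timestamp):
        window_start = timestamp - self.window_seconds
        # Remove timestamps older than the start of the rolling window.
        while self.log and self.log[0] < window_start:
            self.log.popleft()
        # The timestamp is logged even when the request ends up rejected.
        self.log.append(timestamp)
        return len(self.log) <= self.limit


# The 2 req/min example, in seconds after 1:00:00: 1:00:01, 1:00:30, 1:00:50, 1:01:40.
log = SlidingWindowLog(limit=2, window_seconds=60)
decisions = [log.allow(t) for t in (1, 30, 50, 100)]
```

Here `decisions` is `[True, True, False, True]`, matching the walkthrough.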

Sliding window counter algorithm

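Hybrid of fixed window counter and sliding window log. Formula (overlap_percentage is the fraction of the rolling window that falls inside the previous fixed window):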

Requests in rolling window = current_window_count + previous_window_count × overlap_percentage

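Example: Max 7 req/min. Previous window: 5 requests, current window: 3 requests. A new request arrives 30% into the current window, so the rolling window overlaps 70% of the previous one → 3 + 5 × 0.7 = 6.5, rounded down to 6. Since 6 < 7, the request is accepted. (Depending on the use case, the estimate can be rounded up or down.)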

Pros: Smooths traffic spikes (average rate of previous window), memory efficient. Cons: Approximation (assumes even distribution in previous window). Cloudflare experiment: only 0.003% of requests incorrectly allowed/blocked among 400M.
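
The formula can be checked with a short worked example. The function names and the `elapsed_fraction` parameter (how far into the current window the request arrives) are assumptions; the sketch rounds the estimate down and accepts the request when the rounded estimate is below the limit, which is one reasonable reading of the example.

```python
import math


def estimated_count(current_count, previous_count, elapsed_fraction):
    """Estimate requests in the rolling window.

    elapsed_fraction: portion of the current fixed window already elapsed (0..1).
    The rolling window overlaps the previous window by the remaining portion.
    """
    overlap = 1.0 - elapsed_fraction
    return current_count + previous_count * overlap


def allow(limit, current_count, previous_count, elapsed_fraction):
    # Round down, then accept if the estimate is still under the limit (assumption).
    estimate = math.floor(estimated_count(current_count, previous_count, elapsed_fraction))
    return estimate < limit


# Max 7 req/min, 5 requests in the previous window, 3 in the current, 30% elapsed.
estimate = estimated_count(3, 5, 0.3)   # 3 + 5 * 0.7, about 6.5
accepted = allow(7, 3, 5, 0.3)
```

Only two counters per client are stored, which is why this approach is memory efficient compared with keeping a full timestamp log.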

High-level architecture

Store counters in in-memory cache (Redis) using INCR (increment counter) and EXPIRE (set timeout for auto-deletion). Flow:

Client → rate limiting middleware
Middleware fetches the counter from the corresponding bucket in Redis
Limit reached → reject request
Not reached → forward to API servers, increment counter in Redis
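
A sketch of the counter flow using the two commands above. To keep it runnable without a Redis server, it uses a hypothetical in-memory stand-in (`InMemoryCounterStore`) that exposes `incr` and `expire`; a redis-py client offers methods with the same names and could be passed in instead. The key format `rate:<client_id>` and the function `is_allowed` are assumptions. Note one variation from the flow listed above: the sketch increments first and compares afterwards, so the check and the update happen in a single INCR.

```python
class InMemoryCounterStore:
    """Hypothetical stand-in for Redis offering incr/expire, for illustration only."""

    def __init__(self, clock):
        self.clock = clock
        self.values = {}
        self.deadlines = {}

    def _purge(self, key):
        # Drop the key if its timeout has passed, mimicking auto-deletion.
        deadline = self.deadlines.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.values.pop(key, None)
            self.deadlines.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        if key in self.values:
            self.deadlines[key] = self.clock() + seconds


def is_allowed(store, client_id, limit, window_seconds):
    key = f"rate:{client_id}"                 # assumed key naming
    count = store.incr(key)                   # INCR: create-or-increment the counter
    if count == 1:
        store.expire(key, window_seconds)     # EXPIRE: start the window on first hit
    return count <= limit
```

With a limit of 2 per second, the third call within the same second returns `False`, matching the 429 case in the middleware example. In a real deployment the INCR and EXPIRE pair is not atomic on its own, so a crash between the two calls could leave a counter without a timeout.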

Step 3 — Design deep dive

Rate limiting rules

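Rules are generally written in configuration files and saved on disk. Example in the style of Lyft's open-source rate-limiting component, allowing at most 5 marketing messages per day: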

domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
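
Once loaded, a rule like the one above has to be matched against each request. The sketch below assumes the file has already been parsed into a Python dict with the same shape; `RULES`, `UNIT_SECONDS`, and `find_limit` are hypothetical names, and the unit table is an assumption for the example.

```python
# Assumed mapping from rule units to seconds.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# The rule above, as it might look after parsing the config file.
RULES = {
    "domain": "messaging",
    "descriptors": [
        {
            "key": "message_type",
            "value": "marketing",
            "rate_limit": {"unit": "day", "requests_per_unit": 5},
        }
    ],
}


def find_limit(rules, domain, key, value):
    """Return (max_requests, window_seconds) for a matching descriptor, else None."""
    if rules.get("domain") != domain:
        return None
    for descriptor in rules.get("descriptors", []):
        if descriptor.get("key") == key and descriptor.get("value") == value:
            limit = descriptor["rate_limit"]
            return limit["requests_per_unit"], UNIT_SECONDS[limit["unit"]]
    return None
```

For the rule above, `find_limit(RULES, "messaging", "message_type", "marketing")` yields 5 requests per 86,400 seconds; a request that matches no descriptor yields `None` and would not be throttled by this rule.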

Exceeding the rate limit

Return HTTP 429 (Too Many Requests). Depending on the use case, rate-limited requests may instead be enqueued for later processing (e.g., orders during system overload).

Rate limiter HTTP headers:

X-Ratelimit-Remaining: remaining allowed requests in window
X-Ratelimit-Limit: max calls per time window
X-Ratelimit-Retry-After: seconds to wait before retry
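
A throttled response carrying these headers might be assembled as follows. The function name `throttled_response` and its parameters are hypothetical, and the sketch returns a plain `(status, headers)` pair rather than tying itself to any web framework.

```python
import math


def throttled_response(limit, remaining, seconds_until_reset):
    """Build the status code and headers for a rate-limited request."""
    headers = {
        "X-Ratelimit-Limit": str(limit),                  # max calls per time window
        "X-Ratelimit-Remaining": str(max(0, remaining)),  # never report a negative value
        # Round up so the client does not retry a fraction of a second too early.
        "X-Ratelimit-Retry-After": str(math.ceil(seconds_until_reset)),
    }
    return 429, headers
```

Header values are strings because HTTP headers are text; the client is expected to parse them back into numbers.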

Detailed design

Rules stored on disk, workers pull into cache
Client request → rate limiter middleware → loads rules from cache → fetches counters/timestamps from Redis → decides: forward to API servers or return 429 (request dropped or queued)

Rate limiter in a distributed environment

Race condition

Counter value is 3. Two concurrent threads both read 3, both increment to 4, both write 4. Correct value should be 5.

Solutions: Locks are the obvious fix, but they significantly slow down the system. Two common alternatives are Lua scripts and the Redis sorted set data structure.
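
The lost update can be reproduced deterministically by writing out the interleaving by hand. This sketch uses a plain dict as the counter store and hypothetical function names; it only illustrates why the read-modify-write must run as one indivisible step, which is what Redis provides when the logic runs inside a single Lua script.

```python
def lost_update(store, key):
    """Two clients interleave read-modify-write steps on the same counter."""
    a = store[key]        # client A reads 3
    b = store[key]        # client B reads 3 before A has written
    store[key] = a + 1    # A writes 4
    store[key] = b + 1    # B also writes 4; A's increment is lost
    return store[key]


def atomic_increments(store, key):
    """Each read-modify-write completes before the next one starts."""
    for _ in range(2):
        store[key] = store[key] + 1
    return store[key]


racy = lost_update({"counter": 3}, "counter")          # ends at 4
safe = atomic_increments({"counter": 3}, "counter")    # ends at 5
```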

Synchronization issue

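Multiple rate limiter servers: Client 1 → limiter 1, Client 2 → limiter 2. Because the web tier is stateless, a client's later requests can land on a different limiter. Without synchronization, limiter 1 has no data about client 2 and cannot limit it correctly.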

Bad solution: Sticky sessions (not scalable/flexible). Good solution: Centralized data store (Redis).
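
The effect of unsynchronized limiters can be seen in a small simulation. The `Limiter` class, the `run` helper, and the round-robin routing are assumptions for illustration, and a dict stands in for the centralized store; the sketch ignores the race condition discussed earlier.

```python
class Limiter:
    """One rate limiter node counting requests in the store it is given."""

    def __init__(self, limit, store):
        self.limit = limit
        self.store = store   # dict: client id -> request count

    def allow(self, client_id):
        count = self.store.get(client_id, 0)
        if count >= self.limit:
            return False
        self.store[client_id] = count + 1
        return True


def run(limiters, n_requests):
    """Send one client's requests round-robin across the limiter nodes."""
    return sum(
        limiters[i % len(limiters)].allow("client-1") for i in range(n_requests)
    )


# Each node keeps its own counters: the client gets the limit once per node.
separate_allowed = run([Limiter(2, {}), Limiter(2, {})], 6)

# Both nodes share one store: the limit holds across nodes.
shared_store = {}
shared_allowed = run([Limiter(2, shared_store), Limiter(2, shared_store)], 6)
```

With a limit of 2, the separate nodes let 4 of the 6 requests through, while the nodes sharing a store let through only 2.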

Performance optimization

Multi-data center setup: Route traffic to closest edge server (Cloudflare: 194 edge locations as of 2020) to reduce latency.
Eventual consistency model for data synchronization.

Monitoring

Verify:

Rate limiting algorithm is effective
Rate limiting rules are effective (not too strict, handle traffic surges)

Step 4 — Wrap up

Algorithms covered: Token bucket, leaking bucket, fixed window, sliding window log, sliding window counter.

Additional talking points:

Hard vs soft rate limiting (hard: requests can never exceed the threshold; soft: requests may exceed it for a short period)
Rate limiting at different OSI layers (application layer 7, HTTP, shown here; also possible at layer 3 by IP address using iptables)
Client best practices: cache to avoid frequent calls, understand limits, catch exceptions gracefully, add backoff time to retry logic
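
The last client practice, adding backoff to retry logic, might look like the sketch below. `RateLimited` is a hypothetical exception that a client wrapper would raise on HTTP 429, and `call_with_backoff` with its parameters is an assumed helper, not part of any library. It honors a server-supplied retry delay when present and otherwise waits a random time bounded by an exponentially growing cap.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception a client wrapper raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after   # seconds suggested by the server, if any


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn, retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise                     # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            if exc.retry_after is not None:
                delay = exc.retry_after   # prefer the server's hint
            else:
                delay = random.uniform(0, cap)   # jitter avoids synchronized retries
            sleep(delay)
```

The `sleep` parameter is injectable so the retry schedule can be exercised without actually waiting.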

