ApiRateLimitingAlgorithms

Rate limiting protects APIs from resource exhaustion and abuse by enforcing a ceiling on the number of requests a caller can make within a specified temporal window.

Core Algorithms

1. Token Bucket

The bucket has a fixed capacity $C$. Tokens are added at a constant rate$r$per second. Each request consumes one token. If the bucket is empty, the request is rejected.

- **Math:**$\text{tokens} = \min(C, \text{tokens}_{old} + (t_{now} - t_{last}) \times r)$- **Benefit:** Allows for bursts (up to capacity$C$) while maintaining an average rate$r$. The standard for high-performance APIs (AWS, Stripe).

2. Leaky Bucket

Requests enter a FIFO queue and are processed at a constant rate. If the queue is full, new requests are dropped.

- **Benefit:** Smooths traffic bursts into a steady stream (Traffic Shaping).

3. Fixed Window Counter

Counts requests in discrete time intervals (e.g., 1-minute blocks).

- **Flaw:** The "Boundary Spike." A user can send their full quota at the end of window$N$and again at the start of window$N+1$, doubling the allowed rate in a short burst.

4. Sliding Window Counter

A hybrid approach that weights the count of the previous window by the overlap fraction.

- **Math:**$\text{count} = \text{current\_window\_count} + \text{previous\_window\_count} \times (1 - \text{fraction\_of\_current\_window\_elapsed})$- **Benefit:** Eliminates boundary spikes with minimal memory overhead ($O(1)$).

Atomic Implementation (Redis + Lua)

Rate limiting must be atomic to prevent race conditions in distributed systems. Using a Lua script ensures the "Read-Modify-Write" cycle is indivisible.

**Token Bucket Lua Snippet:**

```lua

local key = KEYS[1]

local rate = tonumber(ARGV[1])

local capacity = tonumber(ARGV[2])

local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')

local tokens = tonumber(bucket[1]) or capacity

local last_refill = tonumber(bucket[2]) or now

local delta = math.max(0, now - last_refill)

tokens = math.min(capacity, tokens + (delta * rate))

if tokens < 1 then

return 0

else

tokens = tokens - 1

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)

return 1

end

```

Comparison Table

| Algorithm | Precision | Bursts Allowed | Memory | Use Case |

|---|---|---|---|---|

| **Fixed Window** | Low | No | Low | Soft quotas |

| **Sliding Log** | High | Yes | High | Strict accuracy |

| **Sliding Counter**| Medium | Yes | Low | Default high-volume |

| **Token Bucket** | High | Yes | Low | Tiered API access |

Response Headers (RFC 6585)

Always return standard headers to allow clients to self-throttle:

```http

HTTP/1.1 429 Too Many Requests

Retry-After: 30

X-RateLimit-Limit: 1000

X-RateLimit-Remaining: 0

X-RateLimit-Reset: 1735340000

```

Operational Strategies

- **Client Identification:** Use `API-Key` or `JWT.sub` for authenticated users; fallback to `X-Forwarded-For` (IP) for anonymous traffic.

- **Tiered Limiting:** Assign buckets based on customer tier ($C_{enterprise} > C_{free}$).

- **Global vs. Local:** Use local in-memory limiters for$P99$ latency, but synchronize to a central Redis store periodically to prevent distributed under-counting.