Understanding 429 Too Many Requests: Rate Limiting Done Right
If you've ever woken up to a flood of 429 Too Many Requests errors in your logs, you know the sinking feeling. Whether you're the one sending too many requests or the one trying to protect your server, understanding 429 is essential in the API-driven world we live in. Let's break it down from both sides.
What Is a 429?
A 429 status code means the client has sent too many requests in a given time period. It's the server's way of saying "slow down" — a rate limit has been exceeded.
The 429 status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). — RFC 6585, Section 4
Unlike most 4xx errors, a 429 isn't about what you're requesting — it's about how often. The request itself might be perfectly valid; you're just making too many of them.
The response should include a Retry-After header telling the client when to try again:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json
{"error": "Rate limit exceeded. Try again in 60 seconds."}
Why Rate Limiting Matters
Rate limiting isn't just about protecting servers from abuse. It serves multiple critical purposes:
- Preventing abuse — Brute force login attempts, credential stuffing, and scraping
- Fair resource sharing — Ensuring one heavy user doesn't degrade the experience for everyone else
- Cost control — Especially for services that pay per-request to downstream APIs (databases, AI models, third-party services)
- DDoS mitigation — A first line of defense against denial-of-service attacks
- Business model enforcement — Free tier vs. paid tier API access with different rate limits
Common Rate Limiting Algorithms
There are several approaches to rate limiting, each with trade-offs:
| Algorithm | How it works | Pros | Cons |
|---|---|---|---|
| Fixed Window | Count requests in fixed time windows (e.g., per minute) | Simple to implement | Burst at window boundaries |
| Sliding Window | Rolling time window that moves with each request | Smoother than fixed window | Slightly more complex |
| Token Bucket | Tokens added at a steady rate; each request costs a token | Allows controlled bursts | Needs tunable parameters |
| Leaky Bucket | Requests processed at a fixed rate; excess queued or dropped | Perfectly smooth output | No burst tolerance |
For most APIs, a sliding window or token bucket is the sweet spot — they prevent abuse while allowing reasonable bursts of activity.
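To make the token bucket concrete, here's a minimal in-memory sketch (the class and parameter names here are illustrative, not any particular library's API): tokens refill continuously at a fixed rate, and a request is allowed only when a whole token is available.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // max burst size
    private refillPerSec: number, // tokens added per second
    now = Date.now(),
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  tryConsume(now = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec,
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // rate limited
  }
}

// 10-request bursts, sustained rate of 5 requests/sec
const limiter = new TokenBucket(10, 5);
```

The two parameters map directly to the trade-off in the table: `capacity` sets how large a burst you tolerate, while `refillPerSec` sets the sustained rate.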
Implementing Rate Limiting (Server Side)
Basic Express Middleware
Here's a simple in-memory rate limiter for a Node.js application:
import express, {
  type Request,
  type Response,
  type NextFunction,
} from "express";

const app = express();

const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

function rateLimit(maxRequests: number, windowMs: number) {
  return (req: Request, res: Response, next: NextFunction) => {
    const key = req.ip ?? "unknown";
    const now = Date.now();
    const record = rateLimitMap.get(key);

    if (!record || now > record.resetTime) {
      rateLimitMap.set(key, { count: 1, resetTime: now + windowMs });
      setRateLimitHeaders(res, maxRequests, maxRequests - 1, now + windowMs);
      return next();
    }

    if (record.count >= maxRequests) {
      const retryAfter = Math.ceil((record.resetTime - now) / 1000);
      setRateLimitHeaders(res, maxRequests, 0, record.resetTime);
      res.set("Retry-After", String(retryAfter));
      return res.status(429).json({
        error: "Too many requests",
        retryAfter,
      });
    }

    record.count++;
    setRateLimitHeaders(
      res,
      maxRequests,
      maxRequests - record.count,
      record.resetTime,
    );
    next();
  };
}

function setRateLimitHeaders(
  res: Response,
  limit: number,
  remaining: number,
  reset: number,
) {
  res.set("X-RateLimit-Limit", String(limit));
  res.set("X-RateLimit-Remaining", String(remaining));
  res.set("X-RateLimit-Reset", String(Math.ceil(reset / 1000)));
}

// 100 requests per 15 minutes
app.use("/api/", rateLimit(100, 15 * 60 * 1000));

This works for a single server instance. For distributed systems, you'll need a shared store.
Redis-Backed Rate Limiter
For production workloads across multiple server instances, Redis is the standard:
import Redis from "ioredis";

// Fall back to a local Redis if REDIS_URL isn't set
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

async function slidingWindowRateLimit(
  key: string,
  maxRequests: number,
  windowMs: number,
): Promise<{ allowed: boolean; remaining: number; resetMs: number }> {
  const now = Date.now();
  const windowStart = now - windowMs;

  const pipeline = redis.pipeline();
  // Remove expired entries
  pipeline.zremrangebyscore(key, 0, windowStart);
  // Add current request
  pipeline.zadd(key, now, `${now}-${Math.random()}`);
  // Count requests in window
  pipeline.zcard(key);
  // Set key expiry
  pipeline.pexpire(key, windowMs);

  const results = await pipeline.exec();
  const requestCount = results?.[2]?.[1] as number;

  return {
    allowed: requestCount <= maxRequests,
    remaining: Math.max(0, maxRequests - requestCount),
    resetMs: windowMs,
  };
}

Note that this version records each request in the sorted set before counting, so rejected requests still count toward the window: a client that keeps hammering stays rejected until it genuinely slows down, which is usually what you want for abusive traffic.

Edge / Middleware Rate Limiting
For even better protection, rate limit at the edge before requests reach your application. Platforms like Vercel and Cloudflare let you do this in middleware:
// Next.js middleware example
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const ip = request.headers.get("x-forwarded-for") ?? "unknown";

  // Check rate limit (isRateLimited is a placeholder — use a real store in production)
  if (isRateLimited(ip)) {
    return NextResponse.json(
      { error: "Too many requests" },
      { status: 429, headers: { "Retry-After": "60" } },
    );
  }

  return NextResponse.next();
}

Handling 429s as a Client
When you're consuming an API and hit a 429, the worst thing you can do is immediately retry. That makes the problem worse — for you and for the server. Here's how to handle it properly.
Read the Retry-After Header
Always check for Retry-After first. It tells you exactly how long to wait:
async function fetchWithRetry(
  url: string,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) return response;

    // Retry-After may be seconds or an HTTP-date; fall back to
    // exponential backoff when it doesn't parse as a number
    const retryAfter = response.headers.get("Retry-After");
    const retrySeconds = Number.parseInt(retryAfter ?? "", 10);
    const waitMs = Number.isNaN(retrySeconds)
      ? Math.min(1000 * Math.pow(2, attempt), 30000)
      : retrySeconds * 1000;

    console.warn(
      `Rate limited. Retrying in ${waitMs}ms` +
        ` (attempt ${attempt + 1}/${maxRetries})`,
    );

    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }

  throw new Error(`Failed after ${maxRetries} retries due to rate limiting`);
}

Exponential Backoff with Jitter
If there's no Retry-After header, use exponential backoff with jitter to avoid the "thundering herd" problem — where many clients retry at exactly the same time:
function getBackoffMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
): number {
  const exponentialDelay = baseMs * Math.pow(2, attempt);
  const jitter = Math.random() * exponentialDelay * 0.5;
  return Math.min(exponentialDelay + jitter, maxMs);
}

// Attempt 0: ~1000-1500ms
// Attempt 1: ~2000-3000ms
// Attempt 2: ~4000-6000ms
// Attempt 3: ~8000-12000ms
// Attempt 5+: capped at 30000ms

The jitter is critical. Without it, if 100 clients all get rate-limited at the same time, they'll all retry at the same time — creating another spike.
Real-World Rate Limit Examples
Different APIs take very different approaches to rate limiting:
| API | Rate Limit | Window | Notable Detail |
|---|---|---|---|
| GitHub | 5,000 requests | Per hour | Authenticated; 60/hr unauthenticated |
| Stripe | 100 requests | Per second | Higher limits for read-only endpoints |
| OpenAI | Varies by model | Per minute | Limits on both requests AND tokens |
| Twitter/X | Varies by tier | Per 15 minutes | Free tier is extremely restrictive |
| Shopify | 40-request bucket | Leaks 2/sec | Uses a leaky bucket algorithm |
Notice how some APIs limit by requests, some by tokens, and some by both. The OpenAI approach — limiting both requests-per-minute and tokens-per-minute — is becoming more common for AI APIs where a single request can vary enormously in cost.
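A dual request-and-token limit can be sketched like this (a minimal in-memory illustration; the names and the per-minute window are assumptions, not any vendor's SDK):

```typescript
interface DualLimit {
  maxRequestsPerMin: number;
  maxTokensPerMin: number;
}

class DualRateTracker {
  private requests = 0;
  private tokens = 0;
  private windowStart: number;

  constructor(private limits: DualLimit, now = Date.now()) {
    this.windowStart = now;
  }

  // Returns true if a request costing `tokenCost` tokens fits in the window
  tryConsume(tokenCost: number, now = Date.now()): boolean {
    // Reset counters when a new one-minute window begins
    if (now - this.windowStart >= 60_000) {
      this.requests = 0;
      this.tokens = 0;
      this.windowStart = now;
    }
    // A request is rejected if it would exceed EITHER limit
    if (
      this.requests + 1 > this.limits.maxRequestsPerMin ||
      this.tokens + tokenCost > this.limits.maxTokensPerMin
    ) {
      return false;
    }
    this.requests += 1;
    this.tokens += tokenCost;
    return true;
  }
}
```

The key property: a single expensive request can exhaust the token budget long before the request budget, which is exactly why AI APIs enforce both.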
The Right Response Headers
A well-implemented rate limiter communicates its state through standard headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1713225600
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds (or date) to wait before retrying (on 429 responses) |
Including these headers on every response — not just 429s — lets clients proactively throttle themselves before hitting the limit. This is good API citizenship.
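A client can act on those headers before ever seeing a 429. Here's a small sketch (the function name is mine; it assumes the numeric X-RateLimit-* headers from the table above) that computes how long to pause before the next request:

```typescript
// Returns how many ms to wait before the next request is safe to send.
// Assumes X-RateLimit-Reset is a Unix timestamp in seconds, as above.
function msUntilSafe(headers: Headers, now = Date.now()): number {
  const remaining = Number(headers.get("X-RateLimit-Remaining") ?? "1");
  const resetSec = Number(headers.get("X-RateLimit-Reset") ?? "0");

  if (remaining > 0) return 0; // budget left: no need to wait
  // Out of budget: wait until the window resets
  return Math.max(0, resetSec * 1000 - now);
}
```

A client loop would call this after each response and `setTimeout` for the returned delay, throttling itself instead of burning a request on a guaranteed 429.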
429 vs. 503
These two codes can look similar from the client's perspective (the server is refusing requests), but they mean very different things:
| Status | Name | Meaning | Who's affected? |
|---|---|---|---|
| 429 | Too Many Requests | You exceeded your rate limit | Just the rate-limited client |
| 503 | Service Unavailable | The server is overloaded or under maintenance | Everyone |
The distinction matters for how you respond. A 429 means your client should back off. A 503 means the entire service is having problems — retrying with backoff is still appropriate, but the issue isn't specific to you.
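That decision logic can be captured in a small helper (a sketch under the assumptions above; jitter is omitted for brevity):

```typescript
type RetryDecision =
  | { retry: false }
  | { retry: true; waitMs: number };

// Decide whether and how long to wait, given the response status and
// the raw Retry-After header value (null if absent).
function decideRetry(
  status: number,
  retryAfterHeader: string | null,
  attempt: number,
): RetryDecision {
  // Only 429 and 503 are worth retrying with backoff
  if (status !== 429 && status !== 503) return { retry: false };

  // Honor Retry-After when it's a plain number of seconds
  const seconds = Number(retryAfterHeader);
  if (retryAfterHeader !== null && Number.isFinite(seconds)) {
    return { retry: true, waitMs: seconds * 1000 };
  }

  // Otherwise fall back to capped exponential backoff
  return { retry: true, waitMs: Math.min(1000 * 2 ** attempt, 30_000) };
}
```

Treating both codes the same way on the wire is fine; the difference is in what you log and alert on — a spike of 429s means your client is too aggressive, a spike of 503s means the service itself is in trouble.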
Wrapping Up
Rate limiting is a fundamental part of building reliable APIs. On the server side, choose an algorithm that matches your traffic patterns, use Redis for distributed deployments, and always return informative headers so clients can self-regulate. On the client side, respect Retry-After, use exponential backoff with jitter, and never retry immediately.
The best rate limiting is invisible — clients stay within limits because the headers tell them exactly where they stand, and 429 responses are rare because the system communicates proactively.
For more details on related status codes, check out our pages on 429 Too Many Requests and 503 Service Unavailable. You might also find our posts on understanding 500 errors and understanding 502 errors helpful for the server-side error perspective.