Understanding 429 Too Many Requests: Rate Limiting Done Right
If you've ever woken up to a flood of 429 Too Many Requests errors in your logs, you know the sinking feeling. Whether you're the one sending too many requests or the one trying to protect your server, understanding 429 is essential in the API-driven world we live in. Let's break it down from both sides.
What Is a 429?
A 429 status code means the client has sent too many requests in a given time period. It's the server's way of saying "slow down" — a rate limit has been exceeded.
The 429 status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). — RFC 6585, Section 4
Unlike most 4xx errors, a 429 isn't about what you're requesting — it's about how often. The request itself might be perfectly valid; you're just making too many of them.
The response should include a Retry-After header telling the client when to try again:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json
{"error": "Rate limit exceeded. Try again in 60 seconds."}
Why Rate Limiting Matters
Rate limiting isn't just about protecting servers from abuse. It serves multiple critical purposes:
- Preventing abuse — Brute force login attempts, credential stuffing, and scraping
- Fair resource sharing — Ensuring one heavy user doesn't degrade the experience for everyone else
- Cost control — Especially for services that pay per-request to downstream APIs (databases, AI models, third-party services)
- DDoS mitigation — A first line of defense against denial-of-service attacks
- Business model enforcement — Free tier vs. paid tier API access with different rate limits
Common Rate Limiting Algorithms
There are several approaches to rate limiting, each with trade-offs:
| Algorithm | How it works | Pros | Cons |
|---|---|---|---|
| Fixed Window | Count requests in fixed time windows (e.g., per minute) | Simple to implement | Burst at window boundaries |
| Sliding Window | Rolling time window that moves with each request | Smoother than fixed window | Slightly more complex |
| Token Bucket | Tokens added at a steady rate; each request costs a token | Allows controlled bursts | Needs tunable parameters |
| Leaky Bucket | Requests processed at a fixed rate; excess queued or dropped | Perfectly smooth output | No burst tolerance |
For most APIs, a sliding window or token bucket is the sweet spot — they prevent abuse while allowing reasonable bursts of activity.
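To make the token bucket concrete, here's a minimal in-memory sketch (the class and parameter names here are illustrative, not any particular library's API): tokens refill continuously at a fixed rate, and a request is allowed only when a whole token is available.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // max burst size
    private refillPerSec: number, // tokens added per second
    now = Date.now(),
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  tryConsume(now = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSec,
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // rate limited
  }
}

// 10-request bursts, sustained rate of 5 requests/sec
const limiter = new TokenBucket(10, 5);
```

The two parameters map directly to the trade-off in the table: `capacity` sets how large a burst you tolerate, while `refillPerSec` sets the sustained rate.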
Implementing Rate Limiting (Server Side)
Basic Express Middleware
Here's a simple in-memory rate limiter for a Node.js application:
import express, {
  type Request,
  type Response,
  type NextFunction,
} from "express";

const app = express();

const rateLimitMap = new Map<string, { count: number; resetTime: number }>();

function rateLimit(maxRequests: number, windowMs: number) {
  return (req: Request, res: Response, next: NextFunction) => {
    const key = req.ip ?? "unknown";
    const now = Date.now();
    const record = rateLimitMap.get(key);

    if (!record || now > record.resetTime) {
      rateLimitMap.set(key, { count: 1, resetTime: now + windowMs });
      setRateLimitHeaders(res, maxRequests, maxRequests - 1, now + windowMs);
      return next();
    }

    if (record.count >= maxRequests) {
      const retryAfter = Math.ceil((record.resetTime - now) / 1000);
      setRateLimitHeaders(res, maxRequests, 0, record.resetTime);
      res.set("Retry-After", String(retryAfter));
      return res.status(429).json({
        error: "Too many requests",
        retryAfter,
      });
    }

    record.count++;
    setRateLimitHeaders(
      res,
      maxRequests,
      maxRequests - record.count,
      record.resetTime,
    );
    next();
  };
}

function setRateLimitHeaders(
  res: Response,
  limit: number,
  remaining: number,
  reset: number,
) {
  res.set("X-RateLimit-Limit", String(limit));
  res.set("X-RateLimit-Remaining", String(remaining));
  res.set("X-RateLimit-Reset", String(Math.ceil(reset / 1000)));
}

// 100 requests per 15 minutes
app.use("/api/", rateLimit(100, 15 * 60 * 1000));

This works for a single server instance. For distributed systems, you'll need a shared store.
Redis-Backed Rate Limiter
For production workloads across multiple server instances, Redis is the standard:
import Redis from "ioredis";

// Fall back to a local Redis if REDIS_URL isn't set
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

async function slidingWindowRateLimit(
  key: string,
  maxRequests: number,
  windowMs: number,
): Promise<{ allowed: boolean; remaining: number; resetMs: number }> {
  const now = Date.now();
  const windowStart = now - windowMs;

  const pipeline = redis.pipeline();
  // Remove expired entries
  pipeline.zremrangebyscore(key, 0, windowStart);
  // Add current request
  pipeline.zadd(key, now, `${now}-${Math.random()}`);
  // Count requests in window
  pipeline.zcard(key);
  // Set key expiry
  pipeline.pexpire(key, windowMs);

  const results = await pipeline.exec();
  const requestCount = results?.[2]?.[1] as number;

  return {
    allowed: requestCount <= maxRequests,
    remaining: Math.max(0, maxRequests - requestCount),
    resetMs: windowMs,
  };
}

Note that this version records each request in the sorted set before counting, so rejected requests still count toward the window: a client that keeps hammering stays rejected until it genuinely slows down, which is usually what you want for abusive traffic.

Edge / Middleware Rate Limiting
For even better protection, rate limit at the edge before requests reach your application. Platforms like Vercel and Cloudflare let you do this in middleware:
// Next.js middleware example
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const ip = request.headers.get("x-forwarded-for") ?? "unknown";

  // Check rate limit (isRateLimited is a placeholder — use a real store in production)
  if (isRateLimited(ip)) {
    return NextResponse.json(
      { error: "Too many requests" },
      { status: 429, headers: { "Retry-After": "60" } },
    );
  }

  return NextResponse.next();
}

Handling 429s as a Client
When you're consuming an API and hit a 429, the worst thing you can do is immediately retry. That makes the problem worse — for you and for the server. Here's how to handle it properly.
Read the Retry-After Header
Always check for Retry-After first. It tells you exactly how long to wait:
async function fetchWithRetry(
  url: string,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) return response;

    // Retry-After may be seconds or an HTTP-date; fall back to
    // exponential backoff when it doesn't parse as a number
    const retryAfter = response.headers.get("Retry-After");
    const retrySeconds = Number.parseInt(retryAfter ?? "", 10);
    const waitMs = Number.isNaN(retrySeconds)
      ? Math.min(1000 * Math.pow(2, attempt), 30000)
      : retrySeconds * 1000;

    console.warn(
      `Rate limited. Retrying in ${waitMs}ms` +
        ` (attempt ${attempt + 1}/${maxRetries})`,
    );

    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }

  throw new Error(`Failed after ${maxRetries} retries due to rate limiting`);
}

Exponential Backoff with Jitter
If there's no Retry-After header, use exponential backoff with jitter to avoid the "thundering herd" problem — where many clients retry at exactly the same time:
function getBackoffMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
): number {
  const exponentialDelay = baseMs * Math.pow(2, attempt);
  const jitter = Math.random() * exponentialDelay * 0.5;
  return Math.min(exponentialDelay + jitter, maxMs);
}

// Attempt 0: ~1000-1500ms
// Attempt 1: ~2000-3000ms
// Attempt 2: ~4000-6000ms
// Attempt 3: ~8000-12000ms
// Attempt 5+: capped at 30000ms

The jitter is critical. Without it, if 100 clients all get rate-limited at the same time, they'll all retry at the same time — creating another spike.
Real-World Rate Limit Examples
Different APIs take very different approaches to rate limiting:
| API | Rate Limit | Window | Notable Detail |
|---|---|---|---|
| GitHub | 5,000 requests | Per hour | Authenticated; 60/hr unauthenticated |
| Stripe | 100 requests | Per second | Higher limits for read-only endpoints |
| OpenAI | Varies by model | Per minute | Limits on both requests AND tokens |
| Twitter/X | Varies by tier | Per 15 minutes | Free tier is extremely restrictive |
| Shopify | 40-request bucket | Leaks 2/sec | Uses a leaky bucket algorithm |
Notice how some APIs limit by requests, some by tokens, and some by both. The OpenAI approach — limiting both requests-per-minute and tokens-per-minute — is becoming more common for AI APIs where a single request can vary enormously in cost.
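A dual request-and-token limit can be sketched like this (a minimal in-memory illustration; the names and the per-minute window are assumptions, not any vendor's SDK):

```typescript
interface DualLimit {
  maxRequestsPerMin: number;
  maxTokensPerMin: number;
}

class DualRateTracker {
  private requests = 0;
  private tokens = 0;
  private windowStart: number;

  constructor(private limits: DualLimit, now = Date.now()) {
    this.windowStart = now;
  }

  // Returns true if a request costing `tokenCost` tokens fits in the window
  tryConsume(tokenCost: number, now = Date.now()): boolean {
    // Reset counters when a new one-minute window begins
    if (now - this.windowStart >= 60_000) {
      this.requests = 0;
      this.tokens = 0;
      this.windowStart = now;
    }
    // A request is rejected if it would exceed EITHER limit
    if (
      this.requests + 1 > this.limits.maxRequestsPerMin ||
      this.tokens + tokenCost > this.limits.maxTokensPerMin
    ) {
      return false;
    }
    this.requests += 1;
    this.tokens += tokenCost;
    return true;
  }
}
```

The key property: a single expensive request can exhaust the token budget long before the request budget, which is exactly why AI APIs enforce both.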
The Right Response Headers
A well-implemented rate limiter communicates its state through standard headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1713225600
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds (or date) to wait before retrying (on 429 responses) |
Including these headers on every response — not just 429s — lets clients proactively throttle themselves before hitting the limit. This is good API citizenship.
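A client can act on those headers before ever seeing a 429. Here's a small sketch (the function name is mine; it assumes the numeric X-RateLimit-* headers from the table above) that computes how long to pause before the next request:

```typescript
// Returns how many ms to wait before the next request is safe to send.
// Assumes X-RateLimit-Reset is a Unix timestamp in seconds, as above.
function msUntilSafe(headers: Headers, now = Date.now()): number {
  const remaining = Number(headers.get("X-RateLimit-Remaining") ?? "1");
  const resetSec = Number(headers.get("X-RateLimit-Reset") ?? "0");

  if (remaining > 0) return 0; // budget left: no need to wait
  // Out of budget: wait until the window resets
  return Math.max(0, resetSec * 1000 - now);
}
```

A client loop would call this after each response and `setTimeout` for the returned delay, throttling itself instead of burning a request on a guaranteed 429.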
429 vs. 503
These two codes can look similar from the client's perspective (the server is refusing requests), but they mean very different things:
| Status | Name | Meaning | Who's affected? |
|---|---|---|---|
| 429 | Too Many Requests | You exceeded your rate limit | Just the rate-limited client |
| 503 | Service Unavailable | The server is overloaded or under maintenance | Everyone |
The distinction matters for how you respond. A 429 means your client should back off. A 503 means the entire service is having problems — retrying with backoff is still appropriate, but the issue isn't specific to you.
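That decision logic can be captured in a small helper (a sketch under the assumptions above; jitter is omitted for brevity):

```typescript
type RetryDecision =
  | { retry: false }
  | { retry: true; waitMs: number };

// Decide whether and how long to wait, given the response status and
// the raw Retry-After header value (null if absent).
function decideRetry(
  status: number,
  retryAfterHeader: string | null,
  attempt: number,
): RetryDecision {
  // Only 429 and 503 are worth retrying with backoff
  if (status !== 429 && status !== 503) return { retry: false };

  // Honor Retry-After when it's a plain number of seconds
  const seconds = Number(retryAfterHeader);
  if (retryAfterHeader !== null && Number.isFinite(seconds)) {
    return { retry: true, waitMs: seconds * 1000 };
  }

  // Otherwise fall back to capped exponential backoff
  return { retry: true, waitMs: Math.min(1000 * 2 ** attempt, 30_000) };
}
```

Treating both codes the same way on the wire is fine; the difference is in what you log and alert on — a spike of 429s means your client is too aggressive, a spike of 503s means the service itself is in trouble.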
Wrapping Up
Rate limiting is a fundamental part of building reliable APIs. On the server side, choose an algorithm that matches your traffic patterns, use Redis for distributed deployments, and always return informative headers so clients can self-regulate. On the client side, respect Retry-After, use exponential backoff with jitter, and never retry immediately.
The best rate limiting is invisible — clients stay within limits because the headers tell them exactly where they stand, and 429 responses are rare because the system communicates proactively.
For more details on related status codes, check out our pages on 429 Too Many Requests and 503 Service Unavailable. You might also find our posts on understanding 500 errors and understanding 502 errors helpful for the server-side error perspective.