
Understanding 429 Too Many Requests: Rate Limiting Done Right

April 15, 2026 · 8 min read
4xx · Client Error

If you've ever woken up to a flood of 429 Too Many Requests errors in your logs, you know the sinking feeling. Whether you're the one sending too many requests or the one trying to protect your server, understanding 429 is essential in the API-driven world we live in. Let's break it down from both sides.

What Is a 429?

A 429 status code means the client has sent too many requests in a given time period. It's the server's way of saying "slow down" — a rate limit has been exceeded.

The 429 status code indicates that the user has sent too many requests in a given amount of time ("rate limiting"). — RFC 6585, Section 4

Unlike most 4xx errors, a 429 isn't about what you're requesting — it's about how often. The request itself might be perfectly valid; you're just making too many of them.

The response should include a Retry-After header telling the client when to try again:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
Content-Type: application/json

{"error": "Rate limit exceeded. Try again in 60 seconds."}

Why Rate Limiting Matters

Rate limiting isn't just about protecting servers from abuse. It serves multiple critical purposes:

  • Preventing abuse — Brute force login attempts, credential stuffing, and scraping
  • Fair resource sharing — Ensuring one heavy user doesn't degrade the experience for everyone else
  • Cost control — Especially for services that pay per-request to downstream APIs (databases, AI models, third-party services)
  • DDoS mitigation — A first line of defense against denial-of-service attacks
  • Business model enforcement — Free tier vs. paid tier API access with different rate limits

Common Rate Limiting Algorithms

There are several approaches to rate limiting, each with trade-offs:

| Algorithm | How it works | Pros | Cons |
| --- | --- | --- | --- |
| Fixed Window | Count requests in fixed time windows (e.g., per minute) | Simple to implement | Bursts at window boundaries |
| Sliding Window | Rolling time window that moves with each request | Smoother than fixed window | Slightly more complex |
| Token Bucket | Tokens added at a steady rate; each request costs a token | Allows controlled bursts | Needs tunable parameters |
| Leaky Bucket | Requests processed at a fixed rate; excess queued or dropped | Perfectly smooth output | No burst tolerance |

For most APIs, a sliding window or token bucket is the sweet spot — they prevent abuse while allowing reasonable bursts of activity.
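To make the token bucket concrete, here's a minimal in-memory sketch in TypeScript. The class name and parameters (`TokenBucket`, `refillPerSecond`) are our own illustration, not any particular library's API:

```typescript
// Minimal token-bucket sketch: tokens refill at a steady rate,
// each request spends one token, and unused capacity allows bursts.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number, // steady-state request rate
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryRemoveToken(now: number = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond,
    );
    this.lastRefill = now;

    if (this.tokens < 1) return false; // rate limited
    this.tokens -= 1;
    return true;
  }
}
```

A bucket with `capacity = 10` and `refillPerSecond = 2` allows a burst of 10 requests, then settles to a sustained 2 requests per second.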

Implementing Rate Limiting (Server Side)

Basic Express Middleware

Here's a simple in-memory rate limiter for a Node.js application:

import { type Request, type Response, type NextFunction } from "express";
 
const rateLimitMap = new Map<string, { count: number; resetTime: number }>();
 
function rateLimit(maxRequests: number, windowMs: number) {
  return (req: Request, res: Response, next: NextFunction) => {
    const key = req.ip ?? "unknown";
    const now = Date.now();
    const record = rateLimitMap.get(key);
 
    if (!record || now > record.resetTime) {
      rateLimitMap.set(key, { count: 1, resetTime: now + windowMs });
      setRateLimitHeaders(res, maxRequests, maxRequests - 1, now + windowMs);
      return next();
    }
 
    if (record.count >= maxRequests) {
      const retryAfter = Math.ceil((record.resetTime - now) / 1000);
      setRateLimitHeaders(res, maxRequests, 0, record.resetTime);
      res.set("Retry-After", String(retryAfter));
      return res.status(429).json({
        error: "Too many requests",
        retryAfter,
      });
    }
 
    record.count++;
    setRateLimitHeaders(
      res,
      maxRequests,
      maxRequests - record.count,
      record.resetTime,
    );
    next();
  };
}
 
function setRateLimitHeaders(
  res: Response,
  limit: number,
  remaining: number,
  reset: number,
) {
  res.set("X-RateLimit-Limit", String(limit));
  res.set("X-RateLimit-Remaining", String(remaining));
  res.set("X-RateLimit-Reset", String(Math.ceil(reset / 1000)));
}
 
// Apply to all /api routes: 100 requests per 15 minutes
// (assumes `app` is an existing Express instance)
app.use("/api/", rateLimit(100, 15 * 60 * 1000));

This works for a single server instance. For distributed systems, you'll need a shared store.

Redis-Backed Rate Limiter

For production workloads across multiple server instances, Redis is the standard:

import Redis from "ioredis";
 
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
 
async function slidingWindowRateLimit(
  key: string,
  maxRequests: number,
  windowMs: number,
): Promise<{ allowed: boolean; remaining: number; resetMs: number }> {
  const now = Date.now();
  const windowStart = now - windowMs;
 
  const pipeline = redis.pipeline();
  // Remove expired entries
  pipeline.zremrangebyscore(key, 0, windowStart);
  // Add current request (note: rejected requests are recorded too,
  // which further penalizes clients that keep hammering past the limit)
  pipeline.zadd(key, now, `${now}-${Math.random()}`);
  // Count requests in window
  pipeline.zcard(key);
  // Set key expiry
  pipeline.pexpire(key, windowMs);
 
  const results = await pipeline.exec();
  const requestCount = results?.[2]?.[1] as number;
 
  return {
    allowed: requestCount <= maxRequests,
    remaining: Math.max(0, maxRequests - requestCount),
    resetMs: windowMs,
  };
}

Edge / Middleware Rate Limiting

For even better protection, rate limit at the edge before requests reach your application. Platforms like Vercel and Cloudflare let you do this in middleware:

// Next.js middleware example
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
 
export function middleware(request: NextRequest) {
  // x-forwarded-for may be a comma-separated list; take the first (client) IP
  const ip =
    request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown";
 
  // Check rate limit (simplified — use a real store in production)
  if (isRateLimited(ip)) {
    return NextResponse.json(
      { error: "Too many requests" },
      { status: 429, headers: { "Retry-After": "60" } },
    );
  }
 
  return NextResponse.next();
}

Handling 429s as a Client

When you're consuming an API and hit a 429, the worst thing you can do is immediately retry. That makes the problem worse — for you and for the server. Here's how to handle it properly.

Read the Retry-After Header

Always check for Retry-After first. It tells you exactly how long to wait:

async function fetchWithRetry(
  url: string,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url);
 
    if (response.status !== 429) return response;
 
    const retryAfter = response.headers.get("Retry-After");
    // Retry-After may be delay-seconds or an HTTP-date; fall back to
    // exponential backoff if it's missing or not a plain number
    const parsedSec = retryAfter ? Number.parseInt(retryAfter, 10) : Number.NaN;
    const waitMs = Number.isNaN(parsedSec)
      ? Math.min(1000 * Math.pow(2, attempt), 30000)
      : parsedSec * 1000;
 
    console.warn(
      `Rate limited. Retrying in ${waitMs}ms` +
        ` (attempt ${attempt + 1}/${maxRetries})`,
    );
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
 
  throw new Error(`Failed after ${maxRetries} retries due to rate limiting`);
}

Exponential Backoff with Jitter

If there's no Retry-After header, use exponential backoff with jitter to avoid the "thundering herd" problem — where many clients retry at exactly the same time:

function getBackoffMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
): number {
  const exponentialDelay = baseMs * Math.pow(2, attempt);
  const jitter = Math.random() * exponentialDelay * 0.5;
  return Math.min(exponentialDelay + jitter, maxMs);
}
 
// Attempt 0: ~1000-1500ms
// Attempt 1: ~2000-3000ms
// Attempt 2: ~4000-6000ms
// Attempt 3: ~8000-12000ms (later attempts capped at 30000ms)

The jitter is critical. Without it, if 100 clients all get rate-limited at the same time, they'll all retry at the same time — creating another spike.

Real-World Rate Limit Examples

Different APIs take very different approaches to rate limiting:

| API | Rate Limit | Window | Notable Detail |
| --- | --- | --- | --- |
| GitHub | 5,000 requests | Per hour | Authenticated; 60/hr unauthenticated |
| Stripe | 100 requests | Per second | Higher limits for read-only endpoints |
| OpenAI | Varies by model | Per minute | Limits on both requests AND tokens |
| Twitter/X | Varies by tier | Per 15 minutes | Free tier is extremely restrictive |
| Shopify | 40 requests | Per second | Uses a leaky bucket algorithm |

Notice how some APIs limit by requests, some by tokens, and some by both. The OpenAI approach — limiting both requests-per-minute and tokens-per-minute — is becoming more common for AI APIs where a single request can vary enormously in cost.
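A sketch of that dual-budget idea: a request is admitted only if both the request budget and the token budget can cover it. The `DualBudget` shape and field names here are hypothetical, purely for illustration, not OpenAI's actual internals:

```typescript
// Hypothetical dual-budget check: a request passes only if BOTH
// the per-minute request budget and the per-minute token budget allow it.
interface DualBudget {
  requestsRemaining: number; // requests left in the current minute
  tokensRemaining: number;   // tokens left in the current minute
}

function admitRequest(budget: DualBudget, tokenCost: number): boolean {
  if (budget.requestsRemaining < 1 || budget.tokensRemaining < tokenCost) {
    return false; // would exceed one of the two limits -> respond 429
  }
  budget.requestsRemaining -= 1;
  budget.tokensRemaining -= tokenCost;
  return true;
}
```

A single large request can exhaust the token budget long before the request budget, which is exactly why AI APIs track both.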

The Right Response Headers

A well-implemented rate limiter communicates its state through standard headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1713225600

| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds (or a date) to wait before retrying (on 429 responses) |

Including these headers on every response — not just 429s — lets clients proactively throttle themselves before hitting the limit. This is good API citizenship.
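Here's a minimal sketch of that self-throttling, assuming the server sends these headers on every response. It uses the standard `Headers` class (global in Node 18+); the even-pacing heuristic is our own choice, not a standard:

```typescript
// Compute how long a well-behaved client should wait before its next
// request, by spreading the remaining budget over the rest of the window.
function proactiveDelayMs(headers: Headers, nowMs: number = Date.now()): number {
  const remainingRaw = headers.get("X-RateLimit-Remaining");
  const resetRaw = headers.get("X-RateLimit-Reset"); // Unix seconds
  if (remainingRaw === null || resetRaw === null) return 0; // no headers: don't throttle

  const remaining = Number(remainingRaw);
  const resetSec = Number(resetRaw);
  if (!Number.isFinite(remaining) || !Number.isFinite(resetSec)) return 0;

  const windowLeftMs = Math.max(0, resetSec * 1000 - nowMs);
  if (remaining <= 0) return windowLeftMs; // budget exhausted: wait for reset

  // Pace the remaining requests evenly across the rest of the window
  return Math.floor(windowLeftMs / remaining);
}
```

With 10 requests left and 10 seconds until reset, this client waits one second between requests instead of burning the budget and hitting a 429.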

429 vs. 503

These two codes can look similar from the client's perspective (the server is refusing requests), but they mean very different things:

| Status | Name | Meaning | Who's affected? |
| --- | --- | --- | --- |
| 429 | Too Many Requests | You exceeded your rate limit | Just the rate-limited client |
| 503 | Service Unavailable | The server is overloaded or under maintenance | Everyone |

The distinction matters for how you respond. A 429 means your client should back off. A 503 means the entire service is having problems — retrying with backoff is still appropriate, but the issue isn't specific to you.

Wrapping Up

Rate limiting is a fundamental part of building reliable APIs. On the server side, choose an algorithm that matches your traffic patterns, use Redis for distributed deployments, and always return informative headers so clients can self-regulate. On the client side, respect Retry-After, use exponential backoff with jitter, and never retry immediately.

The best rate limiting is invisible — clients stay within limits because the headers tell them exactly where they stand, and 429 responses are rare because the system communicates proactively.

For more details on related status codes, check out our pages on 429 Too Many Requests and 503 Service Unavailable. You might also find our posts on understanding 500 errors and understanding 502 errors helpful for the server-side error perspective.

© 2026 SiteError.com. All rights reserved.