Rate Limiting & Backpressure Handling Permalink to this section

Q: Should I drop events or queue them when the rate limit is exceeded?

Depends on event semantics. For real-time telemetry where freshness matters more than completeness, use a small bounded queue with head-drop. For transactional events that must not be lost, queue them in a persistent store keyed by Last-Event-ID so the client can replay after reconnecting.

Q: How do I detect that a slow consumer is causing backpressure rather than a network glitch?

Sustained backpressure from a slow consumer shows as repeated drain events on the same connection (Node.js), Fprintf consistently blocking until the write deadline (Go), or growing queue depth for a single connection while others drain normally. Network glitches produce a single error followed by a client reconnect.

Q: What retry interval should I set when the server applies heavy rate limiting?

Set retry: 3000 (3 seconds) as a baseline. Under sustained rate limiting, increase it dynamically by emitting a new retry: directive before closing the connection. Combine with exponential backoff on the client side via a custom EventSource wrapper.

Part of Backend Stream Generation & Connection Management.

In long-lived SSE connections, two distinct failure modes lurk on opposite ends of the pipe. On the producer side, unbounded event generation floods the TCP send buffer and eventually exhausts server memory when thousands of connections accumulate unconsumed data. On the consumer side, slow clients that cannot process events as fast as the server emits them create slow-consumer pressure that ripples back through the network stack. This guide covers both: rate limiting (capping how fast the server produces events per connection) and backpressure (detecting when a consumer is falling behind and responding without crashing the server). The practical outcomes are predictable P99 latency, bounded memory usage, and graceful degradation instead of silent data loss.

Token-bucket rate limiter sits between the event source and the write buffer; backpressure from a slow TCP consumer feeds back to pause the producer.

How Rate Limiting and Backpressure Work Permalink to this section

The two-layer model Permalink to this section

Rate limiting and backpressure operate at different layers and serve different goals:

Layer	Mechanism	Goal	Where it lives
Application rate limit	Token bucket / sliding window	Cap events emitted per connection per second	Middleware before `res.write()`
TCP write backpressure	`EAGAIN` / `drain` event	Stall producer when OS send buffer is full	OS socket layer
Queue backpressure	Bounded in-memory queue	Absorb bursts without dropping	Between dispatcher and writer
Drop policy	Tail drop / head drop / random	Shed load when all buffers saturate	Last resort before OOM

The SSE wire format specifies text/event-stream with \n\n-terminated fields. There is no built-in flow-control field in the protocol — the server must implement all throttling outside the event format itself. The retry: directive controls the client reconnection interval, not server emission rate.

Token bucket algorithm Permalink to this section

A token bucket maintains a counter of available tokens. Each event costs one token. Tokens refill at a fixed rate up to a capacity cap. When the bucket is empty, the event is either queued or dropped.

capacity = 100 tokens
refill   = 50 tokens/second
cost     = 1 token per event

On emit attempt:
  if tokens >= cost:
    tokens -= cost
    emit event
  else:
    drop or enqueue event

The burst capacity (= capacity) lets a connection absorb momentary traffic spikes without shedding events, while the refill rate enforces a long-run throughput ceiling. See Applying Token-Bucket Rate Limiting to Event Streams for per-language implementations with accurate refill timing.

TCP backpressure signal Permalink to this section

res.write() in Node.js (and equivalent calls in Go/Python) returns false when the underlying TCP send buffer is full. This is the OS signalling that the consumer is not draining data fast enough. The correct response is to pause the upstream producer immediately and resume only when the drain event fires.

Ignoring the return value of write() is the most common cause of unbounded memory growth in SSE servers — the application keeps enqueueing chunks that the OS cannot transmit, and heap memory climbs until the process OOMs or the connection resets.

Server-Side Implementation: Node.js Permalink to this section

Basic token bucket + drain pattern Permalink to this section

// sse-rate-limiter.js — Node.js HTTP SSE with token bucket + backpressure
import http from 'node:http';

class TokenBucket {
  constructor({ capacity = 100, refillPerSecond = 50 } = {}) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  consume(cost = 1) {
    this._refill();
    if (this.tokens < cost) return false; // bucket empty — deny
    this.tokens -= cost;
    return true;
  }

  _refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
  }
}

const server = http.createServer((req, res) => {
  if (req.url !== '/events') { res.end(); return; }

  res.writeHead(200, {
    'Content-Type':  'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection':    'keep-alive',
    'X-Accel-Buffering': 'no', // disable nginx proxy buffering
  });

  // Tell the client to wait 3 s before reconnecting if the stream closes
  res.write('retry: 3000\n\n');

  const bucket = new TokenBucket({ capacity: 100, refillPerSecond: 50 });
  let paused = false;
  let dropsThisConnection = 0;

  const emit = (eventData) => {
    if (paused) {
      // Producer is paused waiting for TCP drain — drop this event
      dropsThisConnection++;
      console.warn({ event: 'drop', reason: 'tcp_backpressure', drops: dropsThisConnection });
      return;
    }

    if (!bucket.consume()) {
      // Token bucket empty — rate limit exceeded
      dropsThisConnection++;
      console.warn({ event: 'drop', reason: 'rate_limit', drops: dropsThisConnection });
      return;
    }

    const chunk = `data: ${JSON.stringify(eventData)}\n\n`;
    const ok = res.write(chunk);
    if (!ok) {
      // TCP send buffer full — pause upstream until drain
      paused = true;
      res.once('drain', () => { paused = false; });
    }
  };

  // Simulated producer — replace with your actual event source
  const interval = setInterval(() => emit({ ts: Date.now(), value: Math.random() }), 10);

  req.on('close', () => {
    clearInterval(interval);
    console.info({ event: 'client_disconnected', drops: dropsThisConnection });
  });
});

server.listen(3000);

Key points:

X-Accel-Buffering: no tells nginx to pass chunks directly to the client instead of buffering them at the proxy layer.
res.write() returns false when the kernel send buffer is full; drain fires when it empties.
Both the token bucket and backpressure signal log structured drop events — these are the primary observability hook.

For a complete disconnect-handling pattern that cleans up intervals and removes the connection from a global registry, see Handling Client Disconnects in Node.js SSE.

Server-Side Implementation: Go Permalink to this section

Go’s http.Flusher interface and channels give you explicit backpressure without event-loop callbacks.

// sse_ratelimit.go — Go SSE with token bucket and channel-based backpressure
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
	"math/rand"
	"net/http"
	"time"
	"golang.org/x/time/rate" // go get golang.org/x/time/rate
)

type Event struct {
	TS    int64   `json:"ts"`
	Value float64 `json:"value"`
}

func sseHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")
	w.Header().Set("X-Accel-Buffering", "no")

	fmt.Fprintf(w, "retry: 3000\n\n")
	flusher.Flush()

	// 50 events/s sustained, burst of 100
	limiter := rate.NewLimiter(rate.Limit(50), 100)

	ctx := r.Context()
	drops := 0

	for {
		select {
		case <-ctx.Done():
			slog.Info("client disconnected", "drops", drops)
			return
		default:
		}

		// Wait for a token (blocks until one is available or context cancels)
		if err := limiter.Wait(ctx); err != nil {
			slog.Info("limiter wait cancelled", "err", err)
			return
		}

		ev := Event{TS: time.Now().UnixMilli(), Value: rand.Float64()}
		payload, _ := json.Marshal(ev)

		// Write is synchronous in Go's net/http — if the client TCP buffer is full,
		// Write blocks. Set a write deadline to detect slow consumers.
		if deadline, ok := ctx.Deadline(); ok {
			w.(http.ResponseWriter).(interface{ SetWriteDeadline(time.Time) error }).
				SetWriteDeadline(deadline)
		}
		// Use a per-write deadline to detect stalled consumers
		rc := http.NewResponseController(w)
		rc.SetWriteDeadline(time.Now().Add(5 * time.Second))

		if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
			slog.Warn("write error — slow consumer or disconnect", "err", err, "drops", drops)
			return
		}
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/events", sseHandler)
	slog.Info("listening on :3000")
	http.ListenAndServe(":3000", nil)
}

http.NewResponseController (Go 1.20+) lets you call SetWriteDeadline on any ResponseWriter without a type assertion. If Fprintf blocks beyond the deadline, it returns an error and you close the connection — this is the idiomatic Go backpressure response. See Go Streaming Patterns for SSE for the full channel fan-out pattern using http.Flusher.

Per-connection queue with drop policy (Go) Permalink to this section

When you want to absorb short bursts rather than stalling the rate limiter, use a bounded channel as a per-connection queue:

// Bounded event queue — drop newest events when full (tail-drop policy)
queue := make(chan []byte, 256) // 256-event buffer per connection

// Producer goroutine
go func() {
	for ev := range eventSource {
		payload, _ := json.Marshal(ev)
		select {
		case queue <- payload:
			// accepted
		default:
			// queue full — tail drop
			slog.Warn("event dropped", "policy", "tail_drop")
		}
	}
}()

// Writer goroutine
for {
	select {
	case <-ctx.Done():
		return
	case payload := <-queue:
		rc.SetWriteDeadline(time.Now().Add(5 * time.Second))
		fmt.Fprintf(w, "data: %s\n\n", payload)
		flusher.Flush()
	}
}

A channel of size 256 limits worst-case memory per connection to roughly 256 × avgEventSize. At 1 KB/event that is 256 KB — multiply by connection count to budget total memory.

Drop Policies Permalink to this section

When both the rate limiter and the queue are saturated, you must decide which events to discard:

Policy	Mechanism	Best for	Drawback
Tail drop	Discard the newest event when queue full	Simplest; preserves sequence order	Newest data lost during saturation
Head drop	Dequeue oldest, enqueue newest	Real-time telemetry where freshness > completeness	Out-of-order gaps; stale events silently lost
Random drop	Drop random entry	Fairer under sustained load	Unpredictable client-side gaps
Selective drop	Drop events with lower `priority` field	Business-logic aware throttling	Requires event schema knowledge

For SSE systems using Idempotent Event ID Generation, tail-drop is safest — the client’s Last-Event-ID reconnection will request the last ID it saw, and the server can replay from a persistent store. Head-drop breaks this guarantee because the oldest dropped event may be exactly what the client needs on reconnect.

Python FastAPI Implementation Permalink to this section

# sse_ratelimit.py — FastAPI SSE with token bucket backpressure
import asyncio
import json
import time
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

class AsyncTokenBucket:
    def __init__(self, capacity: int = 100, refill_per_second: float = 50.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_second = refill_per_second
        self._last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def consume(self, cost: float = 1.0) -> bool:
        async with self._lock:
            now = time.monotonic()
            elapsed = now - self._last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
            self._last_refill = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

async def event_generator(request: Request):
    bucket = AsyncTokenBucket(capacity=100, refill_per_second=50)
    drops = 0
    # Send retry directive as first frame
    yield "retry: 3000\n\n"

    while True:
        if await request.is_disconnected():
            print(f"client disconnected, drops={drops}")
            break

        allowed = await bucket.consume()
        if not allowed:
            drops += 1
            # Back off briefly — do not spin the event loop
            await asyncio.sleep(0.02)
            continue

        payload = json.dumps({"ts": int(time.time() * 1000), "v": __import__('random').random()})
        yield f"data: {payload}\n\n"
        # Yield control so other connections are not starved
        await asyncio.sleep(0)

@app.get("/events")
async def sse_endpoint(request: Request):
    return StreamingResponse(
        event_generator(request),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
        },
    )

request.is_disconnected() is an async check that polls the ASGI connection state — it returns True when the client has closed the TCP connection. This prevents the generator from running indefinitely for ghost connections. For full sse-starlette integration, see Python FastAPI SSE Implementation Guide.

Edge Cases and Network Interference Permalink to this section

Proxy buffering Permalink to this section

The most dangerous invisible backpressure source is the reverse proxy buffer. nginx, Apache, and AWS ALB all buffer responses by default. Events pile up in the proxy without ever reaching the client, making the server believe the consumer is alive and draining.

Proxy	Disable buffering	Notes
nginx	`proxy_buffering off;` + `X-Accel-Buffering: no` header	Header overrides per-location config
Apache mod_proxy	`SetEnv proxy-sendchunked 1`	Also needs `ProxyBadHeader Ignore`
AWS ALB	Enable `chunked_encoding` attribute	Cannot disable buffering entirely — keep events < 1 MB
Cloudflare	Set `Transfer-Encoding: chunked`; Polish/Rocket Loader off	Paid plans can use streaming responses
Varnish	`beresp.do_stream = true;` in VCL	Also set `beresp.uncacheable = true;`

Always send X-Accel-Buffering: no from the application layer — it propagates through most proxy chains and is the single highest-impact change for reducing proxy-induced latency.

OS socket buffer exhaustion Permalink to this section

The kernel TCP send buffer defaults to 128–256 KB depending on the OS. Under Buffer Management & Chunked Transfer Encoding misconfigurations, a slow client will fill this buffer and cause EAGAIN errors (non-blocking socket) or stall write() calls (blocking socket). Node.js surfaces this as res.write() returning false; Go surfaces it as Fprintf blocking until the write deadline expires.

Tune kernel parameters if you run many concurrent SSE connections:

# /etc/sysctl.d/99-sse.conf
net.core.wmem_max = 4194304        # 4 MB max send buffer
net.core.wmem_default = 262144     # 256 KB default send buffer
net.ipv4.tcp_wmem = 4096 65536 4194304

Firewall and idle-connection timeouts Permalink to this section

Stateful firewalls and NAT gateways terminate TCP connections they consider idle — typically after 30–300 s with no data in flight. An SSE connection that rate-limits down to 0 events/s appears idle to the firewall.

Mitigate with a heartbeat comment:

// Send a no-op SSE comment every 20 s to keep the connection alive
const heartbeat = setInterval(() => {
  const ok = res.write(': heartbeat\n\n');
  if (!ok) {
    // Heartbeat itself encountered backpressure — connection is dead
    clearInterval(heartbeat);
    res.end();
  }
}, 20_000);

Comments (lines starting with :) are legal SSE frames and are silently ignored by the EventSource API on the client.

Connection pool starvation Permalink to this section

Under aggressive rate limiting, connections can stay open for long periods with low event throughput. This is healthy for individual clients but dangerous for the server’s Connection Pooling for SSE Servers strategy — file descriptors and memory are consumed by idle-but-alive connections. Set a maximum connection age and close the stream gracefully:

// Force reconnect after 10 minutes — client will auto-reconnect via EventSource
const maxAge = setTimeout(() => {
  res.write('event: reconnect\ndata: {}\n\n');
  res.end();
}, 10 * 60 * 1000);
req.on('close', () => clearTimeout(maxAge));

Performance and Scale Considerations Permalink to this section

Memory budget per connection Permalink to this section

Each SSE connection holds at minimum:

A per-connection token bucket state: ~40 bytes
A bounded event queue (if used): queueDepth × avgEventBytes
The OS TCP send buffer: 128–256 KB kernel memory (not heap)
Node.js socket wrapping overhead: ~4–8 KB heap

At 10,000 concurrent connections with a 256-item × 512-byte queue:

10,000 × (256 × 512) = 1.31 GB heap
10,000 × 256 KB kernel buffer = 2.5 GB kernel memory

Keep queues small (32–64 items) or use head-drop to prevent memory growth from lagging consumers. Fan-out through a shared Redis Pub/Sub Fan-Out for SSE broker lets you avoid per-connection queues entirely — the broker absorbs burst, and each connection reads from a shared stream.

CPU cost of token bucket refill Permalink to this section

Naïve token bucket implementations call Date.now() on every consume() call. At 50,000 emit attempts/s across all connections, this is 50,000 Date.now() calls/s — measurable but not bottleneck. The alternative is a refill-on-timer approach that refills all buckets centrally every 10–20 ms:

// Central refill — amortises clock reads across all connections
const buckets = new Map(); // clientId → TokenBucket
setInterval(() => {
  const elapsed = 0.02; // 20 ms
  for (const bucket of buckets.values()) {
    bucket.tokens = Math.min(bucket.capacity, bucket.tokens + elapsed * bucket.refillPerSecond);
  }
}, 20);

Async vs blocking rate limiters Permalink to this section

In an async runtime (Node.js event loop, asyncio, Go goroutines), rate limiters must not block the scheduler. Go’s rate.Limiter.Wait(ctx) parks the goroutine and yields — safe. Python’s asyncio.sleep() in the backoff path yields to the event loop — safe. A time.sleep() call in a Python sync route or a busy-wait loop in JavaScript will starve all other connections on the same thread.

Validation and Debugging Permalink to this section

curl-based verification Permalink to this section

# Verify rate limit is applied — count events over 5 seconds
curl -sN http://localhost:3000/events | \
  awk '/^data:/{count++} /^$/{if(count>0){print strftime("%H:%M:%S") " events:" count; count=0}}' &
sleep 5 && kill %1

# Expected: ~250 events (50/s × 5 s) if rate limit is enforced

# Simulate a slow consumer with rate-limited read
curl -sN http://localhost:3000/events | while IFS= read -r line; do
  echo "$line"
  sleep 0.1  # 10 events/s consumer, server emits 50/s
done

The slow consumer curl command lets you verify that drop log lines appear on the server after the buffer fills (usually within a few seconds).

DevTools Network tab Permalink to this section

Open Network → filter by text/event-stream.
Click the SSE endpoint request → EventStream tab.
Watch for gaps in id: sequence numbers — these reveal dropped events.
Check Timing → TTFB; a high TTFB with a fast server usually means proxy buffering is enabled.

Structured logging for drops Permalink to this section

Every drop event should emit a structured log line that dashboards can aggregate:

{
  "event": "sse_drop",
  "reason": "rate_limit",
  "client_ip": "10.0.0.5",
  "connection_id": "c-7f3a",
  "drops_total": 12,
  "bucket_tokens": 0.0,
  "timestamp": "2026-06-21T14:32:01.123Z"
}

Alert on drops_total > 100 per connection per minute as a proxy for misconfigured producers or misbehaving clients. A steady reason: tcp_backpressure from a specific IP is a slow-consumer problem; reason: rate_limit from all clients simultaneously suggests the rate limit is set too low for actual traffic.

Load testing with artillery Permalink to this section

# artillery-sse.yaml — simulate 200 concurrent SSE consumers
config:
  target: "http://localhost:3000"
  phases:
    - duration: 60
      arrivalRate: 10
      rampTo: 200

scenarios:
  - name: "SSE consumer"
    engine: http
    flow:
      - get:
          url: "/events"
          headers:
            Accept: "text/event-stream"
          # artillery closes after 30 s to simulate reconnections

artillery run artillery-sse.yaml --output results.json
artillery report results.json

Watch for P95/P99 response time inflation — a rising P99 under load indicates the rate limiter is correct but the queue is too small and connections are accumulating.

⚡ Production Directives

Set X-Accel-Buffering: no on every SSE response — without it, nginx silently queues events and your rate limiter sees no backpressure.
Always check the return value of res.write() (Node.js) or catch write deadline errors (Go) — ignoring them causes unbounded heap growth and eventual OOM.
Use a bounded per-connection queue (32–256 items) with an explicit drop policy; never allow an unbounded queue on a long-lived connection.
Emit a heartbeat SSE comment every 15–20 s to prevent firewalls and NAT gateways from terminating idle-appearing connections.
Log every dropped event with reason, client_ip, and drops_total; alert on drop rates exceeding 5% of emitted events per connection.

Production Checklist Permalink to this section

Frequently Asked Questions Permalink to this section

What is the difference between rate limiting and backpressure in SSE?

Rate limiting is a server-side policy that caps how many events the server emits per second, regardless of what the client can absorb. Backpressure is a signal from the consumer (via TCP flow control) telling the server the client cannot absorb data fast enough. The two work together: rate limiting prevents overloading well-behaved clients; backpressure detects and handles slow or lagging clients. You need both — rate limiting alone does not protect against a consumer that suddenly stalls, and backpressure detection alone does not prevent a fast producer from overwhelming a slow-but-draining consumer.

Should I drop events or queue them when the rate limit is exceeded?

Depends on event semantics. For real-time telemetry (prices, sensor readings) where freshness matters more than completeness, use a small bounded queue with head-drop so the client always gets the most recent data. For transactional events (order updates, payment notifications) that must not be lost, queue them in a persistent store (Redis Streams, Kafka) keyed by Last-Event-ID so the client can replay after reconnecting. In-memory queues are appropriate only for bursty but recoverable situations — never rely on them as the sole delivery guarantee.

How do I detect that a slow consumer is causing backpressure rather than a network glitch?

Sustained backpressure from a slow consumer shows as: repeated drain events on the same connection (Node.js), Fprintf consistently blocking until the write deadline (Go), or growing queue depth for a single connection while other connections drain normally. Network glitches typically produce a single error followed by a client reconnect. Log the connection ID and IP with every backpressure event — if the same client appears repeatedly, it is a slow consumer, not a transient glitch. You can also measure the time between write() returning false and drain firing — under 100 ms is normal TCP jitter; consistently over 500 ms indicates a genuinely slow consumer.

Can I apply rate limiting globally (all clients) rather than per-connection?

Yes, but per-connection limiting is almost always preferable. A global rate limit means one fast connection starves all others. Use a global limit only to protect server-wide resources (CPU, database query rate) and combine it with per-connection limits to protect individual clients. In Node.js, you can share a single bucket across all connections and additionally maintain per-connection buckets. The global bucket enforces the server's total capacity; per-connection buckets enforce fair sharing.

What retry interval should I set when the server applies heavy rate limiting?

Set retry: 3000 (3 seconds) as a baseline. Under sustained rate limiting or when shedding load, increase the retry interval dynamically by emitting a new retry: directive before closing the connection: retry: 30000\n\n. This prevents thundering-herd reconnections when the server is under pressure. Combine with exponential backoff on the client side — the EventSource API does not do this by default, so implement it in a custom wrapper if needed. See Event ID & Retry Mechanism Design for the full retry design pattern.

Rate Limiting & Backpressure Handling #Permalink to this section

How Rate Limiting and Backpressure Work #Permalink to this section

The two-layer model #Permalink to this section

Token bucket algorithm #Permalink to this section

TCP backpressure signal #Permalink to this section

Server-Side Implementation: Node.js #Permalink to this section

Basic token bucket + drain pattern #Permalink to this section

Server-Side Implementation: Go #Permalink to this section

Per-connection queue with drop policy (Go) #Permalink to this section

Drop Policies #Permalink to this section

Python FastAPI Implementation #Permalink to this section

Edge Cases and Network Interference #Permalink to this section

Proxy buffering #Permalink to this section

OS socket buffer exhaustion #Permalink to this section

Firewall and idle-connection timeouts #Permalink to this section

Connection pool starvation #Permalink to this section

Performance and Scale Considerations #Permalink to this section

Memory budget per connection #Permalink to this section

CPU cost of token bucket refill #Permalink to this section

Async vs blocking rate limiters #Permalink to this section

Validation and Debugging #Permalink to this section

curl-based verification #Permalink to this section

DevTools Network tab #Permalink to this section

Structured logging for drops #Permalink to this section

Load testing with artillery #Permalink to this section

Production Checklist #Permalink to this section

Frequently Asked Questions #Permalink to this section

Related #Permalink to this section

Deep Dives