Rate Limiting & Backpressure Handling Permalink to this section
Part of Backend Stream Generation & Connection Management.
In long-lived SSE connections, two distinct failure modes lurk on opposite ends of the pipe. On the producer side, unbounded event generation floods the TCP send buffer and eventually exhausts server memory when thousands of connections accumulate unconsumed data. On the consumer side, slow clients that cannot process events as fast as the server emits them create slow-consumer pressure that ripples back through the network stack. This guide covers both: rate limiting (capping how fast the server produces events per connection) and backpressure (detecting when a consumer is falling behind and responding without crashing the server). The practical outcomes are predictable P99 latency, bounded memory usage, and graceful degradation instead of silent data loss.
How Rate Limiting and Backpressure Work Permalink to this section
The two-layer model Permalink to this section
Rate limiting and backpressure operate at different layers and serve different goals:
| Layer | Mechanism | Goal | Where it lives |
|---|---|---|---|
| Application rate limit | Token bucket / sliding window | Cap events emitted per connection per second | Middleware before res.write() |
| TCP write backpressure | EAGAIN / drain event |
Stall producer when OS send buffer is full | OS socket layer |
| Queue backpressure | Bounded in-memory queue | Absorb bursts without dropping | Between dispatcher and writer |
| Drop policy | Tail drop / head drop / random | Shed load when all buffers saturate | Last resort before OOM |
The SSE wire format specifies text/event-stream with \n\n-terminated fields. There is no built-in flow-control field in the protocol — the server must implement all throttling outside the event format itself. The retry: directive controls the client reconnection interval, not server emission rate.
Token bucket algorithm Permalink to this section
A token bucket maintains a counter of available tokens. Each event costs one token. Tokens refill at a fixed rate up to a capacity cap. When the bucket is empty, the event is either queued or dropped.
capacity = 100 tokens
refill = 50 tokens/second
cost = 1 token per event
On emit attempt:
if tokens >= cost:
tokens -= cost
emit event
else:
drop or enqueue event
The burst capacity (= capacity) lets a connection absorb momentary traffic spikes without shedding events, while the refill rate enforces a long-run throughput ceiling. See Applying Token-Bucket Rate Limiting to Event Streams for per-language implementations with accurate refill timing.
TCP backpressure signal Permalink to this section
res.write() in Node.js (and equivalent calls in Go/Python) returns false when the underlying TCP send buffer is full. This is the OS signalling that the consumer is not draining data fast enough. The correct response is to pause the upstream producer immediately and resume only when the drain event fires.
Ignoring the return value of write() is the most common cause of unbounded memory growth in SSE servers — the application keeps enqueueing chunks that the OS cannot transmit, and heap memory climbs until the process OOMs or the connection resets.
Server-Side Implementation: Node.js Permalink to this section
Basic token bucket + drain pattern Permalink to this section
// sse-rate-limiter.js — Node.js HTTP SSE with token bucket + backpressure
import http from 'node:http';
class TokenBucket {
constructor({ capacity = 100, refillPerSecond = 50 } = {}) {
this.capacity = capacity;
this.tokens = capacity;
this.refillPerSecond = refillPerSecond;
this.lastRefill = Date.now();
}
consume(cost = 1) {
this._refill();
if (this.tokens < cost) return false; // bucket empty — deny
this.tokens -= cost;
return true;
}
_refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000; // seconds
this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
this.lastRefill = now;
}
}
const server = http.createServer((req, res) => {
if (req.url !== '/events') { res.end(); return; }
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'X-Accel-Buffering': 'no', // disable nginx proxy buffering
});
// Tell the client to wait 3 s before reconnecting if the stream closes
res.write('retry: 3000\n\n');
const bucket = new TokenBucket({ capacity: 100, refillPerSecond: 50 });
let paused = false;
let dropsThisConnection = 0;
const emit = (eventData) => {
if (paused) {
// Producer is paused waiting for TCP drain — drop this event
dropsThisConnection++;
console.warn({ event: 'drop', reason: 'tcp_backpressure', drops: dropsThisConnection });
return;
}
if (!bucket.consume()) {
// Token bucket empty — rate limit exceeded
dropsThisConnection++;
console.warn({ event: 'drop', reason: 'rate_limit', drops: dropsThisConnection });
return;
}
const chunk = `data: ${JSON.stringify(eventData)}\n\n`;
const ok = res.write(chunk);
if (!ok) {
// TCP send buffer full — pause upstream until drain
paused = true;
res.once('drain', () => { paused = false; });
}
};
// Simulated producer — replace with your actual event source
const interval = setInterval(() => emit({ ts: Date.now(), value: Math.random() }), 10);
req.on('close', () => {
clearInterval(interval);
console.info({ event: 'client_disconnected', drops: dropsThisConnection });
});
});
server.listen(3000);
Key points:
X-Accel-Buffering: notells nginx to pass chunks directly to the client instead of buffering them at the proxy layer.res.write()returnsfalsewhen the kernel send buffer is full;drainfires when it empties.- Both the token bucket and backpressure signal log structured drop events — these are the primary observability hook.
For a complete disconnect-handling pattern that cleans up intervals and removes the connection from a global registry, see Handling Client Disconnects in Node.js SSE.
Server-Side Implementation: Go Permalink to this section
Go’s http.Flusher interface and channels give you explicit backpressure without event-loop callbacks.
// sse_ratelimit.go — Go SSE with token bucket and channel-based backpressure
package main
import (
"context"
"encoding/json"
"fmt"
"log/slog"
"math/rand"
"net/http"
"time"
"golang.org/x/time/rate" // go get golang.org/x/time/rate
)
type Event struct {
TS int64 `json:"ts"`
Value float64 `json:"value"`
}
func sseHandler(w http.ResponseWriter, r *http.Request) {
flusher, ok := w.(http.Flusher)
if !ok {
http.Error(w, "streaming unsupported", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("X-Accel-Buffering", "no")
fmt.Fprintf(w, "retry: 3000\n\n")
flusher.Flush()
// 50 events/s sustained, burst of 100
limiter := rate.NewLimiter(rate.Limit(50), 100)
ctx := r.Context()
drops := 0
for {
select {
case <-ctx.Done():
slog.Info("client disconnected", "drops", drops)
return
default:
}
// Wait for a token (blocks until one is available or context cancels)
if err := limiter.Wait(ctx); err != nil {
slog.Info("limiter wait cancelled", "err", err)
return
}
ev := Event{TS: time.Now().UnixMilli(), Value: rand.Float64()}
payload, _ := json.Marshal(ev)
// Write is synchronous in Go's net/http — if the client TCP buffer is full,
// Write blocks. Set a write deadline to detect slow consumers.
if deadline, ok := ctx.Deadline(); ok {
w.(http.ResponseWriter).(interface{ SetWriteDeadline(time.Time) error }).
SetWriteDeadline(deadline)
}
// Use a per-write deadline to detect stalled consumers
rc := http.NewResponseController(w)
rc.SetWriteDeadline(time.Now().Add(5 * time.Second))
if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
slog.Warn("write error — slow consumer or disconnect", "err", err, "drops", drops)
return
}
flusher.Flush()
}
}
func main() {
http.HandleFunc("/events", sseHandler)
slog.Info("listening on :3000")
http.ListenAndServe(":3000", nil)
}
http.NewResponseController (Go 1.20+) lets you call SetWriteDeadline on any ResponseWriter without a type assertion. If Fprintf blocks beyond the deadline, it returns an error and you close the connection — this is the idiomatic Go backpressure response. See Go Streaming Patterns for SSE for the full channel fan-out pattern using http.Flusher.
Per-connection queue with drop policy (Go) Permalink to this section
When you want to absorb short bursts rather than stalling the rate limiter, use a bounded channel as a per-connection queue:
// Bounded event queue — drop newest events when full (tail-drop policy)
queue := make(chan []byte, 256) // 256-event buffer per connection
// Producer goroutine
go func() {
for ev := range eventSource {
payload, _ := json.Marshal(ev)
select {
case queue <- payload:
// accepted
default:
// queue full — tail drop
slog.Warn("event dropped", "policy", "tail_drop")
}
}
}()
// Writer goroutine
for {
select {
case <-ctx.Done():
return
case payload := <-queue:
rc.SetWriteDeadline(time.Now().Add(5 * time.Second))
fmt.Fprintf(w, "data: %s\n\n", payload)
flusher.Flush()
}
}
A channel of size 256 limits worst-case memory per connection to roughly 256 × avgEventSize. At 1 KB/event that is 256 KB — multiply by connection count to budget total memory.
Drop Policies Permalink to this section
When both the rate limiter and the queue are saturated, you must decide which events to discard:
| Policy | Mechanism | Best for | Drawback |
|---|---|---|---|
| Tail drop | Discard the newest event when queue full | Simplest; preserves sequence order | Newest data lost during saturation |
| Head drop | Dequeue oldest, enqueue newest | Real-time telemetry where freshness > completeness | Out-of-order gaps; stale events silently lost |
| Random drop | Drop random entry | Fairer under sustained load | Unpredictable client-side gaps |
| Selective drop | Drop events with lower priority field |
Business-logic aware throttling | Requires event schema knowledge |
For SSE systems using Idempotent Event ID Generation, tail-drop is safest — the client’s Last-Event-ID reconnection will request the last ID it saw, and the server can replay from a persistent store. Head-drop breaks this guarantee because the oldest dropped event may be exactly what the client needs on reconnect.
Python FastAPI Implementation Permalink to this section
# sse_ratelimit.py — FastAPI SSE with token bucket backpressure
import asyncio
import json
import time
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
app = FastAPI()
class AsyncTokenBucket:
def __init__(self, capacity: int = 100, refill_per_second: float = 50.0):
self.capacity = capacity
self.tokens = float(capacity)
self.refill_per_second = refill_per_second
self._last_refill = time.monotonic()
self._lock = asyncio.Lock()
async def consume(self, cost: float = 1.0) -> bool:
async with self._lock:
now = time.monotonic()
elapsed = now - self._last_refill
self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
self._last_refill = now
if self.tokens >= cost:
self.tokens -= cost
return True
return False
async def event_generator(request: Request):
bucket = AsyncTokenBucket(capacity=100, refill_per_second=50)
drops = 0
# Send retry directive as first frame
yield "retry: 3000\n\n"
while True:
if await request.is_disconnected():
print(f"client disconnected, drops={drops}")
break
allowed = await bucket.consume()
if not allowed:
drops += 1
# Back off briefly — do not spin the event loop
await asyncio.sleep(0.02)
continue
payload = json.dumps({"ts": int(time.time() * 1000), "v": __import__('random').random()})
yield f"data: {payload}\n\n"
# Yield control so other connections are not starved
await asyncio.sleep(0)
@app.get("/events")
async def sse_endpoint(request: Request):
return StreamingResponse(
event_generator(request),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
},
)
request.is_disconnected() is an async check that polls the ASGI connection state — it returns True when the client has closed the TCP connection. This prevents the generator from running indefinitely for ghost connections. For full sse-starlette integration, see Python FastAPI SSE Implementation Guide.
Edge Cases and Network Interference Permalink to this section
Proxy buffering Permalink to this section
The most dangerous invisible backpressure source is the reverse proxy buffer. nginx, Apache, and AWS ALB all buffer responses by default. Events pile up in the proxy without ever reaching the client, making the server believe the consumer is alive and draining.
| Proxy | Disable buffering | Notes |
|---|---|---|
| nginx | proxy_buffering off; + X-Accel-Buffering: no header |
Header overrides per-location config |
| Apache mod_proxy | SetEnv proxy-sendchunked 1 |
Also needs ProxyBadHeader Ignore |
| AWS ALB | Enable chunked_encoding attribute |
Cannot disable buffering entirely — keep events < 1 MB |
| Cloudflare | Set Transfer-Encoding: chunked; Polish/Rocket Loader off |
Paid plans can use streaming responses |
| Varnish | beresp.do_stream = true; in VCL |
Also set beresp.uncacheable = true; |
Always send X-Accel-Buffering: no from the application layer — it propagates through most proxy chains and is the single highest-impact change for reducing proxy-induced latency.
OS socket buffer exhaustion Permalink to this section
The kernel TCP send buffer defaults to 128–256 KB depending on the OS. Under Buffer Management & Chunked Transfer Encoding misconfigurations, a slow client will fill this buffer and cause EAGAIN errors (non-blocking socket) or stall write() calls (blocking socket). Node.js surfaces this as res.write() returning false; Go surfaces it as Fprintf blocking until the write deadline expires.
Tune kernel parameters if you run many concurrent SSE connections:
# /etc/sysctl.d/99-sse.conf
net.core.wmem_max = 4194304 # 4 MB max send buffer
net.core.wmem_default = 262144 # 256 KB default send buffer
net.ipv4.tcp_wmem = 4096 65536 4194304
Firewall and idle-connection timeouts Permalink to this section
Stateful firewalls and NAT gateways terminate TCP connections they consider idle — typically after 30–300 s with no data in flight. An SSE connection that rate-limits down to 0 events/s appears idle to the firewall.
Mitigate with a heartbeat comment:
// Send a no-op SSE comment every 20 s to keep the connection alive
const heartbeat = setInterval(() => {
const ok = res.write(': heartbeat\n\n');
if (!ok) {
// Heartbeat itself encountered backpressure — connection is dead
clearInterval(heartbeat);
res.end();
}
}, 20_000);
Comments (lines starting with :) are legal SSE frames and are silently ignored by the EventSource API on the client.
Connection pool starvation Permalink to this section
Under aggressive rate limiting, connections can stay open for long periods with low event throughput. This is healthy for individual clients but dangerous for the server’s Connection Pooling for SSE Servers strategy — file descriptors and memory are consumed by idle-but-alive connections. Set a maximum connection age and close the stream gracefully:
// Force reconnect after 10 minutes — client will auto-reconnect via EventSource
const maxAge = setTimeout(() => {
res.write('event: reconnect\ndata: {}\n\n');
res.end();
}, 10 * 60 * 1000);
req.on('close', () => clearTimeout(maxAge));
Performance and Scale Considerations Permalink to this section
Memory budget per connection Permalink to this section
Each SSE connection holds at minimum:
- A per-connection token bucket state: ~40 bytes
- A bounded event queue (if used):
queueDepth × avgEventBytes - The OS TCP send buffer: 128–256 KB kernel memory (not heap)
- Node.js socket wrapping overhead: ~4–8 KB heap
At 10,000 concurrent connections with a 256-item × 512-byte queue:
10,000 × (256 × 512) = 1.31 GB heap
10,000 × 256 KB kernel buffer = 2.5 GB kernel memory
Keep queues small (32–64 items) or use head-drop to prevent memory growth from lagging consumers. Fan-out through a shared Redis Pub/Sub Fan-Out for SSE broker lets you avoid per-connection queues entirely — the broker absorbs burst, and each connection reads from a shared stream.
CPU cost of token bucket refill Permalink to this section
Naïve token bucket implementations call Date.now() on every consume() call. At 50,000 emit attempts/s across all connections, this is 50,000 Date.now() calls/s — measurable but not bottleneck. The alternative is a refill-on-timer approach that refills all buckets centrally every 10–20 ms:
// Central refill — amortises clock reads across all connections
const buckets = new Map(); // clientId → TokenBucket
setInterval(() => {
const elapsed = 0.02; // 20 ms
for (const bucket of buckets.values()) {
bucket.tokens = Math.min(bucket.capacity, bucket.tokens + elapsed * bucket.refillPerSecond);
}
}, 20);
Async vs blocking rate limiters Permalink to this section
In an async runtime (Node.js event loop, asyncio, Go goroutines), rate limiters must not block the scheduler. Go’s rate.Limiter.Wait(ctx) parks the goroutine and yields — safe. Python’s asyncio.sleep() in the backoff path yields to the event loop — safe. A time.sleep() call in a Python sync route or a busy-wait loop in JavaScript will starve all other connections on the same thread.
Validation and Debugging Permalink to this section
curl-based verification Permalink to this section
# Verify rate limit is applied — count events over 5 seconds
curl -sN http://localhost:3000/events | \
awk '/^data:/{count++} /^$/{if(count>0){print strftime("%H:%M:%S") " events:" count; count=0}}' &
sleep 5 && kill %1
# Expected: ~250 events (50/s × 5 s) if rate limit is enforced
# Simulate a slow consumer with rate-limited read
curl -sN http://localhost:3000/events | while IFS= read -r line; do
echo "$line"
sleep 0.1 # 10 events/s consumer, server emits 50/s
done
The slow consumer curl command lets you verify that drop log lines appear on the server after the buffer fills (usually within a few seconds).
DevTools Network tab Permalink to this section
- Open Network → filter by
text/event-stream. - Click the SSE endpoint request → EventStream tab.
- Watch for gaps in
id:sequence numbers — these reveal dropped events. - Check Timing → TTFB; a high TTFB with a fast server usually means proxy buffering is enabled.
Structured logging for drops Permalink to this section
Every drop event should emit a structured log line that dashboards can aggregate:
{
"event": "sse_drop",
"reason": "rate_limit",
"client_ip": "10.0.0.5",
"connection_id": "c-7f3a",
"drops_total": 12,
"bucket_tokens": 0.0,
"timestamp": "2026-06-21T14:32:01.123Z"
}
Alert on drops_total > 100 per connection per minute as a proxy for misconfigured producers or misbehaving clients. A steady reason: tcp_backpressure from a specific IP is a slow-consumer problem; reason: rate_limit from all clients simultaneously suggests the rate limit is set too low for actual traffic.
Load testing with artillery Permalink to this section
# artillery-sse.yaml — simulate 200 concurrent SSE consumers
config:
target: "http://localhost:3000"
phases:
- duration: 60
arrivalRate: 10
rampTo: 200
scenarios:
- name: "SSE consumer"
engine: http
flow:
- get:
url: "/events"
headers:
Accept: "text/event-stream"
# artillery closes after 30 s to simulate reconnections
artillery run artillery-sse.yaml --output results.json
artillery report results.json
Watch for P95/P99 response time inflation — a rising P99 under load indicates the rate limiter is correct but the queue is too small and connections are accumulating.
⚡ Production Directives
- Set
X-Accel-Buffering: noon every SSE response — without it, nginx silently queues events and your rate limiter sees no backpressure. - Always check the return value of
res.write()(Node.js) or catch write deadline errors (Go) — ignoring them causes unbounded heap growth and eventual OOM. - Use a bounded per-connection queue (32–256 items) with an explicit drop policy; never allow an unbounded queue on a long-lived connection.
- Emit a heartbeat SSE comment every 15–20 s to prevent firewalls and NAT gateways from terminating idle-appearing connections.
- Log every dropped event with
reason,client_ip, anddrops_total; alert on drop rates exceeding 5% of emitted events per connection.
Production Checklist Permalink to this section
Frequently Asked Questions Permalink to this section
What is the difference between rate limiting and backpressure in SSE?
Rate limiting is a server-side policy that caps how many events the server emits per second, regardless of what the client can absorb. Backpressure is a signal from the consumer (via TCP flow control) telling the server the client cannot absorb data fast enough. The two work together: rate limiting prevents overloading well-behaved clients; backpressure detects and handles slow or lagging clients. You need both — rate limiting alone does not protect against a consumer that suddenly stalls, and backpressure detection alone does not prevent a fast producer from overwhelming a slow-but-draining consumer.
Should I drop events or queue them when the rate limit is exceeded?
Depends on event semantics. For real-time telemetry (prices, sensor readings) where freshness matters more than completeness, use a small bounded queue with head-drop so the client always gets the most recent data. For transactional events (order updates, payment notifications) that must not be lost, queue them in a persistent store (Redis Streams, Kafka) keyed by Last-Event-ID so the client can replay after reconnecting. In-memory queues are appropriate only for bursty but recoverable situations — never rely on them as the sole delivery guarantee.
How do I detect that a slow consumer is causing backpressure rather than a network glitch?
Sustained backpressure from a slow consumer shows as: repeated drain events on the same connection (Node.js), Fprintf consistently blocking until the write deadline (Go), or growing queue depth for a single connection while other connections drain normally. Network glitches typically produce a single error followed by a client reconnect. Log the connection ID and IP with every backpressure event — if the same client appears repeatedly, it is a slow consumer, not a transient glitch. You can also measure the time between write() returning false and drain firing — under 100 ms is normal TCP jitter; consistently over 500 ms indicates a genuinely slow consumer.
Can I apply rate limiting globally (all clients) rather than per-connection?
Yes, but per-connection limiting is almost always preferable. A global rate limit means one fast connection starves all others. Use a global limit only to protect server-wide resources (CPU, database query rate) and combine it with per-connection limits to protect individual clients. In Node.js, you can share a single bucket across all connections and additionally maintain per-connection buckets. The global bucket enforces the server's total capacity; per-connection buckets enforce fair sharing.
What retry interval should I set when the server applies heavy rate limiting?
Set retry: 3000 (3 seconds) as a baseline. Under sustained rate limiting or when shedding load, increase the retry interval dynamically by emitting a new retry: directive before closing the connection: retry: 30000\n\n. This prevents thundering-herd reconnections when the server is under pressure. Combine with exponential backoff on the client side — the EventSource API does not do this by default, so implement it in a custom wrapper if needed. See Event ID & Retry Mechanism Design for the full retry design pattern.