Configuring Connection Pools for High-Concurrency SSE
Part of Connection Pooling for SSE Servers.
At a few hundred concurrent clients, default HTTP server settings hold up. Past roughly 1,000 persistent SSE streams per process, default OS file-descriptor ceilings, Node.js http.Agent socket caps, and proxy read timeouts all become hard walls. This guide walks you through the exact changes (OS, runtime, and proxy) needed to push a single server to tens of thousands of simultaneous streams without connection resets, silent drops, or memory blowouts.
Symptom & Developer Intent
You are seeing one or more of the following:
- Clients receive ERR_CONNECTION_RESET or ECONNRESET after a variable number of concurrent connections.
- Server logs show listen EMFILE or Error: EMFILE too many open files.
- Load tests plateau at a hard concurrency ceiling (often 1,024 or 4,096) regardless of CPU headroom.
- Memory keeps growing linearly even when the number of active sockets is stable, suggesting buffered-but-undelivered event payloads.
- Nginx or HAProxy drops upstream connections with upstream timed out (110: Connection timed out) after 60 to 90 s.
The intent: sustain 10,000 to 100,000 simultaneous text/event-stream connections per server node with stable latency, predictable memory usage, and clean reconnection semantics when you deploy or scale down.
Root Cause Analysis
SSE streams violate every assumption baked into default HTTP infrastructure:
Default OS limits. Linux allows a very large system-wide fs.file-max (the exact value depends on the kernel and available memory), but the per-process nofile soft limit typically defaults to 1,024, and some distribution or service configurations raise it only to 4,096. Each SSE connection holds one file descriptor for its TCP socket; a fan-out proxy that opens an upstream socket per client holds a second. At roughly 1,000 simultaneous clients you hit EMFILE while the CPU is still nearly idle.
Default HTTP agent caps. Node.js http.Agent governs only outbound requests, not inbound server sockets. Its maxSockets has defaulted to Infinity since Node.js 0.12 (earlier releases capped it at 5 per host). When your SSE server acts as a fan-out proxy, pulling a source stream and distributing it to browser clients, an unbounded agent competes with client sockets for file descriptors, while a small cap inherited from legacy or framework configuration stalls upstream requests as soon as it is reached.
Keep-alive misalignment. Node.js defaults server.keepAliveTimeout to 5 s. If a proxy sits in front with a 60 s idle timeout, Node closes the idle TCP connection first while the proxy still considers it reusable. The proxy then sends the next request into a closing socket, and the resulting race surfaces as ECONNRESET (or a 502) on the client.
Proxy buffering. Nginx buffers upstream responses in memory (sized by proxy_buffer_size and proxy_buffers, 4 KB or 8 KB per buffer depending on the platform) before forwarding them to the client. For SSE, this means the first few kilobytes of events sit in Nginx's buffer, invisible to the browser, until the buffer fills. The client sees nothing, assumes the connection is dead, and reconnects, creating a storm of reconnections under load.
No backpressure propagation. res.write() in Node.js returns false once the response's internal write buffer exceeds its highWaterMark, which happens when the kernel send buffer stops accepting data. Ignoring this causes data to pile up in process memory, producing linear memory growth and eventual OOM kills. The rate limiting and backpressure guide covers the event-level controls; here we address the underlying socket-pool configuration.
Step-by-Step Resolution
Step 1: Raise OS File-Descriptor Limits
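Calculate your target: each SSE connection consumes one FD for its socket (an epoll registration does not take an extra descriptor), but budget 2 per connection to cover upstream or proxy-side sockets. Add 100 for the process itself. Round up to the next power of two.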
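As a worked example of this sizing rule, the sketch below computes a nofile target from a connection count. The helper name `fdTarget` and its default arguments are illustrative assumptions, not part of any library.

```javascript
// Hypothetical helper: size the nofile limit for a target connection count.
// Budget: fdsPerConnection FDs per stream, plus a fixed allowance for the process.
function fdTarget(connections, fdsPerConnection = 2, processOverhead = 100) {
  const needed = connections * fdsPerConnection + processOverhead;
  // Round up to the next power of two.
  let limit = 1;
  while (limit < needed) limit *= 2;
  return limit;
}

console.log(fdTarget(50_000)); // 50,000 * 2 + 100 = 100,100 -> 131072
```

For 50,000 streams the result is 131072, which is the soft limit used in the commands in this step.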
# 1a. Check current per-process limit
ulimit -n
# 1b. Raise the system-wide maximum (persistent across reboots via sysctl.conf)
sudo sysctl -w fs.file-max=2097152
echo "fs.file-max=2097152" | sudo tee -a /etc/sysctl.conf
# 1c. Raise per-user/service limits (edit /etc/security/limits.conf)
echo "sse-service soft nofile 131072" | sudo tee -a /etc/security/limits.conf
echo "sse-service hard nofile 262144" | sudo tee -a /etc/security/limits.conf
# 1d. If running under systemd, override the unit file instead:
# [Service]
# LimitNOFILE=262144
Verify after restart:
cat /proc/$(pgrep -f "node server")/limits | grep "open files"
# Max open files 131072 262144 files
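The same check can be run from inside the Node process at startup, which catches the case where the limit was raised for a shell but not for the service. This is a minimal sketch that assumes Linux (it reads /proc/self/limits); the function names `parseNofile` and `readOwnNofile` are hypothetical.

```javascript
const fs = require('node:fs');

// Parse the "Max open files" row of a /proc/<pid>/limits dump.
function parseNofile(limitsText) {
  const match = /^Max open files\s+(\S+)\s+(\S+)/m.exec(limitsText);
  if (!match) return null;
  const toNumber = (value) => (value === 'unlimited' ? Infinity : Number(value));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

// Linux only: returns null where /proc is unavailable (macOS, Windows).
function readOwnNofile() {
  try {
    return parseNofile(fs.readFileSync('/proc/self/limits', 'utf8'));
  } catch {
    return null;
  }
}

console.log(readOwnNofile());
```

Logging the soft value at boot makes it easy to confirm that the running process, not just the login shell, picked up the new limit.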
Also raise the TCP listen backlog so burst arrivals do not drop before accept:
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
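Raising net.core.somaxconn only lifts the ceiling. Node's server.listen() requests a backlog of 511 unless told otherwise, so the application has to ask for more. The sketch below shows one way to do that; the helper name `listenWithBacklog` is an assumption for illustration.

```javascript
const http = require('node:http');

// Hypothetical helper: listen with an explicit accept backlog.
// The kernel clamps the requested value to net.core.somaxconn.
function listenWithBacklog(server, port, backlog = 65535) {
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen({ port, backlog }, () => resolve(server.address().port));
  });
}

// Usage (handler is your own request listener):
// const server = http.createServer(handler);
// listenWithBacklog(server, 3000).then((port) => console.log('listening on', port));
```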
Step 2: Configure Node.js HTTP Agent and Server Timeouts
For inbound SSE server connections, the critical knobs are on http.Server. For outbound fan-out connections (Node proxying to an upstream event source), configure http.Agent explicitly.
const http = require('http');
const crypto = require('crypto');
// --- Outbound agent for upstream event-source connections ---
const upstreamAgent = new http.Agent({
keepAlive: true,
maxSockets: 500, // Max parallel upstream connections per hostname
maxFreeSockets: 20, // Keep 20 idle sockets warm in the pool
timeout: 0, // No agent-level socket timeout; stream lives until client leaves
scheduling: 'fifo', // Predictable queue ordering under saturation
});
// --- Inbound server ---
const server = http.createServer(app); // app: your request handler, e.g. an Express app
// keepAliveTimeout must be LONGER than the idle timeout the proxy applies to
// pooled upstream connections (Nginx upstream keepalive_timeout, 60 s by default),
// so that the proxy, not Node, is the side that closes idle connections.
server.keepAliveTimeout = 65_000; // 65 s, outlasts the common 60 s proxy idle
server.headersTimeout = 70_000; // Must exceed keepAliveTimeout
server.requestTimeout = 0; // Disable per-request timeout; SSE has no fixed end
// Maximum number of simultaneous connections; beyond this, Node rejects new ones.
// Set to your (FD limit - 200) to leave headroom for logs and metrics sockets.
server.maxConnections = 130_000;
server.listen(3000, () => console.log('SSE server ready'));
| Parameter | Default (Node 20) | Recommended (10k+ streams) |
|---|---|---|
| keepAliveTimeout | 5,000 ms | 65,000 ms |
| headersTimeout | 60,000 ms | 70,000 ms |
| requestTimeout | 300,000 ms | 0 (disabled) |
| maxConnections | Infinity | FD limit minus 200 |
| Agent maxSockets | Infinity | Size to upstream capacity |
| Agent timeout | Unset (no socket idle timeout) | 0 for streams |
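The three server timeouts in the table depend on each other, so it can help to derive them from a single number. The sketch below does that from the proxy's idle timeout; `applySseTimeouts` and the 5 s margin are assumptions for illustration, not a Node.js API.

```javascript
const http = require('node:http');

// Hypothetical helper: derive server timeouts from the proxy's idle timeout.
function applySseTimeouts(server, proxyIdleMs, marginMs = 5_000) {
  server.keepAliveTimeout = proxyIdleMs + marginMs; // outlast the proxy
  server.headersTimeout = server.keepAliveTimeout + marginMs; // stay above keepAliveTimeout
  server.requestTimeout = 0; // streams have no fixed end
  return server;
}

const server = applySseTimeouts(http.createServer(), 60_000);
console.log(server.keepAliveTimeout, server.headersTimeout); // 65000 70000
```

With a 60 s proxy idle timeout this reproduces the recommended 65,000 ms and 70,000 ms values.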
Step 3: Configure Nginx as an SSE Reverse Proxy
Nginx sits in front of almost every production Node deployment. Its defaults buffer upstream data and impose a 60 s proxy_read_timeout, both fatal for SSE.
upstream sse_backend {
server 127.0.0.1:3000;
keepalive 200; # Pool of 200 idle keepalive connections to upstream
keepalive_timeout 60s; # Keep BELOW server.keepAliveTimeout (65 s) so Nginx closes idle connections first
keepalive_requests 100000;
}
server {
listen 443 ssl http2;
location /api/events {
proxy_pass http://sse_backend;
# Use HTTP/1.1 to the upstream so chunked transfer encoding works
proxy_http_version 1.1;
# Clear the Connection header so the upstream sees a persistent connection
proxy_set_header Connection "";
# Mandatory: disable response buffering so events flow immediately
proxy_buffering off;
proxy_cache off;
# Extend timeouts to cover the expected stream lifetime
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
proxy_connect_timeout 10s;
# SSE response headers. X-Accel-Buffering is a response header, so
# proxy_set_header (which only sets request headers) has no effect on it.
add_header Cache-Control "no-cache";
add_header X-Accel-Buffering "no";
}
}
The keepalive 200 directive in the upstream block maintains a pool of reusable TCP connections from Nginx to your Node process, reducing the FD churn caused by per-request connect/close cycles under high load.
Step 4: Implement Connection Registry with Lifecycle Hooks
Track every open SSE response object in a server-side registry. This gives you accurate concurrency metrics, controlled shutdown, and per-client backpressure without a third-party library.
const activeStreams = new Map(); // clientId -> res
function sseHandler(req, res) {
const clientId = crypto.randomUUID();
// Enforce a hard cap before accepting the connection
if (activeStreams.size >= 50_000) {
res.writeHead(503, { 'Retry-After': '10' });
res.end('data: {"error":"server_at_capacity"}\n\n');
return;
}
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'X-Accel-Buffering': 'no', // Signals Nginx to skip buffering
});
res.flushHeaders(); // Flush HTTP 200 immediately so the client doesn't wait
// Send an initial comment to establish the stream and prevent proxy buffering
res.write(':ok\n\n');
activeStreams.set(clientId, res);
// Clean up when the client disconnects (browser close, navigation, network drop)
req.on('close', () => {
activeStreams.delete(clientId);
// res.end() is safe to call multiple times; Node deduplicates it
res.end();
});
}
// Graceful shutdown: notify clients before the process exits
process.once('SIGTERM', () => {
const shutdown = `event: server-shutdown\ndata: {"reconnect":true}\n\n`;
for (const [id, res] of activeStreams) {
res.write(shutdown);
res.end();
activeStreams.delete(id);
}
server.close(() => process.exit(0));
});
The idempotent event ID guide covers how to attach stable id: fields so reconnecting clients resume without replaying events they already received.
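For orientation, here is a minimal sketch of how an id: field sits alongside event: and data: in a single frame. The helper name `formatEvent` is hypothetical; the field layout follows the text/event-stream format, in which every line of a multi-line payload needs its own data: prefix.

```javascript
// Hypothetical helper: serialize one SSE frame.
function formatEvent({ id, event, data = '' }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  const text = typeof data === 'string' ? data : JSON.stringify(data);
  // Each line of the payload gets its own "data:" prefix.
  for (const line of text.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  // A blank line terminates the frame.
  return lines.join('\n') + '\n\n';
}

process.stdout.write(formatEvent({ id: 42, event: 'tick', data: { n: 1 } }));
```

A reconnecting browser sends the last id it saw in the Last-Event-ID request header, which is what makes resumption possible.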
Step 5: Apply Write Backpressure Per Connection
When a client's TCP buffer fills, res.write() returns false. Unchecked writes pile payload strings in the Node.js heap. Pause the upstream source until the socket drains.
function writeEvent(res, eventSource, payload) {
const chunk = `data: ${JSON.stringify(payload)}\n\n`;
const drained = res.write(chunk);
if (!drained) {
// Pause the event emitter until this response socket drains
eventSource.pause();
res.once('drain', () => {
eventSource.resume();
});
}
}
For a Redis pub/sub fan-out pattern, where one Redis channel feeds many SSE clients, pause the Redis subscriber when any client's socket is full, or use a per-client queue with a bounded size and drop policy:
const MAX_QUEUE = 50; // Drop oldest when queue exceeds this depth
class SSEClient {
constructor(res) {
this.res = res;
this.queue = [];
this.draining = false;
}
push(chunk) {
if (this.queue.length >= MAX_QUEUE) {
this.queue.shift(); // Drop oldest, keep newest
}
this.queue.push(chunk);
this._drain();
}
_drain() {
if (this.draining || this.queue.length === 0) return;
this.draining = true;
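const write = () => {
while (this.queue.length > 0) {
// write() buffers the chunk even when it returns false, so dequeue it
// before checking the result; otherwise the same chunk is sent twice.
const ok = this.res.write(this.queue.shift());
if (!ok) {
this.res.once('drain', write);
return;
}
}
this.draining = false;
};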
write();
}
}
Validation & Monitoring
Verify Headers and Stream Delivery
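# Check response headers; all five must be present.
# Use GET over HTTP/1.1: HEAD responses omit Transfer-Encoding, and HTTP/2
# carries neither Connection nor Transfer-Encoding.
curl -s -N --http1.1 --max-time 5 -D - -o /dev/null https://your-api.com/api/events \
| grep -iE '(cache-control|connection|content-type|transfer-encoding|x-accel-buffering)'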
# Expected output:
# Content-Type: text/event-stream
# Cache-Control: no-cache
# Connection: keep-alive
# Transfer-Encoding: chunked
# X-Accel-Buffering: no
# Receive a live stream and time the first byte (TTFB should be <200 ms)
curl -N --max-time 30 -w "\nTTFB: %{time_starttransfer}s\n" \
-H "Accept: text/event-stream" https://your-api.com/api/events
Instrument with Prometheus Metrics
const client = require('prom-client');
client.collectDefaultMetrics(); // Includes event loop lag, heap, FD count
const activeConnsGauge = new client.Gauge({
name: 'sse_active_connections_total',
help: 'Number of currently open SSE connections',
});
const disconnectCounter = new client.Counter({
name: 'sse_disconnects_total',
help: 'Total SSE client disconnections',
});
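// Sample the registry size once per second
setInterval(() => activeConnsGauge.set(activeStreams.size), 1000);
// Inside sseHandler (Step 4), where req is in scope, count each disconnect:
// req.on('close', () => disconnectCounter.inc());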
Alert on:
- sse_active_connections_total approaching server.maxConnections
- process_open_fds exceeding 80% of the soft nofile limit (the value the process actually hits; the hard limit is only the ceiling to which the soft limit can be raised)
- nodejs_eventloop_lag_seconds p99 > 100 ms (indicates CPU saturation, not pool exhaustion)
Load Test with k6
// k6 load test: 10,000 concurrent SSE streams for 5 minutes
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
vus: 10000,
duration: '5m',
thresholds: {
http_req_waiting: ['p(99)<200'], // TTFB under 200 ms at p99
http_req_failed: ['rate<0.001'], // Less than 0.1% errors
},
};
export default function () {
// http.get() blocks until the response ends or the timeout fires, so this
// test assumes the endpoint closes each stream within the 300 s timeout.
// A stream that outlives the timeout is counted as a failed request.
const res = http.get('https://your-api.com/api/events', {
headers: { Accept: 'text/event-stream' },
timeout: '300s',
responseType: 'none', // Do not buffer response body in k6
});
check(res, { 'status 200': (r) => r.status === 200 });
sleep(1); // Brief pause before the VU reconnects
}
Watch for EMFILE errors in server logs during the ramp-up. If they appear, the nofile limit from Step 1 is not yet applied to the running process.
Verification Checklist
Frequently Asked Questions
How many SSE connections can a single Node.js process handle?
In practice, 20,000 to 80,000 simultaneous streams per process on modern hardware (4 vCPU, 8 GB RAM). The binding constraints are OS file descriptors, the Node.js single-threaded event loop, and memory (roughly 4 to 8 KB of heap per idle SSE response). CPU only becomes a factor when event throughput is high (thousands of events per second across all connections). Horizontal scaling with sticky sessions or a Redis pub/sub fan-out is typically more operationally predictable than pushing a single process to its absolute ceiling.
Why does my proxy close SSE connections after exactly 60 seconds?
Nginx defaults to proxy_read_timeout 60s, and many HAProxy configurations set timeout server 1m without a timeout tunnel. When no data flows on the upstream socket for that interval, the proxy closes the connection. Fix this by: (1) setting proxy_read_timeout 3600s in Nginx, and (2) sending an SSE comment heartbeat (:heartbeat\n\n) every 20 to 30 s so the proxy sees activity. The heartbeat is invisible to the EventSource API but resets the proxy idle timer.
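A heartbeat along these lines can be a single shared timer rather than one timer per client. The sketch below assumes a Map of open responses like the Step 4 registry; `startHeartbeat` is a hypothetical name, and the 25 s default is just one value inside the 20 to 30 s range.

```javascript
// Hypothetical helper: write an SSE comment to every open stream on an interval.
// `streams` is assumed to be a Map of clientId -> response-like object.
function startHeartbeat(streams, intervalMs = 25_000) {
  const timer = setInterval(() => {
    for (const res of streams.values()) {
      // Skip responses that have already been ended.
      if (!res.writableEnded) res.write(':heartbeat\n\n');
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for heartbeats
  return () => clearInterval(timer); // call this during shutdown
}
```

One timer for all clients keeps timer overhead constant as the connection count grows; the returned function stops it.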
Should I use HTTP/2 for SSE to reduce connection overhead?
HTTP/2 multiplexes many streams over one TCP connection, and browsers use it for EventSource whenever the server negotiates it, just as they do for the fetch-based SSE pattern (using ReadableStream). This removes the browser's limit of six HTTP/1.1 connections per origin and reduces FD pressure when one client opens several streams. However, HTTP/2 keeps TCP-level head-of-line blocking: one lost packet stalls every stream sharing that connection. On the proxy-to-Node hop, HTTP/1.1 with properly tuned keep-alive is simpler and well-supported. See SSE vs WebSockets vs HTTP Polling for a fuller protocol comparison.
What is the right keepAliveTimeout value for Node.js behind Nginx?
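Set server.keepAliveTimeout a few seconds above the idle timeout Nginx applies to pooled upstream connections, which is the upstream keepalive_timeout (60 s by default), not proxy_read_timeout. With the 60 s default, 65,000 ms works. This ensures Node never closes an idle TCP connection from its side before Nginx does, eliminating the race in which Nginx reuses a connection Node has just closed and the client sees ECONNRESET. Always set server.headersTimeout 5 to 10 s above keepAliveTimeout. proxy_read_timeout is a separate setting: it limits the gap between reads on an active stream, so cover it with heartbeats.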
How do I handle a Redis pub/sub subscriber that feeds thousands of SSE clients?
Subscribe once per channel per Node.js process, not once per SSE client. Maintain a Map<channel, Set<SSEClient>> registry. When the Redis message arrives, iterate the set and write to each client. Apply the per-client bounded queue from Step 5 so a slow client cannot block delivery to fast clients. See Redis Pub/Sub Fan-Out for SSE for the full architecture.
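The registry described here can be sketched without committing to a particular Redis client. In the example below, `ChannelRegistry` and its method names are hypothetical, clients are assumed to expose a push(chunk) method like the Step 5 SSEClient, and payloads are assumed to be single-line strings.

```javascript
// Hypothetical registry: one upstream subscription per channel, many clients.
class ChannelRegistry {
  constructor() {
    this.channels = new Map(); // channel -> Set of clients
  }

  // Returns true for the first client on a channel (time to SUBSCRIBE upstream).
  add(channel, client) {
    let clients = this.channels.get(channel);
    const first = !clients;
    if (first) {
      clients = new Set();
      this.channels.set(channel, clients);
    }
    clients.add(client);
    return first;
  }

  // Returns true when the last client left (time to UNSUBSCRIBE upstream).
  remove(channel, client) {
    const clients = this.channels.get(channel);
    if (!clients) return false;
    clients.delete(client);
    if (clients.size > 0) return false;
    this.channels.delete(channel);
    return true;
  }

  // Fan one message out; returns the number of clients it was queued for.
  publish(channel, payload) {
    const clients = this.channels.get(channel);
    if (!clients) return 0;
    const chunk = `data: ${payload}\n\n`;
    for (const client of clients) client.push(chunk);
    return clients.size;
  }
}
```

The boolean return values from add() and remove() tell the caller when to issue the single upstream subscribe or unsubscribe for that channel, so the Redis connection count stays independent of the number of SSE clients.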
⚡ Production Directives
- Set LimitNOFILE=262144 in your systemd unit file; runtime ulimit -n changes do not survive service restarts.
- Send an SSE comment heartbeat (:ping\n\n) every 25 s to reset proxy idle timers and confirm the client is still alive without triggering EventSource reconnection.
- Set server.requestTimeout = 0 and server.keepAliveTimeout above the proxy's idle keep-alive timeout, or you will see silent drops with no error logs on either side.
- Gate new SSE connections with a concurrency check against server.maxConnections; return HTTP 503 with Retry-After rather than silently queuing connections until OOM.
- Alert when process_open_fds exceeds 80% of the process's effective (soft) nofile limit; you want time to scale out before the process crashes with EMFILE.