Configuring Connection Pools for High-Concurrency SSE

Part of Connection Pooling for SSE Servers.

At a few hundred concurrent clients, default HTTP server settings hold up. Past roughly 1,000 persistent SSE streams per process, default OS file-descriptor ceilings, unconfigured Node.js http.Agent pools, and proxy read timeouts all become hard walls. This guide walks you through the exact changes (OS, runtime, and proxy) needed to push a single server to tens of thousands of simultaneous streams without connection resets, silent drops, or memory blowouts.

Symptom & Developer Intent

You are seeing one or more of the following:

Clients receive ERR_CONNECTION_RESET or ECONNRESET after a variable number of concurrent connections.
Server logs show listen EMFILE or Error: EMFILE too many open files.
Load tests plateau at a hard concurrency ceiling (often 1,024 or 4,096) regardless of CPU headroom.
Memory grows faster than the number of active sockets alone would explain, suggesting buffered-but-undelivered event payloads.
Nginx or HAProxy drops upstream connections with upstream timed out (110: Connection timed out) after 60–90 s.

The intent: sustain 10,000–100,000 simultaneous text/event-stream connections per server node with stable latency, predictable memory usage, and clean reconnection semantics when you deploy or scale down.

Root Cause Analysis

SSE streams violate every assumption baked into default HTTP infrastructure:

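Default OS limits. The system-wide fs.file-max ceiling on Linux is usually generous (the kernel sizes it from available memory), but the per-process nofile soft limit typically defaults to 1,024, or 4,096 on some distributions. Each SSE connection holds one file descriptor for its TCP socket for as long as the stream is open, and the process also needs descriptors for the listening socket, log files, and any upstream connections. A process at the default limit therefore hits EMFILE at around 1,000 simultaneous clients, long before the CPU is busy.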

Unconfigured HTTP agent. Node.js http.Agent governs outbound requests only; it has no effect on inbound server sockets. Its maxSockets option has defaulted to Infinity since Node.js 0.12 (earlier releases capped it at 5 per host). When your SSE server acts as a fan-out proxy, pulling a source stream and distributing it to browser clients, an unconfigured agent can therefore open an unbounded number of upstream sockets, each of which consumes a file descriptor and a slot on the upstream.

Keep-alive misalignment. Node.js defaults server.keepAliveTimeout to 5 s. If a proxy in front keeps idle upstream connections open for 60 s, Node closes an idle connection first while the proxy still considers it reusable. When the proxy then sends the next request down that half-closed socket, the request fails, and the client sees ECONNRESET or a 502 from the proxy.

Proxy buffering. With proxy_buffering on (the default), Nginx collects the upstream response in memory buffers before forwarding it to the client. For SSE, this means the first few kilobytes of events (4 KB or 8 KB, depending on the platform) sit in Nginx’s buffer, invisible to the browser, until the buffer fills. The client sees nothing, assumes the connection is dead, and reconnects, creating a storm of reconnections under load.

No backpressure propagation. res.write() in Node.js returns false once the response's internal write buffer passes its high-water mark, which happens when the kernel send buffer is full because the client is not reading fast enough. Ignoring this lets unsent data pile up in process memory, producing linear memory growth and eventual OOM kills. The rate limiting and backpressure guide covers the event-level controls; here we address the underlying socket-pool configuration.

Step-by-Step Resolution

Step 1: Raise OS File-Descriptor Limits

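Calculate your target: each SSE connection consumes one FD for its TCP socket (registering the socket with epoll does not use a further descriptor). Budget 2 FDs per connection to leave room for upstream sockets in fan-out setups, add 100 for the process itself, and round up to the next power of two.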
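
The sizing rule above can be sketched as a small helper. The function name and its default arguments are illustrative assumptions, not part of any library; adjust the per-connection budget to your own topology.

```javascript
// Hypothetical helper: estimate a per-process nofile target.
// fdsPerConnection and baseOverhead follow the budget described in the text.
function nofileTarget(connections, fdsPerConnection = 2, baseOverhead = 100) {
  const needed = connections * fdsPerConnection + baseOverhead;
  let limit = 1;
  while (limit < needed) limit *= 2; // round up to the next power of two
  return limit;
}

console.log(nofileTarget(10_000)); // 32768
console.log(nofileTarget(50_000)); // 131072
```

For 50,000 streams this lands on 131,072, which is the soft limit used in the commands for this step.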

# 1a. Check current per-process limit
ulimit -n

# 1b. Raise the system-wide maximum (persistent across reboots via sysctl.conf)
sudo sysctl -w fs.file-max=2097152
echo "fs.file-max=2097152" | sudo tee -a /etc/sysctl.conf

# 1c. Raise per-user/service limits (edit /etc/security/limits.conf)
echo "sse-service soft nofile 131072" | sudo tee -a /etc/security/limits.conf
echo "sse-service hard nofile 262144" | sudo tee -a /etc/security/limits.conf

# 1d. If running under systemd, override the unit file instead:
# [Service]
# LimitNOFILE=262144

Verify after restart:

cat /proc/$(pgrep -f "node server")/limits | grep "open files"
# Max open files  131072  262144  files

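Also raise the kernel's TCP listen backlog ceilings so burst arrivals are not dropped before accept(). Node's server.listen() requests a backlog of 511 by default, so pass a larger backlog argument as well for the higher ceiling to take effect: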

sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535

Step 2: Configure Node.js HTTP Agent and Server Timeouts

For inbound SSE server connections, the critical knobs are on http.Server. For outbound fan-out connections (Node proxying to an upstream event source), configure http.Agent explicitly.

const http = require('http');
const crypto = require('crypto');

// --- Outbound agent for upstream event-source connections ---
const upstreamAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 500,        // Max parallel upstream connections per hostname
  maxFreeSockets: 20,     // Keep 20 idle sockets warm in the pool
  timeout: 0,             // No agent-level socket timeout — stream lives until client leaves
  scheduling: 'fifo',     // Predictable queue ordering under saturation
});

// --- Inbound server ---
// `app` is your request handler, e.g. an Express app or the sseHandler from Step 4.
const server = http.createServer(app);

// keepAliveTimeout must be LONGER than the proxy's idle keep-alive timeout for
// upstream connections (Nginx upstream keepalive_timeout, 60 s by default), so
// that the proxy, not Node, is the side that retires idle sockets.
server.keepAliveTimeout = 65_000;  // 65 s, outlasts the common 60 s proxy idle
server.headersTimeout   = 70_000;  // Must exceed keepAliveTimeout
server.requestTimeout   = 0;       // Disable per-request timeout; SSE has no fixed end

// Maximum number of simultaneous connections; Node refuses connections beyond it.
// Set to your (FD limit − 200) to leave headroom for logs and metrics sockets.
server.maxConnections = 130_000;

server.listen(3000, () => console.log('SSE server ready'));

| Parameter | Default (Node 20) | Recommended (10k+ streams) |
| --- | --- | --- |
| `keepAliveTimeout` | 5,000 ms | 65,000 ms |
| `headersTimeout` | 60,000 ms | 70,000 ms |
| `requestTimeout` | 300,000 ms | 0 (disabled) |
| `maxConnections` | `Infinity` | `FD limit − 200` |
| Agent `maxSockets` | `Infinity` | Size to upstream capacity |
| Agent `timeout` | socket idle timeout | 0 for streams |
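
As a rough sketch, the relationships in this table can be checked at startup. The helper below is hypothetical (its name and messages are not a Node.js API); it only reads the three timeout properties, so it accepts either an http.Server or a plain object, and proxyIdleMs is assumed to be the idle timeout of whatever proxy sits in front.

```javascript
// Hypothetical startup check for the timeout relationships described above.
function checkServerTimeouts(server, proxyIdleMs) {
  const problems = [];
  // A keepAliveTimeout of 0 disables the idle timeout entirely, which is also safe.
  if (server.keepAliveTimeout !== 0 && server.keepAliveTimeout <= proxyIdleMs) {
    problems.push('keepAliveTimeout must exceed the proxy idle timeout');
  }
  if (server.headersTimeout <= server.keepAliveTimeout) {
    problems.push('headersTimeout must exceed keepAliveTimeout');
  }
  if (server.requestTimeout !== 0) {
    problems.push('requestTimeout should be 0 for SSE');
  }
  return problems;
}

// Node defaults against a 60 s proxy idle timeout: two problems reported.
console.log(checkServerTimeouts(
  { keepAliveTimeout: 5_000, headersTimeout: 60_000, requestTimeout: 300_000 },
  60_000,
));
```

Calling it with the recommended values (65,000 / 70,000 / 0) returns an empty array.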

Step 3: Configure Nginx as an SSE Reverse Proxy

Nginx commonly sits in front of production Node deployments. Its defaults buffer upstream data and impose a 60 s proxy_read_timeout, and both are fatal for SSE.

upstream sse_backend {
    server 127.0.0.1:3000;
    keepalive 200;          # Pool of 200 idle keepalive connections to upstream
    keepalive_timeout 60s;  # Keep BELOW Node's server.keepAliveTimeout (65 s) so Nginx retires idle sockets first
    keepalive_requests 100000;
}

server {
    listen 443 ssl http2;

    location /api/events {
        proxy_pass         http://sse_backend;

        # Use HTTP/1.1 to the upstream so chunked transfer encoding works
        proxy_http_version 1.1;

        # Clear the Connection header so the upstream sees a persistent connection
        proxy_set_header   Connection "";

        # Mandatory: disable response buffering so events flow immediately
        proxy_buffering    off;
        proxy_cache        off;

        # Extend timeouts to cover the expected stream lifetime
        proxy_read_timeout  3600s;
        proxy_send_timeout  3600s;
        proxy_connect_timeout 10s;

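        # Response headers for the client. X-Accel-Buffering is a response
        # header, so it is set with add_header rather than proxy_set_header.
        # Nginx hides the upstream's X-Accel-* headers by default, so re-add it
        # for any proxy or CDN layered in front of this one.
        add_header         Cache-Control "no-cache";
        add_header         X-Accel-Buffering "no";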
    }
}

The keepalive 200 directive in the upstream block maintains a pool of reusable TCP connections from Nginx to your Node process, reducing the FD churn caused by per-request connect/close cycles under high load.

Step 4: Implement Connection Registry with Lifecycle Hooks

Track every open SSE response object in a server-side registry. This gives you accurate concurrency metrics, controlled shutdown, and per-client backpressure without a third-party library.

const activeStreams = new Map(); // clientId → res

function sseHandler(req, res) {
  const clientId = crypto.randomUUID();

  // Enforce a hard cap before accepting the connection
  if (activeStreams.size >= 50_000) {
    res.writeHead(503, { 'Retry-After': '10' });
    res.end('data: {"error":"server_at_capacity"}\n\n');
    return;
  }

  res.writeHead(200, {
    'Content-Type':  'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection':    'keep-alive',
    'X-Accel-Buffering': 'no',   // Signals Nginx to skip buffering
  });
  res.flushHeaders(); // Flush HTTP 200 immediately so the client doesn't wait

  // Send an initial comment to establish the stream and prevent proxy buffering
  res.write(':ok\n\n');

  activeStreams.set(clientId, res);

  // Clean up when the client disconnects (browser close, navigation, network drop).
  // Listen on res: since Node 16, req 'close' is emitted when the request has
  // completed, which is not necessarily when the underlying socket goes away.
  res.on('close', () => {
    activeStreams.delete(clientId);
  });
}

// Graceful shutdown: notify clients before the process exits
process.once('SIGTERM', () => {
  const shutdown = `event: server-shutdown\ndata: {"reconnect":true}\n\n`;
  for (const [id, res] of activeStreams) {
    res.write(shutdown);
    res.end();
    activeStreams.delete(id);
  }
  server.close(() => process.exit(0));
});

The idempotent event ID guide covers how to attach stable id: fields so reconnecting clients resume without replaying events they already received.
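
For orientation, here is a minimal sketch of how an event with an id: field is framed on the wire. The formatSseEvent helper is an illustrative assumption, not something defined elsewhere in this guide; it follows the field names of the SSE wire format (id, event, retry, data) and splits multi-line payloads into one data: line each.

```javascript
// Hypothetical helper: serialize one event into SSE wire format.
function formatSseEvent({ id, event, retry, data = '' }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // Non-string payloads are JSON-encoded; each payload line needs its own data: field.
  const text = typeof data === 'string' ? data : JSON.stringify(data);
  for (const line of text.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n'; // a blank line terminates the event
}

console.log(JSON.stringify(formatSseEvent({ id: '42', event: 'tick', data: { n: 1 } })));
// "id: 42\nevent: tick\ndata: {\"n\":1}\n\n"
```

The resulting string can be passed directly to res.write() in the handler above.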

Step 5: Apply Write Backpressure Per Connection

When a client reads slowly, its socket send buffer fills, Node's own write buffer passes its high-water mark, and res.write() returns false. Unchecked writes keep accumulating in process memory. Pause the upstream source until the socket drains.

function writeEvent(res, eventSource, payload) {
  const chunk = `data: ${JSON.stringify(payload)}\n\n`;
  const drained = res.write(chunk);

  if (!drained) {
    // Pause the event emitter until this response socket drains
    eventSource.pause();
    res.once('drain', () => {
      eventSource.resume();
    });
  }
}

For a Redis pub/sub fan-out pattern — where one Redis channel feeds many SSE clients — pause the Redis subscriber when any client’s socket is full, or use a per-client queue with a bounded size and drop policy:

const MAX_QUEUE = 50; // Drop oldest when queue exceeds this depth

class SSEClient {
  constructor(res) {
    this.res = res;
    this.queue = [];
    this.draining = false;
  }

  push(chunk) {
    if (this.queue.length >= MAX_QUEUE) {
      this.queue.shift(); // Drop oldest, keep newest
    }
    this.queue.push(chunk);
    this._drain();
  }

  _drain() {
    if (this.draining || this.queue.length === 0) return;
    this.draining = true;
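    const write = () => {
      while (this.queue.length > 0) {
        // write() has accepted the chunk even when it returns false, so
        // dequeue it first; otherwise it would be sent again after 'drain'.
        const ok = this.res.write(this.queue.shift());
        if (!ok) {
          this.res.once('drain', write);
          return;
        }
      }
      this.draining = false;
    };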
    write();
  }
}

Validation & Monitoring

Verify Headers and Stream Delivery

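# Check response headers; all five must be present.
# Use a GET over HTTP/1.1: HTTP/2 responses never carry Connection or Transfer-Encoding.
curl -s -N --http1.1 --max-time 5 -D - -o /dev/null https://your-api.com/api/events \
  | grep -iE '(cache-control|connection|content-type|transfer-encoding|x-accel-buffering)'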

# Expected output:
# Content-Type: text/event-stream
# Cache-Control: no-cache
# Connection: keep-alive
# Transfer-Encoding: chunked
# X-Accel-Buffering: no

# Receive a live stream and time the first byte (TTFB should be <200 ms)
curl -N --max-time 30 -w "\nTTFB: %{time_starttransfer}s\n" \
  -H "Accept: text/event-stream" https://your-api.com/api/events

Instrument with Prometheus Metrics

const client = require('prom-client');
client.collectDefaultMetrics(); // Includes event loop lag, heap, FD count

const activeConnsGauge = new client.Gauge({
  name: 'sse_active_connections_total',
  help: 'Number of currently open SSE connections',
});

const disconnectCounter = new client.Counter({
  name: 'sse_disconnects_total',
  help: 'Total SSE client disconnections',
});

// At module scope: sample the registry size once per second
setInterval(() => activeConnsGauge.set(activeStreams.size), 1000);
// Inside sseHandler, next to the registry cleanup:
res.on('close', () => disconnectCounter.inc());

Alert on:

sse_active_connections_total approaching server.maxConnections
process_open_fds exceeding 80% of the hard nofile limit
nodejs_eventloop_lag_p99_seconds > 0.1 (100 ms; indicates CPU saturation, not pool exhaustion)
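
The 80% descriptor rule can also be checked in-process, for example in a health endpoint. This is a sketch under assumptions: the function names are hypothetical, countOpenFds relies on /proc and therefore only works on Linux, and the limit has to be supplied by the caller (for instance from the LimitNOFILE value you configured).

```javascript
const fs = require('fs');

// Count this process's open descriptors via /proc (Linux only).
// Returns null where /proc is unavailable. The count includes the
// descriptor briefly opened to read the directory itself.
function countOpenFds() {
  try {
    return fs.readdirSync('/proc/self/fd').length;
  } catch {
    return null;
  }
}

// Hypothetical helper: compare usage against a fraction of the nofile limit.
function fdPressure(openFds, limit, threshold = 0.8) {
  const ratio = openFds / limit;
  return { ratio, alert: ratio >= threshold };
}

const open = countOpenFds();
if (open !== null) console.log(fdPressure(open, 262_144));
```

In production the Prometheus alert remains the primary signal; a check like this is mainly useful for refusing new streams before the limit is reached.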

Load Test with k6

// k6 load test: 10,000 concurrent SSE streams for 5 minutes
// Note: http.get() blocks until the response ends or the timeout fires, and a
// timed-out request counts as failed. Point this at a test endpoint that closes
// each stream after a fixed lifetime shorter than the timeout below.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10000,
  duration: '5m',
  thresholds: {
    http_req_waiting: ['p(99)<200'],  // TTFB under 200 ms at p99
    http_req_failed:  ['rate<0.001'], // Less than 0.1% errors
  },
};

export default function () {
  // The connection stays open for as long as this call blocks
  const res = http.get('https://your-api.com/api/events', {
    headers: { Accept: 'text/event-stream' },
    timeout: '300s',
    responseType: 'none', // Discard the response body instead of buffering it in k6
  });
  check(res, { 'status 200': (r) => r.status === 200 });
}

Watch for EMFILE errors in server logs during the ramp-up. If they appear, the nofile limit from Step 1 is not yet applied to the running process.

Verification Checklist

cat /proc/<pid>/limits | grep "open files" shows soft >= 65,536 and hard >= 131,072
sysctl fs.file-max returns >= 2,097,152
curl response headers (fetched over HTTP/1.1) include Content-Type: text/event-stream, Transfer-Encoding: chunked, and X-Accel-Buffering: no
Nginx proxy_buffering off and proxy_read_timeout >= expected stream lifetime are in the active config (nginx -T | grep proxy_buffering)
server.requestTimeout is 0 and server.keepAliveTimeout > upstream proxy idle timeout
k6 or vegeta load test reaches target concurrency with http_req_failed < 0.1%
Prometheus metric sse_active_connections_total tracks connection count within ±5% of expected
SIGTERM drains all active streams and sends a server-shutdown event before process exit

Frequently Asked Questions

How many SSE connections can a single Node.js process handle?

In practice, 20,000–80,000 simultaneous streams per process on modern hardware (4 vCPU, 8 GB RAM). The binding constraints are OS file descriptors, the Node.js single-threaded event loop, and memory — roughly 4–8 KB of heap per idle SSE response. CPU only becomes a factor when event throughput is high (thousands of events per second across all connections). Horizontal scaling with sticky sessions or a Redis pub/sub fan-out is typically more operationally predictable than pushing a single process to its absolute ceiling.

Why does my proxy close SSE connections after exactly 60 seconds?

Nginx defaults to proxy_read_timeout 60s, and HAProxy configurations commonly set timeout server to somewhere between 50 s and 1 m. When no data flows on the upstream socket for that interval, the proxy closes the connection. Fix this by: (1) setting proxy_read_timeout 3600s in Nginx (or raising timeout server in HAProxy), and (2) sending an SSE comment heartbeat (:heartbeat\n\n) every 20–30 s so the proxy sees activity. The heartbeat is invisible to the EventSource API but resets the proxy idle timer.
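
A heartbeat along those lines might look like the sketch below. startHeartbeat is a hypothetical helper, not an existing API; it assumes res is the http.ServerResponse of an open stream and uses a 25 s default interval, inside the 20–30 s range above.

```javascript
// Hypothetical helper: write an SSE comment at a fixed interval until the
// response closes. Comment lines (starting with ':') are ignored by EventSource.
function startHeartbeat(res, intervalMs = 25_000) {
  const beat = () => {
    if (!res.writableEnded) res.write(':heartbeat\n\n');
  };
  const timer = setInterval(beat, intervalMs);
  timer.unref(); // do not keep the process alive just for heartbeats
  const stop = () => clearInterval(timer);
  res.once('close', stop); // stop automatically when the client goes away
  return { beat, stop };
}
```

Call startHeartbeat(res) right after the initial res.write(':ok\n\n') in the Step 4 handler. For very large connection counts, a single shared timer that iterates the registry may be cheaper than one timer per stream.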

Should I use HTTP/2 for SSE to reduce connection overhead?

HTTP/2 multiplexes many streams over one TCP connection, and browsers use it for EventSource whenever the server negotiates it; the API itself is protocol-agnostic. A fetch-based client that reads the body as a ReadableStream negotiates the protocol the same way. HTTP/2 lifts the limit of about six connections per origin that browsers enforce for HTTP/1.1 and reduces the number of client-facing sockets when one browser opens several streams. The trade-off is that every stream on the connection shares one TCP flow, so a lost packet stalls all of them (TCP-level head-of-line blocking). Note also that in the Step 3 setup Nginx still opens one HTTP/1.1 connection to Node per stream, so HTTP/2 at the edge does not reduce FD pressure on the Node process. See SSE vs WebSockets vs HTTP Polling for a fuller protocol comparison.
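
With the fetch-based pattern the client has to split the byte stream into events itself, since EventSource is no longer doing it. The following is a simplified sketch of such a parser, assuming decoded text chunks and LF or CRLF line endings; createSseParser is an illustrative name, and the retry: field and bare-CR line endings are deliberately left out.

```javascript
// Hypothetical incremental parser for text/event-stream chunks.
// feed() may be called with arbitrary chunk boundaries.
function createSseParser(onEvent) {
  let buffer = '';
  let current = { event: 'message', data: [], id: undefined };

  return function feed(chunk) {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      let line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (line.endsWith('\r')) line = line.slice(0, -1);

      if (line === '') {
        // Blank line: dispatch the event if it carried any data
        if (current.data.length > 0) {
          onEvent({ event: current.event, id: current.id, data: current.data.join('\n') });
        }
        current = { event: 'message', data: [], id: current.id };
        continue;
      }
      if (line.startsWith(':')) continue; // comment, e.g. a heartbeat

      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value.startsWith(' ')) value = value.slice(1);

      if (field === 'data') current.data.push(value);
      else if (field === 'event') current.event = value;
      else if (field === 'id') current.id = value;
    }
  };
}
```

In a browser, the chunks would typically come from response.body piped through a TextDecoderStream, with each decoded chunk passed to feed().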

What is the right keepAliveTimeout value for Node.js behind Nginx?

Set server.keepAliveTimeout a few seconds above the idle timeout Nginx applies to pooled upstream connections. That is the keepalive_timeout directive in the upstream block (60 s by default), not proxy_read_timeout, which governs gaps between reads on an active response. With the default 60 s, 65,000 ms on the Node side is enough. This ensures Node never closes an idle TCP connection before Nginx does, eliminating the race in which Nginx reuses a socket that Node is already tearing down and the request fails with ECONNRESET. Always set server.headersTimeout 5–10 s above keepAliveTimeout.

How do I handle a Redis pub/sub subscriber that feeds thousands of SSE clients?

Subscribe once per channel per Node.js process, not once per SSE client. Maintain a Map<channel, Set<SSEClient>> registry. When the Redis message arrives, iterate the set and write to each client. Apply the per-client bounded queue from Step 5 so a slow client cannot block delivery to fast clients. See Redis Pub/Sub Fan-Out for SSE for the full architecture.
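
A minimal sketch of that registry follows. ChannelRegistry is a hypothetical class; it leaves the Redis client out entirely and only reports when the caller should subscribe or unsubscribe. Clients are assumed to expose a push(chunk) method like the SSEClient in Step 5, and messages are assumed to be single-line strings such as JSON.

```javascript
// Hypothetical registry: one Redis subscription per channel, many SSE clients.
class ChannelRegistry {
  constructor() {
    this.channels = new Map(); // channel -> Set of clients
  }

  // Returns true when this is the first client, i.e. the caller should SUBSCRIBE.
  add(channel, client) {
    let clients = this.channels.get(channel);
    const first = clients === undefined;
    if (first) {
      clients = new Set();
      this.channels.set(channel, clients);
    }
    clients.add(client);
    return first;
  }

  // Returns true when the last client left, i.e. the caller should UNSUBSCRIBE.
  remove(channel, client) {
    const clients = this.channels.get(channel);
    if (!clients) return false;
    clients.delete(client);
    if (clients.size > 0) return false;
    this.channels.delete(channel);
    return true;
  }

  // Fan one message out to every client on the channel; returns the client count.
  dispatch(channel, message) {
    const clients = this.channels.get(channel);
    if (!clients) return 0;
    const chunk = `data: ${message}\n\n`; // assumes message contains no newlines
    for (const client of clients) client.push(chunk);
    return clients.size;
  }
}
```

The subscriber's message callback would call dispatch(channel, message), and the res 'close' handler would call remove() for each channel the client had joined.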

⚡ Production Directives

Set LimitNOFILE=262144 in your systemd unit file — runtime ulimit -n changes do not survive service restarts.
Send an SSE comment heartbeat (:ping\n\n) every 25 s to reset proxy idle timers and confirm the client is still alive without triggering EventSource reconnection.
Set server.requestTimeout = 0 and server.keepAliveTimeout above your longest-lived proxy timeout, or you will see silent drops with no error logs on either side.
Gate new SSE connections with a concurrency check against server.maxConnections; return HTTP 503 with Retry-After rather than silently queuing connections until OOM.
Alert when process_open_fds exceeds 80% of the hard nofile limit — you want time to scale out before the process crashes with EMFILE.
