Implementing HTTP keep-alive for Node.js SSE

Symptom & Developer Intent

Server-Sent Events (SSE) connections silently terminate after 60–120 seconds without payload. Frontend EventSource instances fire onerror callbacks and reconnect, and the resulting reconnect loops spike CPU, exhaust connection pools, and generate false-positive alerts.

Intent: Have the Node.js HTTP server emit periodic heartbeat comments (: keep-alive\n\n) so that bytes keep flowing over the TCP socket. Each heartbeat resets the idle timers enforced by reverse proxies, load balancers, and NAT gateways without disrupting the event stream.
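To make the wire format concrete, here is a minimal sketch of how a heartbeat comment differs from a regular event frame. The helper names (sseComment, sseEvent, dispatchesEvent) are hypothetical and exist only for illustration. The comment text itself is arbitrary, since clients discard any line that begins with a colon.

```javascript
// Build an SSE comment frame; clients discard any line that starts with ':'.
function sseComment(text = 'keep-alive') {
  return `: ${text}\n\n`;
}

// Build an SSE event frame; multi-line data gets one "data:" line per line.
function sseEvent(data, { id, event } = {}) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n';
}

// True when a frame carries at least one data line, i.e. the client
// would dispatch it as an event. Comment-only frames return false.
function dispatchesEvent(frame) {
  return frame.split('\n').some((line) => line.startsWith('data:'));
}

console.log(JSON.stringify(sseComment()));          // ": keep-alive\n\n"
console.log(dispatchesEvent(sseComment()));         // false
console.log(dispatchesEvent(sseEvent('tick')));     // true
```

Because the heartbeat frame never carries a data line, it keeps bytes on the socket without invoking any client-side message handler.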

Root Cause Analysis

The text/event-stream MIME type does not inherently prevent idle connection drops. Infrastructure layers (Nginx, HAProxy, AWS ALB, Cloudflare) enforce idle connection limits, such as Nginx's proxy_read_timeout. When zero bytes traverse the socket within the configured threshold, the proxy closes the connection, typically with a TCP FIN or RST.

Node.js http.Server defaults to a 5-second keepAliveTimeout, but that timer only runs while a connection sits idle between requests, after the previous response has finished. It does not apply to a response that is still open, which is what an SSE stream is. Without explicit application-level keep-alive frames, intermediate network layers treat the idle stream as dead. Mapping proxy idle thresholds to persistent stream behavior is detailed in the HTTP Keep-Alive & Connection Lifecycle documentation.
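Since the shortest idle timeout on the path decides when the stream gets cut, one way to choose a heartbeat interval is to derive it from those timeouts. The sketch below is only an illustration: the function name pickHeartbeatInterval, the safety factor, and the timeout values are all assumptions, and the real thresholds have to come from your own proxy and load balancer configuration.

```javascript
// Illustrative idle timeouts in milliseconds. These are placeholder
// values, not measurements; read the real ones from your infrastructure.
const idleTimeoutsMs = { reverseProxy: 60000, loadBalancer: 60000, nat: 120000 };

// Pick an interval that is a fraction of the tightest idle timeout,
// so that a single delayed heartbeat does not let the timer expire.
function pickHeartbeatInterval(timeouts, safetyFactor = 0.5, floorMs = 1000) {
  const values = Object.values(timeouts);
  if (values.length === 0) throw new Error('at least one idle timeout is required');
  const tightest = Math.min(...values);
  return Math.max(floorMs, Math.floor(tightest * safetyFactor));
}

console.log(pickHeartbeatInterval(idleTimeoutsMs)); // 30000
```

A smaller safety factor trades a little extra traffic for more headroom against jitter on a loaded event loop.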

Step-by-Step Resolution

  1. Set Mandatory SSE Headers: Send the response headers immediately and disable proxy buffering so that headers and initial bytes reach the client without delay.
     res.writeHead(200, {
       'Content-Type': 'text/event-stream',
       'Cache-Control': 'no-cache',
       'Connection': 'keep-alive',
       'X-Accel-Buffering': 'no' // Critical for Nginx
     });
     res.flushHeaders();
  2. Implement Heartbeat Interval: Push a comment frame (: prefix) every 15–30 seconds. The SSE spec ignores lines starting with :, so clients do not trigger onmessage handlers.
     const HEARTBEAT_INTERVAL_MS = 20000;
     const heartbeat = setInterval(() => {
       res.write(': keep-alive\n\n');
     }, HEARTBEAT_INTERVAL_MS);
  3. Tune Node.js Server Timeouts: These settings govern idle keep-alive sockets and incoming requests rather than the open SSE response itself, so size them against the proxy in front of Node.js. keepAliveTimeout should exceed the proxy's idle timeout, and headersTimeout should sit slightly above keepAliveTimeout. Baseline server configurations for streaming workloads are documented under Backend Stream Generation & Connection Management.
     server.keepAliveTimeout = 300000; // 5 minutes
     server.headersTimeout = 305000; // Slightly higher than keepAliveTimeout
     server.requestTimeout = 0; // 0 disables the request-receive timeout
  4. Handle Client Disconnects: Bind to req.on('close') to clear intervals and release memory. Failure to do so causes setInterval accumulation and memory leaks under high concurrency.
     req.on('close', () => {
       clearInterval(heartbeat);
       res.end();
     });
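Steps 2 and 4 belong together, so one option is to wrap them in a single helper that starts the interval and guarantees it is cleared. The following is a sketch under assumptions: startHeartbeat is a hypothetical name, and it listens for 'close' on the response object rather than the request, which Node.js emits when the response completes or the underlying connection is terminated.

```javascript
// Start writing heartbeat comments to an SSE response.
// Returns a stop() function; stop() is safe to call more than once.
function startHeartbeat(res, intervalMs = 20000) {
  let timer = null;

  const stop = () => {
    if (timer !== null) {
      clearInterval(timer);
      timer = null;
    }
  };

  timer = setInterval(() => {
    // Never write to a response that has already ended or been destroyed.
    if (res.writableEnded || res.destroyed) {
      stop();
      return;
    }
    res.write(': keep-alive\n\n');
  }, intervalMs);

  // Clear the interval as soon as the connection goes away.
  res.once('close', stop);
  return stop;
}

// Usage inside a request handler (not executed here):
//   res.writeHead(200, { 'Content-Type': 'text/event-stream' });
//   startHeartbeat(res, 20000);
```

Keeping the timer and its cleanup in one place makes it harder to add a new exit path that forgets clearInterval.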

Complete Runnable Implementation

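const http = require('http');

const server = http.createServer((req, res) => {
  // Only the /events path serves the stream.
  if (req.url !== '/events') return res.writeHead(404).end();

  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no' // Disable Nginx response buffering
  });
  res.flushHeaders();

  // Comment frame every 20 seconds keeps intermediaries from timing out.
  const heartbeat = setInterval(() => res.write(': keep-alive\n\n'), 20000);

  // Stop the heartbeat and finish the response when the client goes away.
  req.on('close', () => {
    clearInterval(heartbeat);
    res.end();
  });
});

server.keepAliveTimeout = 300000; // 5 minutes
server.headersTimeout = 305000; // Slightly higher than keepAliveTimeout
server.requestTimeout = 0; // 0 disables the request-receive timeout
server.listen(3000, () => console.log('SSE server listening on :3000'));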
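One way to exercise the server above from a script is to connect, split the stream into frames, and record the time between consecutive comment frames. This is a rough sketch, not a load-testing tool: measureHeartbeatGaps is a hypothetical helper, and the URL in the usage comment assumes the server is listening on port 3000 as in the listing.

```javascript
const http = require('node:http');

// Resolve with the gaps (in ms) between consecutive heartbeat comments.
function measureHeartbeatGaps(url, count = 3) {
  return new Promise((resolve, reject) => {
    const gaps = [];
    let last = null;
    let buffer = '';

    const req = http.get(url, (res) => {
      if (res.statusCode !== 200) {
        req.destroy();
        reject(new Error(`unexpected status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      res.on('data', (chunk) => {
        buffer += chunk;
        let end;
        // Frames are separated by a blank line.
        while ((end = buffer.indexOf('\n\n')) !== -1) {
          const frame = buffer.slice(0, end);
          buffer = buffer.slice(end + 2);
          if (!frame.startsWith(':')) continue; // count comments only
          const now = Date.now();
          if (last !== null) gaps.push(now - last);
          last = now;
          if (gaps.length >= count) {
            req.destroy(); // enough samples, drop the connection
            resolve(gaps);
            return;
          }
        }
      });
      res.on('end', () => reject(new Error('stream ended before enough heartbeats arrived')));
    });
    req.on('error', reject);
  });
}

// Usage (not executed here):
//   measureHeartbeatGaps('http://localhost:3000/events').then(console.log);
```

Timer scheduling means the gaps land near the configured interval rather than exactly on it, so compare against a tolerance instead of an exact value.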

Validation & Monitoring

  1. Local Verification: Run curl -N -v http://localhost:3000/events. Verify that : keep-alive payloads appear roughly every 20s. Transfer-Encoding: chunked in the response headers is expected for a streamed HTTP/1.1 response; confirm that no Connection: close header appears and that the stream is not interrupted.

  2. Proxy Bypass & TCP State Test: Route traffic through your production load balancer. Monitor socket persistence with ss -tnp | grep :3000. Validate that the ESTABLISHED state survives >5 minutes with only heartbeats and no event payload on the wire. Drop rate should be 0% during idle windows.

  3. Client-Side Telemetry: Instrument EventSource.readyState transitions. A stable 1 (OPEN) confirms successful keep-alive. Track onerror frequency; a drop to near-zero validates proxy timeout resolution. Log lastEventId to ensure reconnection resumes from the correct offset.

  4. Resource Guardrails: Monitor heap usage via process.memoryUsage().heapUsed. Ensure clearInterval executes reliably; accumulation under 1k concurrent connections should not exceed baseline + 50MB. Alert when res.write() keeps returning false or req.socket.writableLength keeps growing across heartbeats for more than 30s, to catch stalled connections before OOM.
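For the resource guardrails, a small in-process tracker can show whether connections are actually released. The sketch below is an assumption-laden illustration: createConnectionTracker is a hypothetical helper, and how the snapshot is exported (logs, metrics endpoint) is left to your monitoring stack.

```javascript
// Track open SSE responses and report them next to current heap usage.
function createConnectionTracker() {
  const open = new Set();

  return {
    // Register a response; it is removed automatically once it closes.
    track(res) {
      open.add(res);
      res.once('close', () => open.delete(res));
    },
    // Point-in-time view for logging or metrics export.
    snapshot() {
      return {
        openConnections: open.size,
        heapUsedMB: Math.round(process.memoryUsage().heapUsed / (1024 * 1024)),
      };
    },
  };
}

// Usage inside a request handler (not executed here):
//   const tracker = createConnectionTracker();
//   tracker.track(res);
//   setInterval(() => console.log(tracker.snapshot()), 30000).unref();
```

If openConnections keeps climbing while client counts stay flat, cleanup on disconnect is the first thing to check.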