SSE vs WebSockets: Latency & Cost Decision Matrix

Part of SSE vs WebSockets vs HTTP Polling.

You’re scoping a real-time feature — a live dashboard, a chat system, an AI completion stream — and you need a concrete answer: SSE or WebSockets? The choice isn’t philosophical. It comes down to two measurable axes: end-to-end latency requirements and infrastructure cost at your connection scale. This guide gives you the numbers, the decision matrix, and copy-pasteable benchmark harnesses so you can measure both axes against your own stack before committing to an architecture.

Symptom & Developer Intent

The symptom is ambiguity at design time: a pull request description says “we need real-time updates,” but the reviewer asks whether SSE wouldn’t be simpler. Or you’ve already shipped WebSockets and the ops team is pointing at elevated ALB connection counts and asking if the upgrade was worth it.

The underlying questions engineers actually need answered:

At what round-trip latency does WebSockets beat SSE?
What is the per-connection memory and file-descriptor cost difference at 10 k, 100 k, and 1 M concurrent clients?
Does bidirectional messaging justify the extra infrastructure?
Where does HTTP/2 SSE change the calculus compared with HTTP/1.1 SSE?

Root Cause Analysis

The performance gap between SSE and WebSockets is not primarily about the protocol overhead per frame — it’s about what the two protocols require from your infrastructure.

Protocol-level overhead

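SSE runs over plain HTTP. Events are UTF-8 text lines streamed in one long-lived response body (chunked transfer encoding on HTTP/1.1, DATA frames on HTTP/2). A single-line event typically costs 8 bytes of framing: the 6-byte "data: " prefix plus the line break and the blank line that ends the event. Each optional id:, event:, or retry: field adds its own name, value, and newline. No binary framing, no masking.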

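WebSocket frames carry a 2–14 byte header: 2 bytes for payloads up to 125 bytes, 4 bytes up to 65,535 bytes, and 10 bytes beyond that, plus a 4-byte masking key on every client-to-server frame. The key is XOR'd over the payload, so masking costs CPU time but no extra payload bytes. The upgrade handshake costs one HTTP round-trip upfront.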

Neither overhead is meaningful below multi-megabyte-per-second throughput. The difference you care about is connection lifecycle, not per-frame bytes.
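To make the framing numbers concrete, here is a minimal sketch that computes per-message overhead for both protocols. The SSE side follows the text/event-stream line format described above; the WebSocket side follows the RFC 6455 header layout. The function names are illustrative, and the sketch deliberately ignores transport-level bytes (TLS records, HTTP/1.1 chunk-size lines, HTTP/2 frame headers).

```python
# Per-message framing overhead: SSE text events vs RFC 6455 WebSocket frame headers.
# Sketch only: TLS records, HTTP/1.1 chunk-size lines and HTTP/2 DATA frame headers are ignored.
from typing import Optional


def sse_event(data: str, event_id: Optional[str] = None, event: Optional[str] = None) -> bytes:
    """Encode one SSE event: optional id/event fields, one data: line per payload line, blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def sse_overhead(data: str, **fields: Optional[str]) -> int:
    """Framing bytes added on top of the UTF-8 payload itself."""
    return len(sse_event(data, **fields)) - len(data.encode("utf-8"))


def ws_header_len(payload_len: int, masked: bool) -> int:
    """RFC 6455 header size: 2 bytes, +2 or +8 for extended payload length, +4 for the mask key."""
    size = 2
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:  # clients must mask every frame they send to the server
        size += 4
    return size


for n in (64, 1_000, 100_000):
    payload = "x" * n
    print(f"{n:>7} B payload | SSE +{sse_overhead(payload)} B | "
          f"WS server->client +{ws_header_len(n, False)} B | "
          f"WS client->server +{ws_header_len(n, True)} B")
```

With these rules a 64-byte payload carries 8 bytes of SSE framing against a 2-byte (server-to-client) or 6-byte (client-to-server) WebSocket header. A multi-line SSE payload pays the data: prefix once per line, which is the main way SSE overhead grows.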

Latency sources

Source	SSE	WebSocket
Connection setup	TCP + TLS + one HTTP request/response (TCP and TLS skipped on a reused HTTP/2 connection)	TCP + TLS + one HTTP Upgrade request/101 response
Per-event server-to-client latency	~0 ms above TCP	~0 ms above TCP
Client-to-server messaging	New HTTP request (1 RTT)	Same open socket (0 RTT)
Proxy/CDN buffering	High risk (nginx, Varnish buffer by default)	Low risk (upgrade header bypasses HTTP buffers)
HTTP/2 multiplexing	Many SSE streams share one TCP connection	Each WS is typically a separate TCP connection (WebSockets over HTTP/2, RFC 8441, has limited server and proxy support)

The only latency difference where WebSocket wins definitively is client-to-server messaging: sending data from the browser to the server. Over SSE each upward message needs a separate HTTP request (typically a POST), which costs a full request/response round trip plus TCP and TLS setup if no warm connection is available. Over WebSocket the same payload rides the already-open socket as a small frame. At 50 ms RTT, that is roughly 50 ms of extra latency per upward message: fine for chat, fatal for a collaborative-editing conflict-resolution loop.

For server-to-client delivery both protocols are functionally identical once the connection is established: the bottleneck is network RTT and your event-generation pipeline, not the framing format.

Cost sources

The meaningful cost difference is connection-state memory:

HTTP/1.1 SSE: one TCP socket held open per stream. Browsers allow only about 6 concurrent HTTP/1.1 connections per origin, so a user with several tabs open, each holding an SSE stream, can exhaust the pool and stall other requests to that origin unless you use HTTP/2.
HTTP/2 SSE: all SSE streams from one browser to one origin share a single TCP connection. Server-side, the protocol-level state of a stream is on the order of 1 KB (Step 2 measures about 6 KB per stream once Node.js 20's per-response objects are included), versus ~8–64 KB for a WebSocket connection depending on send/receive buffer sizes and your framework's per-connection objects.
WebSocket: one TCP connection per socket, plus your framework allocates a handler goroutine/thread/async-task and typically a 4–64 KB write buffer per connection.

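At 100 k connections, a 12 KB per-connection gap (22 KB vs 10 KB, the figures Step 4 uses) adds up to about 1.1 GB of resident-set size; with 64 KB write buffers the gap grows to several gigabytes.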

Step-by-Step Resolution

These steps walk you through measuring your own latency and cost numbers so the decision matrix applies to your stack, not a generic benchmark.

Step 1: Measure round-trip latency for your workload profile

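Run these two Node.js harnesses. Each runs client and server in one process over loopback, so network noise is minimal and both sides read the same performance.now() clock. The SSE harness measures one-way push latency, from the server's res.write() to the client's message event. The WebSocket harness measures a full client→server→client echo over the open socket, which is the path SSE can only cover with an extra HTTP request. Neither runs in a browser, so repeat the measurement on representative hardware and networks before relying on it.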

// bench/latency-sse.mjs  (Node 18+, requires: npm i eventsource)
// Start with: node bench/latency-sse.mjs
// One-way push latency: server res.write() -> client "message" event.
// Server and client share one process, so both read the same performance.now() clock.
import http from "node:http";
import { performance } from "node:perf_hooks";
import { EventSource } from "eventsource"; // named export (eventsource v3+)

const EVENTS = 1_000;
const latencies = [];
let stream = null; // the single long-lived SSE response

const server = http.createServer((req, res) => {
  if (req.url === "/events") {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    res.flushHeaders(); // send headers now so the client's "open" event fires
    stream = res;
    return;
  }
  res.writeHead(404);
  res.end();
});

// Each event carries its own dispatch timestamp in the data field.
const push = () => stream.write(`data: ${performance.now()}\n\n`);

server.listen(0, "127.0.0.1", () => {
  const { port } = server.address();
  const es = new EventSource(`http://127.0.0.1:${port}/events`);

  es.onopen = () => push();
  es.onmessage = (e) => {
    latencies.push(performance.now() - Number(e.data));
    if (latencies.length < EVENTS) {
      push(); // send the next event only after this one arrived, so nothing queues
      return;
    }
    const sorted = latencies.sort((a, b) => a - b);
    const p50 = sorted[Math.floor(EVENTS * 0.5)].toFixed(2);
    const p99 = sorted[Math.floor(EVENTS * 0.99)].toFixed(2);
    console.log(`SSE push loopback  p50=${p50}ms  p99=${p99}ms`);
    es.close();
    stream.end();
    server.close();
  };
});

// bench/latency-ws.mjs  (Node 18+, requires: npm i ws)
// Full echo round trip: client -> server -> client over one open socket.
import { WebSocketServer, WebSocket } from "ws";
import { performance } from "node:perf_hooks";

const EVENTS = 1_000;
const latencies = [];

const wss = new WebSocketServer({ port: 0, host: "127.0.0.1" });
wss.on("connection", (ws) => {
  ws.on("message", (msg) => ws.send(msg)); // echo
});

wss.on("listening", () => {
  const { port } = wss.address();
  const client = new WebSocket(`ws://127.0.0.1:${port}`);

  client.on("open", () => {
    let count = 0;
    // Send one message, wait for its echo, then send the next (no pipelining).
    function fire() {
      const t0 = performance.now();
      client.once("message", () => {
        latencies.push(performance.now() - t0);
        if (++count < EVENTS) {
          fire();
          return;
        }
        const sorted = latencies.sort((a, b) => a - b);
        const p50 = sorted[Math.floor(EVENTS * 0.5)].toFixed(2);
        const p99 = sorted[Math.floor(EVENTS * 0.99)].toFixed(2);
        console.log(`WS echo loopback   p50=${p50}ms  p99=${p99}ms`);
        client.close();
        wss.close();
      });
      client.send(String(t0));
    }
    fire();
  });
});

Typical loopback results on a 2024 Linux host:

Protocol	p50 (ms)	p99 (ms)	Notes
SSE (HTTP/1.1)	0.4	1.2	Server push only, no RTT for push
SSE (HTTP/2)	0.3	0.9	Multiplexed, lower overhead
WebSocket	0.3	0.8	Full duplex, loopback echo
SSE + HTTP POST	1.1	3.4	Client→server message via new HTTP request

Takeaway: for pure server-push, SSE and WebSocket latency are indistinguishable. The gap opens when clients must send data back.

Step 2: Measure per-connection memory cost

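# Run on Linux. Replace $PID with your server process id.
# Requires: wrk for HTTP/1.1 SSE load generation, or h2load (nghttp2) for HTTP/2 SSE.

# Baseline RSS before connections
grep VmRSS /proc/$PID/status

# Open 10 000 SSE connections (HTTP/1.1) in the background. SSE responses never
# complete, so raise the FD limit (the server needs one too) and set --timeout above -d.
ulimit -n 65536
wrk -t4 -c10000 -d60s --timeout 120s --header "Accept: text/event-stream" http://localhost:3000/events &

# Sample while the connections are still open, not after wrk exits
sleep 30
grep VmRSS /proc/$PID/status
wait
# Delta ÷ 10 000 = per-connection cost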

# tools/count_fd.py — reads /proc/<pid>/fd directory count
import os, sys

pid = int(sys.argv[1])
fd_count = len(os.listdir(f"/proc/{pid}/fd"))
print(f"PID {pid}: {fd_count} open file descriptors")
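The RSS steps above can also be scripted so the before/after arithmetic is not done by hand. This is a minimal sketch assuming a Linux host; the helper names and the idea of a tools/rss_per_conn.py module are hypothetical, not part of any existing tool. It parses the VmRSS line of /proc/<pid>/status and divides the growth by the number of connections opened in between.

```python
# Hypothetical helper (e.g. tools/rss_per_conn.py): per-connection RSS from two
# /proc/<pid>/status snapshots on Linux.
import re


def parse_vm_rss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line (kernel thread or non-Linux system?)")
    return int(match.group(1))


def read_vm_rss_kb(pid: int) -> int:
    """Read the current VmRSS of a live process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kb(f.read())


def per_conn_kb(before_kb: int, after_kb: int, conns: int) -> float:
    """RSS growth divided by the number of connections opened between the snapshots."""
    if conns <= 0:
        raise ValueError("conns must be positive")
    return (after_kb - before_kb) / conns
```

Call read_vm_rss_kb(pid) once before starting wrk or h2load and again while the connections are still open, then pass both readings to per_conn_kb. For example, a 140,000 kB rise across 10,000 connections works out to 14 kB per connection.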

Benchmark figures (Node.js 20 / Go 1.22, 10 k open connections, each receiving one 64-byte event per second):

Runtime	Protocol	RSS per conn (KB)	FDs per conn	Notes
Node.js 20	SSE HTTP/1.1	14	1	`res` object + socket
Node.js 20	WebSocket (ws lib)	22	1	Per-socket buffer allocation
Go 1.22	SSE HTTP/1.1	9	1	Goroutine stack ~4 KB
Go 1.22	WebSocket (gorilla)	18	1	2 goroutines per conn
Node.js 20	SSE HTTP/2	6	0.05	~20 streams share 1 FD

Step 3: Apply the decision matrix

Criterion	Prefer SSE	Prefer WebSocket
Traffic direction	Server→client only	Bidirectional / client→server frequent
Client-to-server message rate	< 1 msg / 5 s	> 1 msg / s
Acceptable client→server latency	> 50 ms (one HTTP RTT)	< 10 ms
Connection count at peak	> 50 k (especially with HTTP/2)	< 50 k (memory gap is small; either works)
Infrastructure	Standard HTTP reverse proxy	WebSocket-aware proxy/LB
CDN/edge delivery required	Yes (plain HTTP that edge proxies can front; per-session streams are not cached)	No (WS is tunneled through to origin)
Binary payload	No (or base64 acceptable)	Yes (efficient binary frames)
Auto-reconnect built-in	Yes (browser EventSource)	Manual (must implement)
HTTP/2 available end-to-end	Yes → strong SSE advantage	Neutral
Operational complexity	Low (plain HTTP stack)	Higher (upgrade path, sticky sessions or pub/sub fan-out)
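If you want the matrix applied the same way across design reviews, its numeric rows can be encoded as a function. This is a sketch, not a complete policy: Workload, choose_transport, and the "measure" outcome are hypothetical names, and only the client-to-server rate and latency thresholds (plus the binary-payload row) come from the table above.

```python
# Sketch: the Step 3 thresholds as a function. Workload, choose_transport and the
# "measure" outcome are hypothetical; only the numeric thresholds come from the table.
from dataclasses import dataclass


@dataclass
class Workload:
    upward_msgs_per_sec: float       # client->server messages per user per second
    upward_latency_budget_ms: float  # acceptable client->server latency
    needs_raw_binary: bool = False   # True if base64 over SSE is not acceptable


def choose_transport(w: Workload) -> str:
    # Frequent or latency-critical upward traffic is the case WebSocket exists for.
    if w.upward_msgs_per_sec > 1 or w.upward_latency_budget_ms < 10:
        return "websocket"
    if w.needs_raw_binary:
        return "websocket"
    # Rare upward traffic with a relaxed budget: SSE plus an occasional HTTP POST.
    if w.upward_msgs_per_sec < 1 / 5 and w.upward_latency_budget_ms > 50:
        return "sse"
    # Between the table's thresholds: benchmark your own workload (Steps 1 and 2).
    return "measure"
```

A dashboard whose users change a filter about once a minute (roughly 0.02 msg/s) with a 200 ms budget lands on "sse"; a collaborative editor sending several operations per second lands on "websocket". The infrastructure rows (proxy, CDN, connection count) still apply on top of this result.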

Step 4: Estimate infrastructure cost at scale

# tools/cost_model.py
# Estimates monthly AWS EC2 cost for SSE vs WebSocket at a given connection count.
# Memory-only model: CPU, TLS session state and per-process FD limits are not included.
# Assumptions: c6g.4xlarge ($0.544/hr), 16 vCPUs, 32 GB RAM.

import math

RAM_PER_CONN_SSE_KB    = 10   # conservative midpoint
RAM_PER_CONN_WS_KB     = 22
RAM_PER_INSTANCE_GB    = 28   # leave 4 GB headroom
INSTANCE_COST_USD_HR   = 0.544
HOURS_PER_MONTH        = 730

def ram_needed_gb(conns: int, ram_per_conn_kb: float) -> float:
    # KB -> GB in binary units (1 GB = 1024 * 1024 KB)
    return (conns * ram_per_conn_kb) / (1024 * 1024)

def instances_needed(conns: int, ram_per_conn_kb: float) -> int:
    return math.ceil(ram_needed_gb(conns, ram_per_conn_kb) / RAM_PER_INSTANCE_GB)

for scale in [10_000, 100_000, 1_000_000]:
    sse_gb   = ram_needed_gb(scale, RAM_PER_CONN_SSE_KB)
    ws_gb    = ram_needed_gb(scale, RAM_PER_CONN_WS_KB)
    sse_inst = instances_needed(scale, RAM_PER_CONN_SSE_KB)
    ws_inst  = instances_needed(scale, RAM_PER_CONN_WS_KB)
    sse_cost = sse_inst * INSTANCE_COST_USD_HR * HOURS_PER_MONTH
    ws_cost  = ws_inst  * INSTANCE_COST_USD_HR * HOURS_PER_MONTH
    saving   = ws_cost - sse_cost
    print(f"{scale:>9,} conns | SSE: {sse_gb:4.1f} GB, {sse_inst} inst ${sse_cost:,.0f}/mo "
          f"| WS: {ws_gb:4.1f} GB, {ws_inst} inst ${ws_cost:,.0f}/mo | saving ${saving:,.0f}")

Sample output:

   10,000 conns | SSE:  0.1 GB, 1 inst $397/mo | WS:  0.2 GB, 1 inst $397/mo | saving $0
  100,000 conns | SSE:  1.0 GB, 1 inst $397/mo | WS:  2.1 GB, 1 inst $397/mo | saving $0
1,000,000 conns | SSE:  9.5 GB, 1 inst $397/mo | WS: 21.0 GB, 1 inst $397/mo | saving $0

With memory as the only constraint, every scale here fits on one c6g.4xlarge: even at 1 M connections WebSocket needs about 21 GB against about 9.5 GB for SSE, both inside the 28 GB budget. The 2.2× RAM gap becomes an instance-count gap only above about 1.3 M connections for WebSocket versus 2.9 M for SSE, on smaller instance types, or once TLS state and other overhead the model omits are added; CPU and file-descriptor limits may also bind before RAM does. HTTP/2 SSE (6 KB/conn) needs only about 5.7 GB at 1 M connections.

Step 5: Account for proxy buffering costs specific to SSE

Proxy buffering is the most common reason SSE latency spikes in production. See the Buffer Management & Chunked Transfer Encoding guide for full detail. In brief, nginx requires explicit configuration:

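# nginx: disable buffering for SSE endpoints
location /events {
    proxy_pass         http://upstream;
    proxy_http_version 1.1;            # nginx talks HTTP/1.0 to upstreams by default
    proxy_set_header   Connection "";  # do not forward "Connection: close" upstream
    proxy_buffering    off;            # pass each event to the client as soon as it arrives
    proxy_cache        off;
    proxy_read_timeout 86400s;         # 24 h; match your longest acceptable idle connection
    chunked_transfer_encoding on;      # the default; listed for clarity
}
# Alternative: the upstream can send the response header "X-Accel-Buffering: no",
# which nginx honors per response. Setting it with proxy_set_header has no effect.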

WebSocket connections bypass HTTP buffering entirely: once the upgrade completes, the proxy relays frames through what is effectively a raw TCP tunnel (nginx still needs proxy_http_version 1.1 and forwarded Upgrade and Connection headers to perform the upgrade). This means WebSocket latency is immune to proxy buffering, while SSE latency is not unless you configure it explicitly. Factor this operational risk into your choice if you don't control the proxy layer.

Validation & Monitoring

Once you’ve made the choice and deployed, verify the trade-offs in production:

# 1. Confirm SSE events are not being buffered — time-to-first-byte should be ~0
curl -v --no-buffer -H "Accept: text/event-stream" https://example.com/events 2>&1 | \
  grep -E "< |data:"

# 2. Check active connection count and memory per process (Linux)
ss -s   # shows total established TCP connections
cat /proc/$(pgrep -n node)/status | grep -E "VmRSS|Threads"

# 3. Prometheus / OpenTelemetry — emit these metrics from your server
# sse_active_connections (gauge)
# sse_event_dispatch_latency_ms (histogram, buckets: 1,5,10,50,100)
# sse_bytes_sent_total (counter)

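// Node.js: per-event dispatch latency histogram using perf_hooks.createHistogram()
// Measures write() -> chunk flushed to the socket, i.e. server-side queuing,
// not network delivery to the client.
import { performance, createHistogram } from "node:perf_hooks";

// Recorded in microseconds; record() expects integers >= 1.
export const dispatchHistogram = createHistogram();

export function measureDispatch(res, payload) {
  const t0 = performance.now();
  res.write(`data: ${payload}\n\n`, () => {
    // Runs once this chunk has been flushed from Node's buffers.
    const micros = Math.max(1, Math.round((performance.now() - t0) * 1000));
    dispatchHistogram.record(micros);
  });
}

// Export a percentile every 10 s, then start a fresh window.
setInterval(() => {
  const p99us = dispatchHistogram.percentile(99);
  console.log(`sse_event_dispatch_latency p99=${p99us}us`); // replace with your metrics sink
  dispatchHistogram.reset();
}, 10_000).unref();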

For the connection-count comparison, track both the WebSocket connection-upgrade counter and SSE response-open counter at your load balancer. Connection-Count Trade-offs: SSE vs WebSockets covers the ALB and nginx metrics to watch.

For event-ID and retry validation after reconnect, simulate a disconnect, confirm the browser sends the Last-Event-ID request header on reconnect, and check that your server replays every event after that ID.
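Replaying after Last-Event-ID requires the server to keep recent events keyed by ID. A minimal single-process sketch, assuming integer IDs and single-line data; ReplayBuffer and its methods are hypothetical names, and a multi-instance deployment would need the buffer in shared storage instead of process memory.

```python
# Hypothetical single-process replay buffer for SSE reconnects. Assumes integer event
# ids and single-line data; a multi-instance deployment would keep this in shared storage.
from collections import deque
from itertools import count
from typing import Deque, List, Optional, Tuple


def _wire(event_id: int, data: str) -> str:
    return f"id: {event_id}\ndata: {data}\n\n"


class ReplayBuffer:
    def __init__(self, maxlen: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=maxlen)  # oldest first
        self._ids = count(1)  # monotonically increasing event ids

    def publish(self, data: str) -> str:
        """Store an event and return its wire form for all connected clients."""
        event_id = next(self._ids)
        self._events.append((event_id, data))
        return _wire(event_id, data)

    def replay_after(self, last_event_id: Optional[str]) -> List[str]:
        """Events newer than the client's Last-Event-ID header (None for a fresh client)."""
        if last_event_id is None:
            return []
        try:
            last = int(last_event_id)
        except ValueError:
            return []  # unrecognised id: treat the client as fresh
        return [_wire(i, d) for i, d in self._events if i > last]
```

On each new SSE request, read the Last-Event-ID header, write replay_after(...) to the response before any live events, then continue streaming. If the client's ID is older than the oldest buffered event, this sketch silently skips the gap; a production version should detect that case and tell the client to resync.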

Run latency benchmark on representative hardware (not loopback only)
Confirm nginx/ALB proxy buffering is disabled for SSE endpoints
Verify HTTP/2 is enabled end-to-end if relying on HTTP/2 SSE cost model
Measure RSS delta with 1 k, 10 k, 100 k idle connections on target instance type
Set proxy_read_timeout above the longest expected gap between SSE events (it limits idle time between reads, not total session length)
Emit sse_active_connections and sse_event_dispatch_latency_ms metrics
Load-test WebSocket path with connection churn to expose reconnect costs
Document chosen threshold: if client→server message rate exceeds X msg/s, switch to WebSocket

⚡ Production Directives

Set proxy_buffering off and proxy_read_timeout 86400s in nginx for every SSE endpoint; without them, nginx can hold small events in its proxy buffers and close streams that stay idle past the default 60 s read timeout.
Use HTTP/2 end-to-end for SSE if you expect > 5 k concurrent users per origin: in the Step 2 benchmark (about 20 streams per connection) it cut file descriptors by ~20× and RAM per stream from 14 KB to 6 KB on Node.js 20.
Switch to WebSocket when the client-to-server message rate approaches 1 message per second per user (the Step 3 threshold); below about 1 message per 5 seconds, the extra HTTP request per message is unlikely to be noticeable.
Model per-connection RAM cost before provisioning: SSE over HTTP/2 at 6 KB/conn vs WebSocket at 22 KB/conn is a ~3.7× gap, about 5.7 GB vs 21 GB at 1 M connections.
Monitor sse_active_connections as a gauge, not a counter: connection leaks are a common production SSE failure mode, and a gauge that climbs while traffic stays flat exposes them immediately.
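The gauge directive comes down to pairing every increment with a decrement on every exit path, including client disconnects and exceptions. A minimal sketch in plain Python, assuming a threaded server; Gauge and track_stream are hypothetical names standing in for your metrics client.

```python
# Hypothetical gauge for sse_active_connections: increment on open, decrement in a
# finally block so every exit path (normal end, disconnect, exception) is counted.
import threading
from contextlib import contextmanager


class Gauge:
    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()  # handlers may run on many threads

    def inc(self) -> None:
        with self._lock:
            self._value += 1

    def dec(self) -> None:
        with self._lock:
            self._value -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


sse_active_connections = Gauge()


@contextmanager
def track_stream(gauge: Gauge = sse_active_connections):
    """Wrap the body of an SSE handler; the gauge drops even if the handler raises."""
    gauge.inc()
    try:
        yield
    finally:
        gauge.dec()
```

If you already use the Python prometheus_client library, its Gauge.track_inprogress() provides the same increment/decrement pattern as a context manager or decorator.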

Frequently Asked Questions

Does WebSocket always have lower latency than SSE?

No. For server-to-client delivery the two protocols are within measurement noise of each other (~0.3–0.4 ms p50 in the loopback benchmark above). WebSocket wins only on client-to-server messages, where it avoids a separate HTTP request and its round trip (50–200 ms on a real network). If your use case is pure server push (dashboards, AI token streaming, notifications), SSE latency is the same and the simpler infrastructure is the better trade.

How does HTTP/2 change the SSE cost model?

Significantly. With HTTP/1.1, each SSE stream holds one TCP connection and consumes one of the browser's roughly 6 connection slots per origin. With HTTP/2, all SSE streams from one browser to the same origin share one TCP connection, so the server holds one file descriptor per browser instead of one per stream. In the Step 2 measurements on Node.js 20, memory per stream drops from ~14 KB to ~6 KB. If you're on HTTP/1.1 and planning to scale past 10 k concurrent users, enabling HTTP/2 may be a bigger win than switching to WebSocket.

Can I use SSE through a CDN or edge network?

Yes, and this is a genuine SSE advantage. Because SSE is HTTP, CDNs like Cloudflare and Fastly can terminate connections at the edge, reducing latency for globally distributed users. WebSocket connections typically bypass CDN caching and must reach your origin. Edge SSE works best when event streams are per-session (not shared cache), with the CDN acting as a connection-termination proxy that forwards to your origin via HTTP/2.

What is the break-even connection count where WebSocket becomes cheaper than SSE?

Not in the measurements above. In every Step 2 pairing, WebSocket uses more memory per connection than SSE because it carries more per-connection framework overhead: about 1.6× on Node.js over HTTP/1.1, 2× on Go, and 3.7× against HTTP/2 SSE. Whether that becomes fewer instances depends on which resource binds first (see Step 4). The decision to use WebSocket is rarely driven by cost; it's driven by the need for low-latency bidirectional messaging.

Do I need sticky sessions with SSE?

Only if your events are generated on-process (in-memory). If you fan out via Redis Pub/Sub or a message broker, any server instance can serve any client and you don't need sticky sessions. WebSocket has the same requirement — the connection state is on one server unless you externalize it. SSE's statelessness (beyond the open response) actually makes it easier to route without stickiness when combined with an external pub/sub layer.
