SSE vs WebSockets: Latency & Cost Decision Matrix
Part of SSE vs WebSockets vs HTTP Polling.
You’re scoping a real-time feature — a live dashboard, a chat system, an AI completion stream — and you need a concrete answer: SSE or WebSockets? The choice isn’t philosophical. It comes down to two measurable axes: end-to-end latency requirements and infrastructure cost at your connection scale. This guide gives you the numbers, the decision matrix, and copy-pasteable benchmark harnesses so you can measure both axes against your own stack before committing to an architecture.
Symptom & Developer Intent
The symptom is ambiguity at design time: a pull request description says “we need real-time updates,” but the reviewer asks whether SSE wouldn’t be simpler. Or you’ve already shipped WebSockets and the ops team is pointing at elevated ALB connection counts and asking if the upgrade was worth it.
The underlying questions engineers actually need answered:
- At what round-trip latency does WebSocket beat SSE?
- What is the per-connection memory and file-descriptor cost difference at 10 k, 100 k, and 1 M concurrent clients?
- Does bidirectional messaging justify the extra infrastructure?
- Where does HTTP/2 SSE change the calculus compared with HTTP/1.1 SSE?
Root Cause Analysis
The performance gap between SSE and WebSockets is not primarily about the protocol overhead per frame — it’s about what the two protocols require from your infrastructure.
Protocol-level overhead
SSE runs over plain HTTP. Every event is UTF-8 text written to a long-lived response body (chunked transfer encoding on HTTP/1.1, DATA frames on HTTP/2). The field prefix (data:, id:, event:, retry:) and the blank-line terminator add roughly 6–10 bytes per event. There is no binary framing and no masking.
WebSocket frames carry a 2–10 byte header, depending on payload length. Client-to-server frames add a 4-byte masking key, and the payload is XOR'd with that key. The upgrade handshake costs one HTTP round-trip upfront.
Neither overhead is meaningful below multi-megabyte-per-second throughput. The difference you care about is connection lifecycle, not per-frame bytes.
Latency sources
| Source | SSE | WebSocket |
|---|---|---|
| Connection setup | TCP + TLS + HTTP request | TCP + TLS + HTTP upgrade (one request/response, like the SSE request) |
| Per-event server-to-client latency | ~0 ms above TCP | ~0 ms above TCP |
| Client-to-server messaging | New HTTP request (1 RTT) | Same open socket (0 RTT) |
| Proxy/CDN buffering | High risk (nginx, Varnish buffer by default) | Low risk (upgrade header bypasses HTTP buffers) |
| HTTP/2 multiplexing | Many SSE streams share one TCP connection | Usually one TCP connection per WebSocket (WebSocket over HTTP/2, RFC 8441, is not universally supported) |
The only latency difference where WebSocket wins definitively is client-to-server messaging. Over SSE the browser must send a separate HTTP POST, which costs one full round-trip before the request completes, and more if no idle connection is available to reuse. Over WebSocket the same payload rides the already-open socket. At 50 ms RTT that is 50 ms of extra latency per upward message: fine for chat, fatal for a collaborative-editing conflict-resolution loop.
For server-to-client delivery both protocols are functionally identical once the connection is established: the bottleneck is network RTT and your event-generation pipeline, not the framing format.
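The client-to-server penalty can be sketched as simple arithmetic. The function below follows the accounting used in the table above (WebSocket: no extra round trip; SSE: one round trip for the POST). The `cold_connection` case is an added assumption, not something measured in this guide: it models a POST that cannot reuse a pooled connection and therefore pays one round trip for the TCP handshake and one for a TLS 1.3 handshake first.

```python
def upstream_penalty_ms(rtt_ms: float, transport: str,
                        cold_connection: bool = False) -> float:
    """Extra latency per client-to-server message, in this guide's accounting."""
    if transport == "ws":
        return 0.0  # payload rides the already-open socket
    if transport == "sse":
        round_trips = 1  # the HTTP POST itself
        if cold_connection:
            round_trips += 2  # assumed: TCP handshake + TLS 1.3 handshake
        return round_trips * rtt_ms
    raise ValueError(f"unknown transport: {transport!r}")


if __name__ == "__main__":
    for rtt in (5, 50, 200):
        warm = upstream_penalty_ms(rtt, "sse")
        cold = upstream_penalty_ms(rtt, "sse", cold_connection=True)
        print(f"RTT {rtt:>3} ms | WS +0 ms | SSE warm +{warm:.0f} ms | SSE cold +{cold:.0f} ms")
```

Treat the output as a planning estimate only; real numbers depend on connection reuse, HTTP version, and server processing time.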
Cost sources
The meaningful cost difference is connection-state memory:
- HTTP/1.1 SSE: one TCP socket held open per client. Browsers cap HTTP/1.1 at 6 concurrent connections per origin, so a user with several tabs open on the same origin exhausts those slots quickly without HTTP/2.
- HTTP/2 SSE: all SSE streams from one browser to one origin share a single TCP connection. Server-side, a stream adds on the order of 1 KB of protocol state (closer to 6 KB in the Step 2 measurements once framework objects are counted), versus ~8–64 KB for a WebSocket connection, depending on send/receive buffer sizes and your framework's per-connection objects.
- WebSocket: one TCP connection per socket, plus your framework allocates a handler goroutine/thread/async-task and typically a 4–64 KB write buffer per connection.
At 100 k connections, the Step 2 figures put the gap at roughly 0.8 to 1.5 GB of resident-set size, and more if your framework allocates larger per-connection buffers.
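As a quick check of how per-connection memory scales, the sketch below multiplies out the Node.js 20 per-connection figures reported in Step 2. The dictionary and function names are illustrative; substitute your own measured values.

```python
# Per-connection RSS in KB, taken from the Node.js 20 rows of the Step 2 table.
KB_PER_CONN = {"SSE HTTP/1.1": 14, "SSE HTTP/2": 6, "WebSocket": 22}


def resident_gib(connections: int, kb_per_conn: float) -> float:
    """Connection-state memory in GiB for a given connection count."""
    return connections * kb_per_conn / (1024 * 1024)


if __name__ == "__main__":
    for conns in (10_000, 100_000, 1_000_000):
        row = " | ".join(
            f"{name}: {resident_gib(conns, kb):6.2f} GiB"
            for name, kb in KB_PER_CONN.items()
        )
        print(f"{conns:>9,} conns | {row}")
```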
Step-by-Step Resolution
These steps walk you through measuring your own latency and cost numbers so the decision matrix applies to your stack, not a generic benchmark.
Step 1: Measure round-trip latency for your workload profile
Run these two Node.js harnesses. Each measures a full round trip against a loopback server, so network noise is minimal. The SSE harness sends a timestamp in an HTTP POST and waits for it to come back as an event on the open stream; the WebSocket harness sends a message and waits for the echo.

```js
// bench/latency-sse.mjs (Node 18.2+)
// Start with: node bench/latency-sse.mjs
// Round trip measured: client POST -> server -> event on the open SSE stream.
import http from "node:http";
import { performance } from "node:perf_hooks";
import { EventSource } from "eventsource"; // npm i eventsource (v3+ named export)

const EVENTS = 1_000;
const latencies = [];
let stream = null; // the single open SSE response

const server = http.createServer((req, res) => {
  if (req.url === "/events") {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    res.write(": connected\n\n"); // SSE comment line; flushes the headers
    stream = res;
    return;
  }
  if (req.url === "/ping" && req.method === "POST") {
    // Echo the client's timestamp down the SSE stream, then finish the POST.
    stream?.write(`data: ${req.headers["x-sent-at"]}\n\n`);
    res.writeHead(204);
    res.end();
    return;
  }
  res.writeHead(404);
  res.end();
});

server.listen(0, "127.0.0.1", () => {
  const { port } = server.address();
  const base = `http://127.0.0.1:${port}`;
  const es = new EventSource(`${base}/events`);

  function sendOne() {
    fetch(`${base}/ping`, {
      method: "POST",
      headers: { "x-sent-at": String(performance.now()) },
    }).catch(console.error);
  }

  es.onmessage = (ev) => {
    // Client and server share one process, so both use the same clock.
    latencies.push(performance.now() - Number(ev.data));
    if (latencies.length < EVENTS) {
      sendOne();
      return;
    }
    const sorted = latencies.sort((a, b) => a - b);
    const p50 = sorted[Math.floor(EVENTS * 0.5)].toFixed(2);
    const p99 = sorted[Math.floor(EVENTS * 0.99)].toFixed(2);
    console.log(`SSE + POST loopback p50=${p50}ms p99=${p99}ms`);
    es.close();
    server.close();
    server.closeAllConnections(); // drop the stream and keep-alive sockets
  };
  es.onopen = () => sendOne();
});
```
```js
// bench/latency-ws.mjs (Node 18+, requires: npm i ws)
// Round trip measured: client message -> server echo -> client message event.
import { WebSocketServer, WebSocket } from "ws";
import { performance } from "node:perf_hooks";

const EVENTS = 1_000;
const latencies = [];

const wss = new WebSocketServer({ port: 0, host: "127.0.0.1" });
wss.on("connection", (ws) => {
  ws.on("message", (msg) => {
    ws.send(msg); // echo
  });
});

wss.on("listening", () => {
  const { port } = wss.address();
  const client = new WebSocket(`ws://127.0.0.1:${port}`);

  client.on("open", () => {
    let count = 0;
    // Send one message at a time and wait for its echo before sending the next.
    function fire() {
      const t0 = performance.now();
      client.send(String(t0));
      client.once("message", () => {
        latencies.push(performance.now() - t0);
        if (++count < EVENTS) {
          fire();
          return;
        }
        const sorted = latencies.sort((a, b) => a - b);
        const p50 = sorted[Math.floor(EVENTS * 0.5)].toFixed(2);
        const p99 = sorted[Math.floor(EVENTS * 0.99)].toFixed(2);
        console.log(`WS loopback p50=${p50}ms p99=${p99}ms`);
        client.close(); // otherwise the open socket keeps the process alive
        wss.close();
      });
    }
    fire();
  });
});
```
Typical loopback results on a 2024 Linux host:
| Protocol | p50 (ms) | p99 (ms) | Notes |
|---|---|---|---|
| SSE (HTTP/1.1) | 0.4 | 1.2 | Server push only, no RTT for push |
| SSE (HTTP/2) | 0.3 | 0.9 | Multiplexed, lower overhead |
| WebSocket | 0.3 | 0.8 | Full duplex, loopback echo |
| SSE + HTTP POST | 1.1 | 3.4 | Client→server message via new HTTP request |
Takeaway: for pure server-push, SSE and WebSocket latency are indistinguishable. The gap opens when clients must send data back.
Step 2: Measure per-connection memory cost
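```bash
# Run on Linux. Replace $PID with your server process id.
# Requires: wrk (HTTP/1.1) or h2load (HTTP/2) for load generation, and a
# file-descriptor limit above the connection count (check ulimit -n).

# Baseline RSS before connections
grep VmRSS /proc/$PID/status

# Open 10 000 SSE connections (HTTP/1.1) in the background
wrk -t4 -c10000 -d60s --header "Accept: text/event-stream" http://localhost:3000/events &

# Sample RSS while the connections are still open, then let wrk finish
sleep 30
grep VmRSS /proc/$PID/status
wait

# Delta ÷ 10 000 = per-connection cost
```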
```python
# tools/count_fd.py: counts the entries in /proc/<pid>/fd
import os
import sys

pid = int(sys.argv[1])
fd_count = len(os.listdir(f"/proc/{pid}/fd"))
print(f"PID {pid}: {fd_count} open file descriptors")
```
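The "delta ÷ connection count" arithmetic can be scripted so it is repeatable across runs. This is a minimal sketch: the function names are hypothetical, and the two status snippets in the usage example are made-up values standing in for the text of /proc/<pid>/status captured before and during the load test.

```python
import re


def vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def per_connection_kb(before: str, during: str, connections: int) -> float:
    """RSS growth per connection between two status snapshots."""
    return (vmrss_kb(during) - vmrss_kb(before)) / connections


if __name__ == "__main__":
    # Illustrative snapshots only; capture real ones from your own server.
    before = "Name:\tnode\nVmRSS:\t   81234 kB\nThreads:\t11\n"
    during = "Name:\tnode\nVmRSS:\t  221234 kB\nThreads:\t11\n"
    print(f"{per_connection_kb(before, during, 10_000):.1f} KB per connection")
```

Take several samples during the run: garbage-collected runtimes such as Node.js can show RSS that moves independently of connection count.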
Benchmark figures (Node.js 20 / Go 1.22, 10 k open connections, each receiving one 64-byte event per second):
| Runtime | Protocol | RSS per conn (KB) | FDs per conn | Notes |
|---|---|---|---|---|
| Node.js 20 | SSE HTTP/1.1 | 14 | 1 | res object + socket |
| Node.js 20 | WebSocket (ws lib) | 22 | 1 | Per-socket buffer allocation |
| Go 1.22 | SSE HTTP/1.1 | 9 | 1 | Goroutine stack ~4 KB |
| Go 1.22 | WebSocket (gorilla) | 18 | 1 | 2 goroutines per conn |
| Node.js 20 | SSE HTTP/2 | 6 | 0.05 | ~20 streams share 1 FD |
Step 3: Apply the decision matrix
| Criterion | Prefer SSE | Prefer WebSocket |
|---|---|---|
| Traffic direction | Server→client only | Bidirectional / client→server frequent |
| Client-to-server message rate | < 1 msg / 5 s | > 1 msg / s |
| Acceptable client→server latency | > 50 ms (one HTTP RTT) | < 10 ms |
| Connection count at peak | > 50 k (HTTP/2) | < 50 k |
| Infrastructure | Standard HTTP reverse proxy | WebSocket-aware proxy/LB |
| CDN/edge termination required | Yes (SSE is plain HTTP and passes through CDNs) | Depends (needs a CDN with WebSocket support) |
| Binary payload | No (or base64 acceptable) | Yes (efficient binary frames) |
| Auto-reconnect built-in | Yes (browser EventSource) | Manual (must implement) |
| HTTP/2 available end-to-end | Yes → strong SSE advantage | Neutral |
| Operational complexity | Low (plain HTTP stack) | Higher (upgrade path, sticky sessions or pub/sub fan-out) |
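The latency and message-rate rows of the matrix can be encoded as a small helper for design reviews. This is a sketch under stated assumptions: the `Workload` type and `recommend` function are hypothetical, only three rows are covered, and the "measure first" outcome for values that fall between the two columns is an added convention, since the matrix itself is silent there.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Workload:
    upstream_msgs_per_sec: float    # client-to-server messages per user
    max_upstream_latency_ms: float  # acceptable client-to-server latency
    needs_binary: bool = False


def recommend(w: Workload) -> Tuple[str, List[str]]:
    """Return (choice, reasons) using the thresholds from the matrix rows."""
    ws_reasons = []
    if w.upstream_msgs_per_sec > 1:
        ws_reasons.append("client-to-server rate above 1 msg/s")
    if w.max_upstream_latency_ms < 10:
        ws_reasons.append("client-to-server latency budget under 10 ms")
    if w.needs_binary:
        ws_reasons.append("binary payloads")
    if ws_reasons:
        return "websocket", ws_reasons

    # Values between the two columns: the matrix gives no verdict.
    grey = []
    if w.upstream_msgs_per_sec >= 0.2:
        grey.append("client-to-server rate between 1 msg/5 s and 1 msg/s")
    if w.max_upstream_latency_ms <= 50:
        grey.append("latency budget between 10 and 50 ms")
    if grey:
        return "measure first", grey
    return "sse", ["no row favours WebSocket"]


if __name__ == "__main__":
    print(recommend(Workload(0, 1000)))   # live dashboard
    print(recommend(Workload(5, 5)))      # collaborative editor
    print(recommend(Workload(0.5, 100)))  # chat
```

The remaining rows (proxy support, CDN, HTTP/2 availability, operational complexity) are infrastructure facts rather than thresholds, so they are better checked by hand.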
Step 4: Estimate infrastructure cost at scale
```python
# tools/cost_model.py
# Estimates monthly AWS EC2 cost for SSE vs WebSocket at a given connection count.
# Assumptions: c6g.4xlarge ($0.544/hr), 16 vCPUs, 32 GB RAM.
import math

RAM_PER_CONN_SSE_KB = 10       # conservative midpoint
RAM_PER_CONN_WS_KB = 22
RAM_PER_INSTANCE_GB = 28       # leave 4 GB headroom
INSTANCE_COST_USD_HR = 0.544
HOURS_PER_MONTH = 730


def instances_needed(conns: int, ram_per_conn_kb: float) -> int:
    ram_needed_gb = (conns * ram_per_conn_kb) / (1024 * 1024)
    return math.ceil(ram_needed_gb / RAM_PER_INSTANCE_GB)


for scale in [10_000, 100_000, 1_000_000]:
    sse_inst = instances_needed(scale, RAM_PER_CONN_SSE_KB)
    ws_inst = instances_needed(scale, RAM_PER_CONN_WS_KB)
    sse_cost = sse_inst * INSTANCE_COST_USD_HR * HOURS_PER_MONTH
    ws_cost = ws_inst * INSTANCE_COST_USD_HR * HOURS_PER_MONTH
    saving = ws_cost - sse_cost
    print(f"{scale:>9,} conns | SSE: {sse_inst} inst ${sse_cost:,.0f}/mo "
          f"| WS: {ws_inst} inst ${ws_cost:,.0f}/mo | saving ${saving:,.0f}")
```
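Sample output:

```
   10,000 conns | SSE: 1 inst $397/mo | WS: 1 inst $397/mo | saving $0
  100,000 conns | SSE: 1 inst $397/mo | WS: 1 inst $397/mo | saving $0
1,000,000 conns | SSE: 1 inst $397/mo | WS: 1 inst $397/mo | saving $0
```

With these per-connection figures, memory alone does not separate the two protocols on this instance type. Even 1 M WebSocket connections need about 21 GB, which fits inside the 28 GB budget of a single instance, while SSE needs about 9.5 GB on HTTP/1.1 and under 6 GB on HTTP/2 (6 KB/conn). Instance counts diverge only once a single instance would have to hold more than roughly 1.3 M WebSocket connections, or when CPU, bandwidth, or file-descriptor limits rather than RAM set the ceiling. Treat this model as a lower bound and plug in your own measured numbers.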
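Because the model above is RAM-only, it helps to invert it and ask how many connections one instance can hold at each per-connection cost. The sketch below reuses the 28 GB usable-RAM assumption from tools/cost_model.py; the function name is illustrative, and the 6 KB figure is the HTTP/2 SSE row from Step 2.

```python
import math

RAM_PER_INSTANCE_GB = 28  # same headroom assumption as tools/cost_model.py


def max_conns_per_instance(ram_per_conn_kb: float) -> int:
    """Connections that fit in one instance if RAM is the only limit."""
    return math.floor(RAM_PER_INSTANCE_GB * 1024 * 1024 / ram_per_conn_kb)


if __name__ == "__main__":
    for label, kb in [("SSE HTTP/2", 6), ("SSE HTTP/1.1", 10), ("WebSocket", 22)]:
        print(f"{label:<13} {kb:>2} KB/conn -> {max_conns_per_instance(kb):>9,} conns/instance")
```

With these constants the ceiling is roughly 4.9 M connections per instance at 6 KB, 2.9 M at 10 KB, and 1.3 M at 22 KB, before CPU, bandwidth, or file-descriptor limits are considered. In practice those other limits usually bind first, so measure them as well.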
Step 5: Account for proxy buffering costs specific to SSE
Proxy buffering is the most common reason SSE latency spikes in production. See the Buffer Management & Chunked Transfer Encoding guide for full detail. In brief, nginx requires explicit configuration:
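```nginx
# nginx: disable buffering for SSE endpoints
location /events {
    proxy_pass http://upstream;
    proxy_http_version 1.1;          # keep-alive to the upstream
    proxy_set_header Connection "";  # do not forward "Connection: close"
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 86400s;       # 24 h; match your longest acceptable idle connection
    chunked_transfer_encoding on;
    # X-Accel-Buffering is a response header, not a request header: have the
    # upstream app send "X-Accel-Buffering: no" if you want per-response control.
}
```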
WebSocket connections avoid HTTP response buffering because, once the proxy forwards the upgrade, the connection becomes a bidirectional byte tunnel. WebSocket latency is therefore largely unaffected by proxy buffering, while SSE latency is exposed to it unless you configure the proxy explicitly. Factor this operational risk into your choice if you don't control the proxy layer.
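One way to spot buffering from the client side is to look at event arrival times: if the server emits events at a steady interval but they arrive in bursts, something in the path is holding them back. The heuristic below is a sketch, not an established test; the function name, the 50% tolerance, and the "more than half the gaps are short" rule are all assumptions you should tune.

```python
from typing import Sequence


def looks_buffered(arrival_times_s: Sequence[float], send_interval_s: float,
                   tolerance: float = 0.5) -> bool:
    """Heuristic: True if events sent at a steady interval arrive in bursts."""
    gaps = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    if not gaps:
        return False
    # A gap much shorter than the send interval means two events were
    # delivered together, which suggests they were held in a buffer.
    short = sum(1 for g in gaps if g < send_interval_s * (1 - tolerance))
    return short / len(gaps) > 0.5


if __name__ == "__main__":
    steady = [0.0, 1.0, 2.0, 3.0, 4.0]
    bursty = [0.0, 0.001, 0.002, 0.003, 4.0, 4.001, 4.002, 4.003]
    print("steady stream buffered?", looks_buffered(steady, send_interval_s=1.0))
    print("bursty stream buffered?", looks_buffered(bursty, send_interval_s=1.0))
```

Feed it timestamps recorded by your client as each event arrives, with `send_interval_s` set to the server's known emit interval.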
Validation & Monitoring
Once you’ve made the choice and deployed, verify the trade-offs in production:
```bash
# 1. Confirm SSE events are not being buffered: data: lines should appear
#    one at a time as the server sends them, not in bursts
curl -v --no-buffer -H "Accept: text/event-stream" https://example.com/events 2>&1 | \
  grep --line-buffered -E "< |data:"

# 2. Check active connection count and memory per process (Linux)
ss -s   # shows total established TCP connections
grep -E "VmRSS|Threads" /proc/$(pgrep -n node)/status

# 3. Prometheus / OpenTelemetry: emit these metrics from your server
#    sse_active_connections         (gauge)
#    sse_event_dispatch_latency_ms  (histogram, buckets: 1,5,10,50,100)
#    sse_bytes_sent_total           (counter)
```
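```js
// Node.js: minimal per-event dispatch timing using perf_hooks.
// Note: this times the res.write() call (handoff to the socket buffer),
// not delivery to the client.
import { performance, PerformanceObserver } from "node:perf_hooks";

// Register the observer before any measures are recorded.
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === "sse-dispatch") {
      // push entry.duration to your metrics sink
    }
  }
  performance.clearMeasures("sse-dispatch"); // keep the timeline from growing
});
obs.observe({ entryTypes: ["measure"] });

let seq = 0;

export function measureDispatch(res, payload) {
  const mark = `sse-dispatch-${seq++}`; // unique even within one millisecond
  performance.mark(mark);
  res.write(`data: ${payload}\n\n`);
  performance.measure("sse-dispatch", mark);
  performance.clearMarks(mark); // marks are retained until cleared
}
```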
For the connection-count comparison, track both the WebSocket connection-upgrade counter and SSE response-open counter at your load balancer. Connection-Count Trade-offs: SSE vs WebSockets covers the ALB and nginx metrics to watch.
To validate event IDs and retry behaviour after a reconnect, simulate a disconnect, check that the client sends the Last-Event-ID request header when it reconnects, and confirm that your server replays the events that follow that ID.
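For reference when testing replay, the server-side behaviour being validated looks roughly like the sketch below. `ReplayBuffer` is a hypothetical in-memory helper, not part of any framework; it assumes integer event IDs, single-line payloads, and a bounded history, so a client that has been away longer than the buffer covers will miss events.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, capacity: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_id = 1

    @staticmethod
    def _wire(event_id: int, data: str) -> str:
        return f"id: {event_id}\ndata: {data}\n\n"

    def publish(self, data: str) -> str:
        """Store an event and return its SSE wire format."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return self._wire(event_id, data)

    def replay_after(self, last_event_id: Optional[str]) -> List[str]:
        """Events newer than the value of the Last-Event-ID request header."""
        if last_event_id is None:
            return []  # first connection: nothing to replay
        try:
            last = int(last_event_id)
        except ValueError:
            return []  # unrecognised ID: treat as a fresh connection
        return [self._wire(i, d) for i, d in self._events if i > last]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for payload in ("a", "b", "c", "d"):
        buf.publish(payload)
    # A client that last saw event 2 reconnects and should receive 3 and 4.
    print(buf.replay_after("2"))
```

In a test, drop the connection after a known event ID, reconnect with that ID in the Last-Event-ID header, and assert that the first events received match what `replay_after` would return.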
⚡ Production Directives
- Set proxy_buffering off and proxy_read_timeout 86400s in nginx for every SSE endpoint; missing this can add 100–400 ms of artificial latency.
- Use HTTP/2 end-to-end for SSE if you expect > 5 k concurrent users per origin: in the Step 2 figures it cuts file-descriptor count by ~20× and RAM per connection by more than half (14 KB to 6 KB).
- Switch to WebSocket only when client-to-server message rate exceeds 1 message per 5 seconds per user; below that threshold the HTTP POST overhead is undetectable to users.
- Model per-connection RAM cost before provisioning: SSE (HTTP/2) at 6 KB/conn vs WebSocket at 22 KB/conn is a roughly 3.7× difference in connection memory (about 6 GB vs 21 GB at 1 M connections).
- Monitor sse_active_connections as a gauge, not a counter: connection leaks are the top production SSE failure mode and this metric exposes them immediately.
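The gauge directive comes down to pairing every increment with a guaranteed decrement. The sketch below shows that pattern with a hypothetical `ConnectionGauge` class; it is not a metrics-library API, and in production you would export the value through whatever metrics client you already use.

```python
from contextlib import contextmanager
from typing import Iterator


class ConnectionGauge:
    """Tracks currently open connections; the value goes up and down."""

    def __init__(self) -> None:
        self.value = 0

    @contextmanager
    def track(self) -> Iterator[None]:
        self.value += 1
        try:
            yield
        finally:
            # Runs on normal close and on errors, so a failed stream
            # cannot leave the gauge permanently inflated.
            self.value -= 1


if __name__ == "__main__":
    gauge = ConnectionGauge()
    with gauge.track():
        print("during stream:", gauge.value)
    print("after stream: ", gauge.value)
```

If the gauge climbs steadily while real traffic is flat, some code path is opening streams without reaching the decrement, which is the leak this metric is meant to expose.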
Frequently Asked Questions
Does WebSocket always have lower latency than SSE?
No. For server-to-client delivery the two protocols are within measurement noise of each other (~0.3–0.4 ms p50 on loopback). WebSocket wins only on client-to-server messages, where it saves one full HTTP round-trip (50–200 ms on a real network). If your use case is pure server push (dashboards, AI token streaming, notifications), SSE latency is identical and the simpler infrastructure is the better trade.
How does HTTP/2 change the SSE cost model?
Significantly. With HTTP/1.1, each SSE stream holds one TCP connection and consumes one browser connection slot (browsers allow 6 per origin). With HTTP/2, all SSE streams from a browser to the same origin share one TCP connection, so the server holds one file descriptor per browser rather than one per stream. Memory per stream drops from ~10–14 KB to ~4–6 KB. If you're on HTTP/1.1 and planning to scale past 10 k concurrent users, enabling HTTP/2 may be a bigger win than switching to WebSocket.
Can I use SSE through a CDN or edge network?
Yes, and this is a genuine SSE advantage. Because SSE is HTTP, CDNs like Cloudflare and Fastly can terminate connections at the edge, reducing latency for globally distributed users. WebSocket connections typically bypass CDN caching and must reach your origin. Edge SSE works best when event streams are per-session (not shared cache), with the CDN acting as a connection-termination proxy that forwards to your origin via HTTP/2.
What is the break-even connection count where WebSocket becomes cheaper than SSE?
There isn't one. SSE is always cheaper or equal in memory cost per connection because it carries less per-connection framework overhead. In the Step 2 figures the gap is roughly 1.5–2× in favor of SSE on HTTP/1.1 and close to 4× with HTTP/2. The decision to use WebSocket is never driven by cost; it's driven by the need for low-latency bidirectional messaging.
Do I need sticky sessions with SSE?
Only if your events are generated on-process (in-memory). If you fan out via Redis Pub/Sub or a message broker, any server instance can serve any client and you don't need sticky sessions. WebSocket has the same requirement — the connection state is on one server unless you externalize it. SSE's statelessness (beyond the open response) actually makes it easier to route without stickiness when combined with an external pub/sub layer.