Tuning File-Descriptor Limits for SSE Connection Pools

Part of Connection Pooling for SSE Servers.

Each persistent SSE connection holds one open file descriptor (FD) for its TCP socket and, depending on your architecture, one more for a backing pipe or Redis subscriber socket. On a default Linux install the per-process soft limit is 1 024. At roughly 800 concurrent clients your server starts rejecting new connections with Error: EMFILE: too many open files (Node.js), OSError: [Errno 24] Too many open files (Python), or a connection reset with no log entry at all. This guide walks through every layer where the limit is enforced (kernel, shell session, systemd unit, and container runtime) and gives you copy-paste commands to raise it correctly and verify the result.

Symptom & Developer Intent

You are running an SSE connection pool and notice one of these failure modes once active connections cross a threshold:

  • Node.js / libuv: Error: EMFILE: too many open files, accept in stderr or your process monitor.
  • Python (FastAPI / Starlette): OSError: [Errno 24] Too many open files thrown inside asyncio’s event loop.
  • Go / net/http: accept tcp: accept4: too many open files, returned by the listener's Accept call (not by net.Listen, which has already succeeded by then).
  • Nginx upstream: (24: Too many open files) while connecting to upstream in error.log.
  • Silent drops: clients receive a connection-reset or the browser EventSource immediately fires onerror and retries, never establishing a stream.

The intent is to support N concurrent SSE connections—where N might be 5 000, 50 000, or 500 000—without EMFILE/ENFILE errors and without restarting the process.
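
The two error codes point at different limits, which decides where the fix belongs. The sketch below assumes a Node.js server whose errors carry the standard `code` property; `classifyFdError` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: map an errno code to the limit that was exhausted.
// EMFILE: this process reached its own RLIMIT_NOFILE soft limit.
// ENFILE: the whole system reached fs.file-max.
function classifyFdError(err) {
  switch (err && err.code) {
    case "EMFILE":
      return { scope: "process", knob: "RLIMIT_NOFILE (ulimit -n / LimitNOFILE)" };
    case "ENFILE":
      return { scope: "system", knob: "fs.file-max (sysctl)" };
    default:
      return { scope: "unrelated", knob: null };
  }
}

console.log(classifyFdError({ code: "EMFILE" }).scope); // process
console.log(classifyFdError({ code: "ENFILE" }).scope); // system
```

In a real server you might call it from an error handler, for example `server.on("error", (err) => console.error(classifyFdError(err)))`.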

Root Cause Analysis

How Linux accounts for file descriptors

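Every open socket, file, pipe, or epoll instance counts against two limits: the per-process RLIMIT_NOFILE and the system-wide fs.file-max. The settings that bound them are:

Setting                 Scope                        Typical default
RLIMIT_NOFILE (soft)    Per process (enforced)       1 024
RLIMIT_NOFILE (hard)    Per-process ceiling          4 096 (varies by distro)
fs.file-max             System-wide open FDs         Scales with RAM (about 800 000 on an 8 GB host); some distros set it far higher
fs.nr_open              Per-process kernel ceiling   1 048 576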

An SSE server that holds C concurrent connections consumes at minimum C sockets. Add the listening socket, a Redis pub/sub connection per worker, any log-file handles, and your TLS contexts, and a realistic budget is 1.2–1.5 FDs per connection on average. A process with nofile=1024 therefore caps out at roughly 680–850 live SSE streams (1024 / 1.5 ≈ 682; 1024 / 1.2 ≈ 853).
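
The capacity arithmetic can be sketched as a one-line helper. `maxStreams` is a hypothetical name, and the 1.2–1.5 ratio is the planning figure from the paragraph above, not a measured value for your server.

```javascript
// Estimate how many SSE streams fit under a given nofile soft limit.
// fdsPerConnection is a planning ratio (1.2 to 1.5 in this guide).
function maxStreams(nofileSoft, fdsPerConnection) {
  return Math.floor(nofileSoft / fdsPerConnection);
}

console.log(maxStreams(1024, 1.5));  // 682
console.log(maxStreams(1024, 1.2));  // 853
console.log(maxStreams(65535, 1.5)); // 43690
```

Measuring the real ratio on a loaded process (open FDs divided by active streams) gives a better input than the planning range.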

Why raising it in your shell is not enough

ulimit -n 65535 in a terminal changes the limit only for that shell session and the processes it starts. When a process manager (systemd, Docker, Kubernetes, PM2) launches your server, the server inherits the limits configured in the manager's unit file or runtime spec, not those of your interactive shell. This is the single most common reason engineers raise the limit and still see EMFILE errors.

The kernel’s system-wide ceiling

fs.file-max caps the total number of open FDs across all processes. The kernel derives its default from installed RAM and some distributions raise it further, so on most modern hosts it is large enough (cat /proc/sys/fs/file-max often returns a value in the millions). A constrained VPS or container base image may set it much lower. If you are scaling past 100 000 concurrent SSE connections, verify this value as well.

Step-by-Step Resolution

Step 1 — Diagnose current limits

Run these as the user that owns the server process:

# Soft and hard limit for the current shell
ulimit -Sn   # soft (enforced)
ulimit -Hn   # hard (ceiling you can raise to without root)

# Limits of a running process (adjust the pgrep pattern; -o picks the oldest match)
cat /proc/$(pgrep -o -f "node server")/limits | grep "open files"

# System-wide current usage
sysctl fs.file-nr          # allocated / unused / max
cat /proc/sys/fs/file-max  # system-wide ceiling
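
The same "Max open files" row can be read programmatically. This sketch parses text in the format of /proc/<pid>/limits; `parseNofile` and the sample text are illustrative assumptions, and on Linux you would pass the real file contents instead.

```javascript
// Parse the "Max open files" row of /proc/<pid>/limits.
// Row format: "Max open files   <soft>   <hard>   files".
function parseNofile(limitsText) {
  const m = limitsText.match(/^Max open files\s+(\S+)\s+(\S+)/m);
  if (!m) return null;
  const toNum = (v) => (v === "unlimited" ? Infinity : Number.parseInt(v, 10));
  return { soft: toNum(m[1]), hard: toNum(m[2]) };
}

// Sample text in the kernel's layout (values are made up for the example).
const sampleLimits = [
  "Limit                     Soft Limit           Hard Limit           Units",
  "Max processes             63432                63432                processes",
  "Max open files            1024                 4096                 files",
].join("\n");

console.log(parseNofile(sampleLimits)); // { soft: 1024, hard: 4096 }
```

Comparing the result for your shell with the result for the server's PID shows quickly whether the two run under different limits.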

Step 2 — Raise the limit for an interactive / dev session

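# Check the hard limit first; the soft limit cannot exceed it
ulimit -Hn

# Raise the soft limit to 65 535 for this shell (requires hard limit >= target)
ulimit -Sn 65535

# If the hard limit is too low, you need root. prlimit (util-linux) raises
# both limits of the current shell in place, without a new login session:
sudo prlimit --pid $$ --nofile=65535:1048576

# Verify
ulimit -n   # should print 65535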

Step 3 — Set persistent per-user limits via /etc/security/limits.conf

This applies to PAM-authenticated logins (SSH, console). It does not apply to systemd services.

# /etc/security/limits.conf  (append or replace existing nofile lines)
*    soft  nofile  65535
*    hard  nofile  1048576

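Log out and back in for the change to take effect; pam_limits reads this file only when a new session starts. Keep the hard value at or below fs.nr_open. Per-user limits are also bounded by the system-wide fs.file-max, which you can raise temporarily like this (Step 5 makes it persistent):

sudo sysctl -w fs.file-max=2097152   # temporary system-wide raise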

Step 4 — Configure systemd service units

This is the correct place to set limits for any process launched by systemd. Edit your service file directly or use a drop-in:

# Create a drop-in override (preferred — survives package updates)
sudo mkdir -p /etc/systemd/system/my-sse-server.service.d/
sudo tee /etc/systemd/system/my-sse-server.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=1048576
EOF

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart my-sse-server

# Verify the running process picked up the new limit
systemctl show my-sse-server | grep LimitNOFILE
# → LimitNOFILE=1048576
cat /proc/$(systemctl show --property MainPID --value my-sse-server)/limits \
  | grep "open files"

LimitNOFILE in systemd sets both soft and hard to the same value; specify LimitNOFILE=soft:hard (e.g., 65535:1048576) if you want them to differ.
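
To make the two accepted forms concrete, the sketch below interprets a LimitNOFILE value the way the sentence above describes. `parseLimitNofile` is a hypothetical helper for illustration; it is not how systemd itself is implemented.

```javascript
// Interpret a LimitNOFILE= value:
//   "N"   sets soft and hard to N
//   "S:H" sets soft to S and hard to H
// systemd also accepts the word "infinity" for "no limit".
function parseLimitNofile(value) {
  const toNum = (v) => (v === "infinity" ? Infinity : Number.parseInt(v, 10));
  const parts = value.trim().split(":");
  if (parts.length > 2) throw new Error(`unexpected LimitNOFILE value: ${value}`);
  const soft = toNum(parts[0]);
  const hard = parts.length === 2 ? toNum(parts[1]) : soft;
  if (Number.isNaN(soft) || Number.isNaN(hard) || soft > hard) {
    throw new Error(`invalid LimitNOFILE value: ${value}`);
  }
  return { soft, hard };
}

console.log(parseLimitNofile("1048576"));       // { soft: 1048576, hard: 1048576 }
console.log(parseLimitNofile("65535:1048576")); // { soft: 65535, hard: 1048576 }
```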

Step 5 — Tune kernel sysctl parameters

# /etc/sysctl.d/99-sse-fds.conf
# Keep comments on their own lines; sysctl.d(5) only documents whole-line comments.

# System-wide ceiling
fs.file-max = 2097152
# Per-process kernel ceiling (must be <= file-max)
fs.nr_open = 1048576

# TCP socket tunables that affect connection lifecycle
# Shorten how long orphaned sockets linger in FIN_WAIT_2
net.ipv4.tcp_fin_timeout = 15
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Listen backlog
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 8192

# Apply immediately without reboot
sudo sysctl --system

Verify:

sysctl fs.file-max fs.nr_open
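
Before rolling these values out, it can help to check that they are ordered consistently. The sketch below encodes three rules: a soft limit cannot exceed its hard limit, the kernel rejects a hard RLIMIT_NOFILE above fs.nr_open, and this guide keeps fs.nr_open at or below fs.file-max. `checkFdPlan` and its field names are hypothetical.

```javascript
// Check a planned set of FD limits for ordering problems.
// Returns a list of human-readable problems; an empty list means the plan is consistent.
function checkFdPlan({ soft, hard, nrOpen, fileMax }) {
  const problems = [];
  if (soft > hard) problems.push("soft nofile exceeds hard nofile");
  if (hard > nrOpen) problems.push("hard nofile exceeds fs.nr_open");
  if (nrOpen > fileMax) problems.push("fs.nr_open exceeds fs.file-max");
  return problems;
}

// The values used in this guide pass all three checks.
console.log(checkFdPlan({ soft: 65535, hard: 1048576, nrOpen: 1048576, fileMax: 2097152 }));
// []

// Raising the hard limit without raising fs.nr_open is flagged.
console.log(checkFdPlan({ soft: 65535, hard: 2097152, nrOpen: 1048576, fileMax: 2097152 }));
// [ 'hard nofile exceeds fs.nr_open' ]
```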

Step 6 — Docker / OCI container runtime

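Containers do not inherit limits from your application's systemd unit or your login shell. They get nofile from the Docker daemon's configuration (its own limits and any default-ulimits) unless you override it. Set it explicitly: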

# docker run
docker run \
  --ulimit nofile=1048576:1048576 \
  my-sse-server:latest

# docker-compose.yml
services:
  sse-server:
    image: my-sse-server:latest
    ulimits:
      nofile:
        soft: 65535
        hard: 1048576

For the Docker daemon itself (affects all containers on the host), add this to /etc/docker/daemon.json. The file must be plain JSON, without comments:

{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 1048576,
      "Soft": 65535
    }
  }
}

Restart the daemon after editing: sudo systemctl restart docker.

Step 7 — Kubernetes

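Kubernetes has no pod-spec field for ulimits. Containers inherit nofile from the node's container runtime (containerd or CRI-O), so the limit is set at the node level. An init container cannot help here: ulimit changes only the process that runs it and that process's children, and the init container exits before your server starts. What a container can do without extra privileges is raise its own soft limit, up to the inherited hard limit, in the same shell that launches the server:

# pod-spec fragment (assumes the server starts with: node server.js)
containers:
- name: sse-server
  image: my-sse-server:latest
  command: ["sh", "-c", "ulimit -Sn 65535 && exec node server.js"]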

For node-level settings, configure containerd or CRI-O via their config files, or use a DaemonSet that writes to /proc/sys:

# DaemonSet container command (requires privileged)
command: ["sysctl", "-w", "fs.file-max=2097152"]

Step 8 — Application-layer guard (Node.js example)

Even after raising OS limits, guard against misconfigured environments at startup:

// startup-check.js  (ESM)
import { readFileSync } from "node:fs";

const MAX_REQUIRED = 65_535;

function checkFdLimit() {
  let raw;
  try {
    // /proc/self/limits is Linux-only; skip gracefully on other OSes
    raw = readFileSync("/proc/self/limits", "utf8");
  } catch (_) {
    return; // Non-Linux; skip
  }
  // Row format: "Max open files  <soft>  <hard>  files"; the first number is the soft limit.
  // A soft limit of "unlimited" does not match and is treated as Infinity.
  const match = raw.match(/Max open files\s+(\d+)/);
  const soft = match ? parseInt(match[1], 10) : Infinity;
  if (soft < MAX_REQUIRED) {
    console.error(
      `[FATAL] nofile soft limit is ${soft}; need >= ${MAX_REQUIRED}. ` +
      `Add LimitNOFILE=${MAX_REQUIRED} to your systemd unit.`
    );
    process.exit(1);
  }
  console.log(`[OK] nofile soft limit: ${soft}`);
}

checkFdLimit();

Validation & Monitoring

Check limits took effect

# For a running PID
PID=$(pgrep -o -f "node server.js")   # -o: oldest match, so PID holds a single value
grep "open files" /proc/$PID/limits
# Max open files            65535                1048576              files

# Count currently open FDs for that process
ls -1 /proc/$PID/fd | wc -l

Load-test to confirm headroom

# Open 2000 simultaneous SSE connections with curl and hold each for 30 s
seq 1 2000 | xargs -P 2000 -I{} \
  curl -s -N --max-time 30 -H "Accept: text/event-stream" \
  http://localhost:3000/events > /dev/null &

# While that runs, watch FD consumption
watch -n1 "ls -1 /proc/$(pgrep -o -f 'node server')/fd | wc -l"

Prometheus / metrics

If your server exposes metrics, track the process's open-FD count as a gauge alongside its soft limit:

// Example: expose current FD count from Node.js
import { readdirSync } from "node:fs";

function openFdCount(pid = process.pid) {
  try {
    // Each entry in /proc/PID/fd is one open descriptor
    return readdirSync(`/proc/${pid}/fd`).length;
  } catch (_) {
    // /proc/PID/fd requires the same uid or root, and exists only on Linux
    return -1;
  }
}

// Register as a Prometheus gauge and scrape via /metrics

Key alert thresholds: warn at 70 % of nofile soft limit; page at 90 %. This gives headroom for reconnection storms when clients retry after a deploy. For more on managing reconnect bursts, see Rate Limiting & Backpressure Handling and Event ID & Retry Mechanism Design.
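
The thresholds above translate into a small classifier. `fdAlertLevel` is a hypothetical helper; the 0.7 and 0.9 defaults are the warn and page fractions from the paragraph above.

```javascript
// Classify FD usage against the soft limit.
// warnAt and pageAt are fractions of the soft limit (70 % and 90 % by default).
function fdAlertLevel(openFds, softLimit, warnAt = 0.7, pageAt = 0.9) {
  const ratio = openFds / softLimit;
  if (ratio >= pageAt) return "page";
  if (ratio >= warnAt) return "warn";
  return "ok";
}

console.log(fdAlertLevel(40000, 65535)); // ok   (about 61 %)
console.log(fdAlertLevel(50000, 65535)); // warn (about 76 %)
console.log(fdAlertLevel(60000, 65535)); // page (about 92 %)
```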

System-wide FD exhaustion check

# columns: allocated / free / max
cat /proc/sys/fs/file-nr
# e.g.: 14368  0  2097152
# "free" is always 0 on modern kernels (not a concern)

Alert if allocated / max > 0.8.
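
The same ratio can be computed from the three columns of /proc/sys/fs/file-nr. This sketch parses a string in that format; `systemFdUsage` is a hypothetical name, and on Linux you would pass the file's contents rather than the literal used here.

```javascript
// Parse "allocated  unused  max" as printed by /proc/sys/fs/file-nr
// and compute the fraction of the system-wide ceiling in use.
function systemFdUsage(fileNrText) {
  const [allocated, unused, max] = fileNrText.trim().split(/\s+/).map(Number);
  return { allocated, unused, max, ratio: allocated / max };
}

// Example values taken from the sample output above.
const fdUsage = systemFdUsage("14368\t0\t2097152\n");
console.log(fdUsage.allocated, fdUsage.max); // 14368 2097152
console.log(fdUsage.ratio > 0.8);            // false
```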

Verification Checklist

⚡ Production Directives

  • Set LimitNOFILE=1048576 in every systemd unit that runs an SSE server — this is the single highest-impact change and cannot be replaced by /etc/security/limits.conf.
  • Add a startup FD-limit guard that exits non-zero when the soft limit is below your minimum; catch misconfigured deploys before they reach production traffic.
  • Alert at 70 % of the soft limit on a per-process gauge; reconnection storms after a rolling restart can momentarily double active connections.
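  • Tune net.ipv4.tcp_fin_timeout=15 and tcp_tw_reuse=1 alongside FD limits. They do not free FDs (a socket releases its FD as soon as the application closes it), but they shorten how long orphaned connections linger in FIN_WAIT_2 and let new outgoing connections reuse TIME_WAIT sockets during reconnect bursts.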
  • In Docker, set ulimits on each container and also in daemon.json: per-container values override the daemon defaults, so setting both gives defence in depth. In Kubernetes, set the limit in the node's container runtime.

Frequently Asked Questions

Why does raising ulimit in my shell not fix the EMFILE error in production?

Shell ulimit changes apply only to that shell session and the processes started from it. Systemd, Docker, and Kubernetes start your server with limits taken from their own configuration, not from your interactive session. The correct fix depends on the supervisor: LimitNOFILE in a systemd unit, --ulimit in docker run, or a node-level containerd setting in Kubernetes.

How many file descriptors does one SSE connection actually use?

One TCP socket = one FD. If your SSE architecture subscribes each connection to a Redis channel directly, add one more FD per connection for the Redis socket (though a shared pub/sub fan-out pattern avoids this — see Redis Pub/Sub Fan-Out for SSE). Add ~10–20 FDs for the listening socket, TLS contexts, log files, and internal pipes. Budget 1.2–1.5 FDs per connection for sizing.
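
The difference between the two Redis patterns is easy to see with numbers. This is a rough sketch under stated assumptions: `estimateFds` is a hypothetical helper, `workers` stands for the number of shared subscriber connections in the fan-out case, and the fixed overhead of 20 is the upper end of the 10–20 range quoted above.

```javascript
// Rough FD estimate for one server process.
//   connections        : concurrent SSE streams (one socket FD each)
//   redisPerConnection : true if every stream opens its own Redis socket
//   workers            : shared Redis subscriber sockets when fanning out
//   fixedOverhead      : listening socket, log files, pipes, and so on
function estimateFds({ connections, redisPerConnection, workers = 1, fixedOverhead = 20 }) {
  const redisSockets = redisPerConnection ? connections : workers;
  return connections + redisSockets + fixedOverhead;
}

console.log(estimateFds({ connections: 50000, redisPerConnection: true }));
// 100020
console.log(estimateFds({ connections: 50000, redisPerConnection: false, workers: 4 }));
// 50024
```

With a Redis socket per stream the FD count roughly doubles, which is why the shared fan-out pattern matters for sizing.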

What is the maximum practical value for LimitNOFILE?

The kernel's per-process ceiling is fs.nr_open, which defaults to 1 048 576 (2^20). You can raise fs.nr_open up to fs.file-max. In practice, 1 048 576 is the correct production target for high-concurrency SSE servers; setting it higher requires changing fs.nr_open first, which is rarely necessary unless you exceed ~800 000 concurrent connections on a single process.

Do I need to change fs.file-max as well as LimitNOFILE?

Only if your system-wide total (across all processes) approaches the fs.file-max value shown in /proc/sys/fs/file-nr. On a dedicated SSE server this is uncommon until you exceed ~500 000 connections. Run cat /proc/sys/fs/file-nr to check; if allocated / max > 0.5, raise fs.file-max in /etc/sysctl.d/.

My Go server hits EMFILE but the Node.js server on the same box does not — why?

Go's net/http server opens the listening socket, one goroutine stack (not an FD), and one FD per accepted connection — the same as Node.js. The discrepancy is usually the Go process running under a different systemd unit or user with a stricter LimitNOFILE, or the Go binary calling setrlimit at startup with a hardcoded value. Check /proc/$(pgrep -f mygoserver)/limits and compare against the Node.js process.