Guard nextWrite against null socket after async close#1168
Open
buttilda wants to merge 1 commit into
Open
Conversation
Author
|
Hey @porsager, any feedback here? We have a |
4 tasks
littledivy
added a commit
to denoland/deno
that referenced
this pull request
Jun 5, 2026
… close (#34852) ## Summary Adds a spec regression test that pins down `setImmediate` / `clearImmediate` semantics around a TCP socket's `'close'` event. The test directly mirrors the write-batching pattern that surfaces in the crash reported in #34667 (and the duplicate trail of #29262 sightings against `npm:postgres`): ```js function write(x) { chunk = chunk ? Buffer.concat([chunk, x]) : Buffer.from(x) if (nextWriteTimer === null) nextWriteTimer = setImmediate(nextWrite) } function nextWrite() { socket.write(chunk, fn) // ← crashes when socket === null nextWriteTimer !== null && clearImmediate(nextWriteTimer) chunk = nextWriteTimer = null } function closed() { clearImmediate(nextWriteTimer) socket = null } ``` ## Investigation I reproduced the exact stack trace from #34667 with a minimal pure-`node:net` script (no `npm:postgres` needed) and confirmed it crashes **identically on both Deno and Node.js 24.15.0** — same stack, same `TypeError: Cannot read properties of null (reading 'write')` from inside `Immediate.callback`. So the symptom is not Deno-specific; it's a userland race in postgres-js's connection-pool reuse path that surfaces on every libuv-compatible runtime. The root cause (already triaged upstream as porsager/postgres#1066 and #1154): 1. The slow query's `write()` queues `setImmediate(nextWrite)`. That immediate runs **once**, in the check phase of the same tick — long before any server-side close arrives. `nextWriteTimer` is reset to `null` inside the immediate. 2. The server kills the backend. EOF arrives, the socket destroys, the close callback emits `'close'`. postgres-js's `closed()` calls `clearImmediate(nextWriteTimer)` — but the timer is already `null`, so it's a no-op — then sets `socket = null`. 3. The slow query's promise rejects. The `await slow` continuation runs in the microtask drain that follows the close handler, then the user calls `conn.release()` → which moves the now-dead connection into the pool's `open` queue → `sql.reserve()` shifts it back out → the next `INSERT` triggers `c.execute(q)` → `write(insertBytes)` → **a fresh** `setImmediate(nextWrite)` is queued. 4. That fresh immediate fires on the next tick. By then `socket === null`, so `socket.write(chunk, fn)` throws — and because the throw is inside a check-phase callback, no surrounding `try/catch` in user code can catch it. The process exits. Step 3 is the actual bug. The upstream fix is porsager/postgres#1168, which adds a `socket && socket.write(chunk, fn)` guard inside `nextWrite`. It's been open since 2026-05-25 awaiting maintainer review, with at least one user already applying it as a `pnpm patch` to keep their service stable. ## What this PR adds A spec test under `tests/specs/node/clear_immediate_socket_close_race/` that pins down four invariants Deno already satisfies, so we never silently regress against the actual contract reporters were expecting: 1. `clearImmediate(t)` called synchronously cancels `t`. 2. `clearImmediate(t)` called from a microtask before any check phase runs cancels `t`. 3. When a `setImmediate` is queued in the same tick that the socket is torn down, libuv's check phase runs before the close phase — so the queued immediate fires first and the close handler's `clearImmediate` is a no-op, identical to Node.js. (The reporter assumed the opposite order, which is the heart of the misdiagnosis.) 4. With a `socket && …` guard inside `nextWrite` (matching upstream #1168), the pool-reuse-after-close pattern stops crashing on Deno. ## What this PR does NOT fix The actual crash for users still on `postgres@3.4.9` (or anything pre-#1168). That fix has to land in `porsager/postgres` and propagate via an npm release — Deno can't patch user libraries in-tree. Pointing the issue at upstream so users can apply the `pnpm patch` workaround in the meantime. ## Test plan - [x] `tests/specs/node/clear_immediate_socket_close_race/` passes locally. - [x] Removing the `socket && …` guard from case 4 reproduces the exact `Uncaught TypeError: Cannot read properties of null (reading 'write')` stack from the issue. - [x] The same un-guarded script crashes identically on Node.js 24.15.0, confirming this is not a Deno-specific bug. - [x] `deno fmt` + `deno lint` clean on the new file. Refs #34667 Refs porsager/postgres#1168 Closes denoland/divybot#465 Co-authored-by: divybot <divybot@users.noreply.github.com> Co-authored-by: Divy Srivastava <me@littledivy.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
nextWritecan be scheduled viasetImmediateand fire afterclosed()has nulledsocket, crashing the whole process with:Fixes #1066, #1154