Skip to content

fix(devtools-proxy-support): reconnect SSH tunnel after unrecoverable client error COMPASS-8355#772

Merged
ivandevp merged 5 commits into
mainfrom
COMPASS-8355
Jun 3, 2026
Merged

fix(devtools-proxy-support): reconnect SSH tunnel after unrecoverable client error COMPASS-8355#772
ivandevp merged 5 commits into
mainfrom
COMPASS-8355

Conversation

@ivandevp
Copy link
Copy Markdown
Collaborator

@ivandevp ivandevp commented Jun 2, 2026

Description

Fixes the SSH tunnel failing to recover after a laptop resumes from hibernate (COMPASS-8355).

When the OS hibernates, the underlying TCP connection is killed mid-stream. The ssh2 packet parser receives truncated data and puts the Client instance into a permanently broken "unusable" state. Two problems followed:

  1. There was no sshClient.on('error', ...) handler in SSHAgent, so the 'error' event emitted by the broken parser went completely unhandled → uncaught exception crashing the process.
  2. The existing retry logic in _connect() only matched "Not connected" and "Channel open failure". Even if "Instance unusable after fatal error" had been added to that list, the retry would have called initialize() on the same broken instance → same error again. The ssh2 Client must be recreated before reconnecting.

Changes

createSshClient() (new private method) — extracts ssh2 Client construction and listener setup into a reusable method. Adds a persistent client.on('error', ...) handler that sets connected = false and prevents unhandled error events from crashing the process.

initialize(reinitializeClient = false) — the new flag triggers sshClient.end() + createSshClient() before reconnecting, discarding the broken instance and binding forwardOut to the fresh one.

_connect() — expands retryable error patterns to cover "Instance unusable after fatal error", "read ECONNRESET", and "Socket closed". When requiresNewClient is true, passes the flag through to initialize().

New tests

  • 'does not crash on unexpected sshClient error events' — verifies the new 'error' handler absorbs the event without throwing.
  • 'reconnects with a fresh SSH client after "Instance unusable after fatal error"' — simulates a broken ssh2 client by patching connect() to emit that error, then verifies a successful reconnect with a fresh client (auth handler called twice).

Open Questions

This is step 1 of a broader cleanup (see COMPASS-8355). Step 2 will update devtools-connect to expose the tunnel reference or surface tunnel errors on DevtoolsConnectionState, so that Compass no longer needs to create and manage the tunnel itself.

Checklist

… client error COMPASS-8355

- Add sshClient.on('error') handler in SSHAgent constructor to prevent
  unhandled 'error' events from crashing the process when the SSH session
  dies unexpectedly (e.g. after hibernate)
- Add reinitializeClient flag to initialize() to recreate the ssh2 Client
  instance when it enters an unrecoverable "Instance unusable" state
- Expand retryable error patterns to include "Instance unusable after fatal
  error", "read ECONNRESET", and "Socket closed"
- Extract client setup into createSshClient() to ensure close/error handlers
  and forwardOut binding are consistent on both initial setup and recreation
@ivandevp ivandevp requested a review from addaleax June 2, 2026 14:18
@ivandevp ivandevp self-assigned this Jun 2, 2026
'error',
new Error('some unexpected ssh error'),
);
}).not.to.throw();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this test, but I think a much more valuable test would be one in which the actual connection is interrupted – and luckily, the setup here does allow us to test this directly, because HTTPServerProxyTestSetup holds a reference to an SSH server, and that SSH server forwards getConnections to the underlying net.Server, and we can call .destroy() on those connections.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test added :)

Comment thread packages/devtools-proxy-support/src/ssh.ts Outdated
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 2, 2026

Coverage Status

coverage: 78.676% (+0.05%) from 78.63% — COMPASS-8355 into main

ivandevp added 4 commits June 2, 2026 12:53
- Replace delete this.connectingPromise with = undefined assignment
- Improve "Instance unusable" test: patch connect() to simulate broken
  client state, verify fresh client is created and auth happens twice
- Track initializedSuccessfully in _connect() to detect when initialize()
  itself failed, requiring a new client on retry
… to clear keepalive timer COMPASS-8355

Without this, the old ssh2 Client's keepalive setInterval keeps firing
after initialize(true) replaces it, preventing process exit. ssh2's end()
guards with isWritable() so it is a no-op when the socket is already dead.
@ivandevp ivandevp requested a review from addaleax June 3, 2026 13:19
@ivandevp ivandevp merged commit 629c571 into main Jun 3, 2026
35 checks passed
@ivandevp ivandevp deleted the COMPASS-8355 branch June 3, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants