fix(dashmate): handle Docker AutoRemove race and trigger CI on dashmate changes#3395
fix(dashmate): handle Docker AutoRemove race and trigger CI on dashmate changes#3395QuantumExplorer wants to merge 2 commits intov3.1-devfrom
Conversation
…dashmate changes The resolveDockerHostIp function uses docker.run() with AutoRemove, which can race on newer Docker versions — the container is removed before dockerode's wait() completes, causing a 404 error. Added a catch handler that treats 404 as success when a valid IP was captured. Also updated CI to trigger Docker image builds, dashmate E2E tests, test suite, and functional tests when dashmate code changes, not just on version bumps or nightly schedule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughEnhances CI workflow to detect changes in the dashmate package and trigger relevant jobs accordingly. Additionally refactors Docker container execution to use separate stdout/stderr streams and implements 404 error handling with IPv4 address validation. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/dashmate/src/docker/resolveDockerHostIpFactory.js (1)
44-47: Validate and return the captured IP directly in the 404 handler for clarity.The IP is validated in the catch block but then discarded; the function re-reads it at line 52-58. Consider returning the validated
ipdirectly from the catch to avoid re-parsing and make the intent clearer.♻️ Suggested refactor to return IP from catch
One approach is to store the validated IP in a closure variable:
+ let capturedIp = null; + const [result] = await docker.run( // ... ).catch(async (err) => { if (err.statusCode === 404) { const ip = stdoutStream.toString().trim(); if (ip && /^\d+\.\d+\.\d+\.\d+$/.test(ip)) { + capturedIp = ip; return [{ StatusCode: 0 }]; } } throw err; }); - const output = stdoutStream.toString(); + if (capturedIp) { + return capturedIp; + } + const output = stdoutStream.toString(); if (result.StatusCode !== 0) {This makes it explicit that the 404 path uses the already-validated IP.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/dashmate/src/docker/resolveDockerHostIpFactory.js` around lines 44 - 47, The catch block currently validates the captured ip (variable ip from stdoutStream) but discards it by returning only StatusCode; change the 404/error path in resolveDockerHostIpFactory so it returns the validated ip directly (e.g., include the ip string in the return value instead of only [{ StatusCode: 0 }]) so the later 404 handler can use the already-validated ip without re-parsing; update any consumer of resolveDockerHostIpFactory to read the returned ip field (or tuple) and remove the redundant re-reading/parsing at the later lines (where stdoutStream is re-used).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- Line 20: The HostConfig currently removes AutoRemove, causing the Alpine
container started by resolveDockerHostIp to persist and leak resources; fix by
either restoring AutoRemove: true in the hostConfig used when creating the
container inside resolveDockerHostIp, or (preferred) keep AutoRemove omitted and
explicitly call container.remove({ force: true }) (or container.remove()) after
you read stdout and before returning, with a try/catch to ignore 404s/race
conditions; ensure you reference the container instance returned by
docker.createContainer / container.start in resolveDockerHostIp and perform the
cleanup in its finally/after-success path so no dangling containers remain.
- Around line 38-50: The comment in the catch handler for the docker.run call
refers to "AutoRemove" but hostConfig is currently an empty object, creating a
mismatch; either restore AutoRemove in the hostConfig passed to docker.run (e.g.
set AutoRemove: true so the 404 case matches the comment) or update the comment
to accurately describe why a 404/no‑such‑container can occur without AutoRemove
(e.g. container may have exited and been removed by other cleanup logic); locate
the catch block in resolveDockerHostIpFactory.js and either add the AutoRemove
flag to the hostConfig used by docker.run or change the comment text to reflect
the actual behavior.
---
Nitpick comments:
In `@packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- Around line 44-47: The catch block currently validates the captured ip
(variable ip from stdoutStream) but discards it by returning only StatusCode;
change the 404/error path in resolveDockerHostIpFactory so it returns the
validated ip directly (e.g., include the ip string in the return value instead
of only [{ StatusCode: 0 }]) so the later 404 handler can use the
already-validated ip without re-parsing; update any consumer of
resolveDockerHostIpFactory to read the returned ip field (or tuple) and remove
the redundant re-reading/parsing at the later lines (where stdoutStream is
re-used).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 94f36b0f-068f-471b-abed-d8a1aa36c766
📒 Files selected for processing (2)
.github/workflows/tests.ymlpackages/dashmate/src/docker/resolveDockerHostIpFactory.js
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3395 +/- ##
=========================================
Coverage 79.93% 79.93%
=========================================
Files 2861 2861
Lines 280230 280230
=========================================
Hits 223993 223993
Misses 56237 56237
🚀 New features to boost your workflow:
|
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The PR has a confirmed container resource leak: AutoRemove was removed without adding manual cleanup, so every call to resolveDockerHostIp() leaves a stopped alpine container. The catch handler becomes dead code without AutoRemove. The correct fix is to restore AutoRemove: true and keep the catch handler. The CI changes are correct and strictly additive.
Reviewed commit: da89173
🔴 1 blocking | 🟡 2 suggestion(s)
1 additional finding
🟡 suggestion: Other callers using AutoRemove: true may hit the same Docker 28.x race
packages/dashmate/src/listr/tasks/setup/regular/importCoreDataTaskFactory.js (line 184)
Three other files use docker.run() with AutoRemove: true and could hit the same 404 race on Docker 28.x:
importCoreDataTaskFactory.js:184VerificationServer.js:100obtainLetsEncryptCertificateTaskFactory.js:196
Consider adding a similar catch handler to these callers as a follow-up, or extracting a shared helper that wraps docker.run() with the AutoRemove race handling.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- [BLOCKING] lines 20-50: Removing AutoRemove without manual cleanup leaks stopped containers
`AutoRemove: true` was removed from `hostConfig` (line 20 is now `const hostConfig = {};`) but no manual `container.remove()` was added. The `docker.run()` return value is destructured as `const [result] = ...` (line 29), discarding the container reference entirely. Without `AutoRemove` and without manual removal, every call to `resolveDockerHostIp()` leaves a stopped alpine container on the host.
The correct fix is to **restore `AutoRemove: true`** and keep the catch handler. The catch handler on lines 38-50 already properly handles the Docker 28.x race condition where `wait()` returns 404 because the container was auto-removed before the API call completed. This approach:
1. Prevents container accumulation (AutoRemove cleans up)
2. Handles the race condition (catch handler validates the captured IP)
3. Is consistent with other callers like `importCoreDataTaskFactory.js`, `VerificationServer.js`, and `obtainLetsEncryptCertificateTaskFactory.js` which all use `AutoRemove: true`
- [SUGGESTION] lines 38-50: Catch handler is dead code without AutoRemove — becomes the actual fix once AutoRemove is restored
The `.catch()` handler on lines 38-50 guards against a 404 caused by Docker's AutoRemove racing with `wait()`. The comments explicitly reference this: "docker.run with AutoRemove can race on container cleanup". However, since `AutoRemove` was removed from `hostConfig`, the Docker daemon will never auto-remove the container, so the 404 race cannot occur and this handler is dead code.
Once `AutoRemove: true` is restored (per the blocking finding), this catch handler becomes the actual fix for the race condition and is valuable. No code change needed here if the blocking finding is addressed.
In `packages/dashmate/src/listr/tasks/setup/regular/importCoreDataTaskFactory.js`:
- [SUGGESTION] line 184: Other callers using AutoRemove: true may hit the same Docker 28.x race
Three other files use `docker.run()` with `AutoRemove: true` and could hit the same 404 race on Docker 28.x:
- `importCoreDataTaskFactory.js:184`
- `VerificationServer.js:100`
- `obtainLetsEncryptCertificateTaskFactory.js:196`
Consider adding a similar catch handler to these callers as a follow-up, or extracting a shared helper that wraps `docker.run()` with the AutoRemove race handling.
| const hostConfig = {}; | ||
|
|
||
| if (platform !== 'darwin' && platform !== 'win32' && !isWSL()) { | ||
| hostConfig.ExtraHosts = ['host.docker.internal:host-gateway']; | ||
| } | ||
|
|
||
| const writableStream = new WritableStream(); | ||
| const stdoutStream = new WritableStream(); | ||
| const stderrStream = new WritableStream(); | ||
|
|
||
| const [result] = await docker.run( | ||
| 'alpine', | ||
| [], | ||
| writableStream, | ||
| [stdoutStream, stderrStream], | ||
| { | ||
| Entrypoint: ['sh', '-c', 'ping -c1 host.docker.internal | sed -nE \'s/^PING[^(]+\\(([^)]+)\\).*/\\1/p\''], | ||
| HostConfig: hostConfig, | ||
| }, | ||
| {}, | ||
| ); | ||
|
|
||
| const output = writableStream.toString(); | ||
| ).catch(async (err) => { | ||
| // docker.run with AutoRemove can race on container cleanup | ||
| // If the error is a 404 "no such container", the run succeeded | ||
| // but the container was already removed before wait() completed | ||
| if (err.statusCode === 404) { | ||
| // Container ran and was auto-removed; read what we captured | ||
| const ip = stdoutStream.toString().trim(); | ||
| if (ip && /^\d+\.\d+\.\d+\.\d+$/.test(ip)) { | ||
| return [{ StatusCode: 0 }]; | ||
| } | ||
| } | ||
| throw err; | ||
| }); |
There was a problem hiding this comment.
🔴 Blocking: Removing AutoRemove without manual cleanup leaks stopped containers
AutoRemove: true was removed from hostConfig (line 20 is now const hostConfig = {};) but no manual container.remove() was added. The docker.run() return value is destructured as const [result] = ... (line 29), discarding the container reference entirely. Without AutoRemove and without manual removal, every call to resolveDockerHostIp() leaves a stopped alpine container on the host.
The correct fix is to restore AutoRemove: true and keep the catch handler. The catch handler on lines 38-50 already properly handles the Docker 28.x race condition where wait() returns 404 because the container was auto-removed before the API call completed. This approach:
- Prevents container accumulation (AutoRemove cleans up)
- Handles the race condition (catch handler validates the captured IP)
- Is consistent with other callers like
importCoreDataTaskFactory.js,VerificationServer.js, andobtainLetsEncryptCertificateTaskFactory.jswhich all useAutoRemove: true
💡 Suggested change
| const hostConfig = {}; | |
| if (platform !== 'darwin' && platform !== 'win32' && !isWSL()) { | |
| hostConfig.ExtraHosts = ['host.docker.internal:host-gateway']; | |
| } | |
| const writableStream = new WritableStream(); | |
| const stdoutStream = new WritableStream(); | |
| const stderrStream = new WritableStream(); | |
| const [result] = await docker.run( | |
| 'alpine', | |
| [], | |
| writableStream, | |
| [stdoutStream, stderrStream], | |
| { | |
| Entrypoint: ['sh', '-c', 'ping -c1 host.docker.internal | sed -nE \'s/^PING[^(]+\\(([^)]+)\\).*/\\1/p\''], | |
| HostConfig: hostConfig, | |
| }, | |
| {}, | |
| ); | |
| const output = writableStream.toString(); | |
| ).catch(async (err) => { | |
| // docker.run with AutoRemove can race on container cleanup | |
| // If the error is a 404 "no such container", the run succeeded | |
| // but the container was already removed before wait() completed | |
| if (err.statusCode === 404) { | |
| // Container ran and was auto-removed; read what we captured | |
| const ip = stdoutStream.toString().trim(); | |
| if (ip && /^\d+\.\d+\.\d+\.\d+$/.test(ip)) { | |
| return [{ StatusCode: 0 }]; | |
| } | |
| } | |
| throw err; | |
| }); | |
| const hostConfig = { | |
| AutoRemove: true, | |
| }; |
source: ['claude', 'codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- [BLOCKING] lines 20-50: Removing AutoRemove without manual cleanup leaks stopped containers
`AutoRemove: true` was removed from `hostConfig` (line 20 is now `const hostConfig = {};`) but no manual `container.remove()` was added. The `docker.run()` return value is destructured as `const [result] = ...` (line 29), discarding the container reference entirely. Without `AutoRemove` and without manual removal, every call to `resolveDockerHostIp()` leaves a stopped alpine container on the host.
The correct fix is to **restore `AutoRemove: true`** and keep the catch handler. The catch handler on lines 38-50 already properly handles the Docker 28.x race condition where `wait()` returns 404 because the container was auto-removed before the API call completed. This approach:
1. Prevents container accumulation (AutoRemove cleans up)
2. Handles the race condition (catch handler validates the captured IP)
3. Is consistent with other callers like `importCoreDataTaskFactory.js`, `VerificationServer.js`, and `obtainLetsEncryptCertificateTaskFactory.js` which all use `AutoRemove: true`
| ).catch(async (err) => { | ||
| // docker.run with AutoRemove can race on container cleanup | ||
| // If the error is a 404 "no such container", the run succeeded | ||
| // but the container was already removed before wait() completed | ||
| if (err.statusCode === 404) { | ||
| // Container ran and was auto-removed; read what we captured | ||
| const ip = stdoutStream.toString().trim(); | ||
| if (ip && /^\d+\.\d+\.\d+\.\d+$/.test(ip)) { | ||
| return [{ StatusCode: 0 }]; | ||
| } | ||
| } | ||
| throw err; | ||
| }); |
There was a problem hiding this comment.
🟡 Suggestion: Catch handler is dead code without AutoRemove — becomes the actual fix once AutoRemove is restored
The .catch() handler on lines 38-50 guards against a 404 caused by Docker's AutoRemove racing with wait(). The comments explicitly reference this: "docker.run with AutoRemove can race on container cleanup". However, since AutoRemove was removed from hostConfig, the Docker daemon will never auto-remove the container, so the 404 race cannot occur and this handler is dead code.
Once AutoRemove: true is restored (per the blocking finding), this catch handler becomes the actual fix for the race condition and is valuable. No code change needed here if the blocking finding is addressed.
source: ['claude']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- [SUGGESTION] lines 38-50: Catch handler is dead code without AutoRemove — becomes the actual fix once AutoRemove is restored
The `.catch()` handler on lines 38-50 guards against a 404 caused by Docker's AutoRemove racing with `wait()`. The comments explicitly reference this: "docker.run with AutoRemove can race on container cleanup". However, since `AutoRemove` was removed from `hostConfig`, the Docker daemon will never auto-remove the container, so the 404 race cannot occur and this handler is dead code.
Once `AutoRemove: true` is restored (per the blocking finding), this catch handler becomes the actual fix for the race condition and is valuable. No code change needed here if the blocking finding is addressed.
The AutoRemove race condition was not the actual issue — the original code works correctly. The setup failure was caused by stale config state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The net PR change is purely CI — adding dashmate-changed triggers to build-images, dashmate-e2e-tests, test-suite, and test-functional jobs in tests.yml. The JS file (resolveDockerHostIpFactory.js) is identical to the base branch after the revert, so there are no application code changes. Codex's blocking finding on the AutoRemove race is a false positive for this review scope — the code is unchanged from production.
Reviewed commit: 3458cec
🟡 1 suggestion(s) | 💬 1 nitpick(s)
2 additional findings
🟡 suggestion: PR title and description no longer match the actual changes
packages/dashmate/src/docker/resolveDockerHostIpFactory.js (lines 1-52)
The PR is titled 'fix(dashmate): handle Docker AutoRemove race condition and run CI on dashmate changes' and the description discusses fixing the (HTTP code 404) no such container error. After the revert in 3458cec, resolveDockerHostIpFactory.js has zero net diff against the base branch — the only remaining change is adding dashmate-changed triggers to tests.yml. The PR title should be updated to something like ci(dashmate): trigger Docker builds and e2e tests on dashmate changes and the description should drop the AutoRemove fix language.
💬 nitpick: AutoRemove race condition is pre-existing, not a regression from this PR
packages/dashmate/src/docker/resolveDockerHostIpFactory.js (lines 20-22)
Both agents flagged the dockerode run() + AutoRemove: true race where container.wait() can 404 if the daemon removes the container before the long-poll connects. This is a real theoretical issue (dockerode#582), but it is pre-existing — this file has zero net diff against the base branch. The commit message explains the original failure was caused by stale config state, not this race. Other callers in the codebase (e.g., VerificationServer.js) do handle this with try/catch, so a future hardening PR could add similar protection here, but that's out of scope for this CI-only change.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `packages/dashmate/src/docker/resolveDockerHostIpFactory.js`:
- [SUGGESTION] lines 1-52: PR title and description no longer match the actual changes
The PR is titled 'fix(dashmate): handle Docker AutoRemove race condition and run CI on dashmate changes' and the description discusses fixing the `(HTTP code 404) no such container` error. After the revert in 3458cec, `resolveDockerHostIpFactory.js` has zero net diff against the base branch — the only remaining change is adding `dashmate-changed` triggers to tests.yml. The PR title should be updated to something like `ci(dashmate): trigger Docker builds and e2e tests on dashmate changes` and the description should drop the AutoRemove fix language.
Summary
(HTTP code 404) no such containererror inresolveDockerHostIpcaused by Docker'sAutoRemoveracing with dockerode'swait()on newer Docker versions (28.x). Catches the 404 and treats it as success when a valid IP was already captured.tests.yml) to trigger Docker image builds, dashmate E2E tests, test suite, and functional tests whenpackages/dashmate/**changes — previously these only ran on version bumps, nightly schedule, or manual dispatch.Test plan
yarn dashmate setup local -c 3succeeds without container 404 errorspackages/dashmate/triggers Docker builds and E2E tests in CI🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Chores