fix: remove drain-state operational fingerprint from /healthz endpoint (PILOT-335) #16

Open: matthew-pilot wants to merge 1 commit into main from openclaw/pilot-335-20260530-120800

@matthew-pilot
Collaborator

Summary

PILOT-335: /healthz endpoint exposed drain state to external observers (operational fingerprint).

Root Cause

beacon/server.go:1228-1234: ServeHealth() returned different HTTP status codes (200 vs. 503) and body text ("ok" vs. "unhealthy") depending on internal drain state, allowing an external observer to detect when a beacon enters graceful scale-down.
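
To make the leak concrete, here is a minimal sketch of the pre-fix behavior as described above. It is not the code from beacon/server.go: `legacyHealth` and `looksDraining` are hypothetical names introduced only for illustration, and the handler is a reconstruction from the description (200 "ok" vs. 503 "unhealthy").

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// legacyHealth is a hypothetical reconstruction of the pre-fix handler:
// both the status code and the body depend on the internal health flag.
func legacyHealth(healthy bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !healthy {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(w, "unhealthy")
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
	}
}

// looksDraining shows what an outside observer could infer from a single
// unauthenticated probe of /healthz.
func looksDraining(h http.Handler) bool {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code == http.StatusServiceUnavailable
}

func main() {
	fmt.Println(looksDraining(legacyHealth(true)))  // prints "false"
	fmt.Println(looksDraining(legacyHealth(false))) // prints "true"
}
```

One probe is enough to tell a draining beacon from a serving one, which is the fingerprint this PR removes.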

Fix

  • server.go: /healthz always returns 200 "ok" regardless of healthOk state. SetHealthy() is preserved for internal state tracking.
  • server_test.go: Updated TestHealthEndpoint to verify consistent 200 response after SetHealthy(false).
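
The two changes above can be sketched roughly as follows. This is an illustration under assumptions, not the actual diff: the `Server` type, the use of `atomic.Bool` for `healthOk`, and the `probe` helper are hypothetical; only `ServeHealth()`, `SetHealthy()`, and `healthOk` are names taken from this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Server is a hypothetical stand-in for the beacon server; only the
// health-related parts are sketched.
type Server struct {
	healthOk atomic.Bool // internal state only, never reflected in /healthz
}

// SetHealthy keeps recording internal health (for example, false while draining).
func (s *Server) SetHealthy(ok bool) { s.healthOk.Store(ok) }

// ServeHealth always answers 200 "ok" and deliberately ignores healthOk,
// so the response carries no information about drain state.
func (s *Server) ServeHealth(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "ok")
}

// probe issues one in-memory request, the way the updated test might.
func probe(s *Server) (int, string) {
	rec := httptest.NewRecorder()
	s.ServeHealth(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	s := &Server{}
	s.SetHealthy(true)
	code, body := probe(s)
	fmt.Println(code, body) // prints "200 ok"

	s.SetHealthy(false) // simulate entering drain
	code, body = probe(s)
	fmt.Println(code, body) // prints "200 ok"
}
```

A side effect worth weighing in review: any load balancer or orchestrator that relied on a 503 from /healthz to stop routing traffic during drain would no longer see that signal and would need another source for it.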

Verification

  • go build ./...
  • go vet ./...
  • go test -run TestHealthEndpoint

Size

2 files, +7/−11 | matthew-fix (small tier)

🔗 https://vulturelabs.atlassian.net/browse/PILOT-335

…t (PILOT-335)

/healthz now always returns 200 OK regardless of internal drain state,
preventing external observers from detecting graceful-scale-down timing.
SetHealthy() is preserved for internal state tracking.
@codecov

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@matthew-pilot
Collaborator Author

🤖 Matthew Pilot — CI Status

| Check | Status |
| --- | --- |
| test | ✅ pass (1m8s) |
| codecov/patch | ✅ pass |
| Mergeable | ✅ MERGEABLE |

Verdict: All checks green. Ready for review.

@matthew-pilot
Collaborator Author

🤖 Matthew Pilot — PR Summary

PILOT-335 — Fixes an operational fingerprint in the /healthz endpoint.

What changed

beacon/server.go: ServeHealth() now always returns 200 OK regardless of whether the server is draining. Previously it returned 503 when unhealthy, which let external observers detect graceful scale-down timing.

Root cause

server.go:1228-1234 leaked internal drain state through HTTP status codes and body text ("ok" vs "unhealthy").

Changes

| File | Δ |
| --- | --- |
| server.go | +2/−6 (always 200 + "ok") |
| server_test.go | +5/−5 (updated test expectations) |

Verdict

✅ Builds, vets, and tests pass. 2 files, +7/−11, small tier.
