Skip to content

fix(deploy): nil pointer in updateProcessGroup error path#4931

Open
Lucais11 wants to merge 1 commit into
masterfrom
fix/deploy-update-process-group-nil-deref
Open

fix(deploy): nil pointer in updateProcessGroup error path#4931
Lucais11 wants to merge 1 commit into
masterfrom
fix/deploy-update-process-group-nil-deref

Conversation

@Lucais11
Copy link
Copy Markdown
Contributor

@Lucais11 Lucais11 commented May 26, 2026

Fixes #4528
When fly deploy creates a brand new machine (scale-up) and that machine fails to come up. For example the API rejects the create, lease acquisition fails, or smoke checks don't pass. flyctl crashes with a nil pointer dereference panic instead of returning a clear error. Users then see a Go stack trace and a deploy left in an inconsistent state.

The error formatting line in updateProcessGroup tries to print "failed to update machine : " by reading oldMachine.ID. For a scale-up there is no old machine — oldMachine is nil — so reading .ID panics on the exact line that was supposed to report the error. Three lines above, the code already computes a machineID variable that handles both cases (existing machine → use the old ID; new machine → use the new ID). The fix is to use that variable.

updateProcessGroup historically had two unsafe oldMachine.ID reads. #4180 fixed the first (the health-check lookup), which made the common scale-up path stop panicking. This PR closes the second (the error path), which still panics any time a scale-up hits a downstream error. Same shape of bug as #4750, one function over.

Also adds TestUpdateProcessGroupScaleUpErrorDoesNotPanic, which reproduces the exact panic from #4528 when the fix is reverted and passes aft

When updateMachineWChecks returns an error for a scale-up pairing
(oldMachine=nil), the error-formatting line dereferenced oldMachine.ID
and panicked inside an errgroup goroutine, crashing the deploy. Use the
locally-scoped machineID variable that already handles the nil-oldMachine
case.

The originally-reported panic in #4528 (at the now-removed
healthChecksPassed.Load(machPair.oldMachine.ID) site) was patched in
#4180; this closes the remaining oldMachine.ID deref in the same
function. Sibling fix to #4750.

Fixes #4528
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nil pointer deference when updating process group during deploy

2 participants