
Cap minion AsyncAuth retry loop with auth_tries (#69442) #69443

Open
dwoz wants to merge 1 commit into saltstack:3006.x from dwoz:fix/issue-69442

@dwoz dwoz commented Jun 13, 2026

Contributor

What does this PR do?

Backports the auth_tries outer-loop cap on the minion's
AsyncAuth._authenticate() from 3008.x. When sign_in() keeps returning
the "retry" sentinel, the minion will now bail out of the
authentication loop after auth_tries attempts (default 7) with
SaltClientError("Failed to authenticate with the master after N attempts"),
instead of looping silently forever with exponential backoff up to
acceptance_wait_time_max.

auth_tries=0 preserves the legacy "loop forever" behaviour for
operators who explicitly want it. The SAuth.authenticate() synchronous
path is intentionally left unchanged — it is the salt-call / single-shot
CLI codepath and is out of scope for this fix.
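
For reviewers who want the shape of the guard without opening the diff, here is a minimal synchronous sketch of a capped retry loop. It is not the code in this PR: `authenticate` and the local `SaltClientError` class are hypothetical stand-ins, the real method is asynchronous, and the backoff sleep is omitted for brevity.

```python
class SaltClientError(Exception):
    """Local stand-in for salt.exceptions.SaltClientError (illustration only)."""


def authenticate(sign_in, auth_tries=7):
    """Call sign_in() until it stops returning "retry" or the cap is reached.

    Hypothetical helper: the name and signature are assumptions, not Salt's API.
    """
    attempts = 0
    while True:
        creds = sign_in()
        if creds != "retry":
            return creds
        attempts += 1
        # auth_tries == 0 disables the cap and keeps the legacy loop-forever mode.
        if auth_tries and attempts >= auth_tries:
            raise SaltClientError(
                f"Failed to authenticate with the master after {attempts} attempts"
            )
```

With `auth_tries=0` the `if` guard never fires, which is how the sketch models the opt-out described above.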

What issues does this PR fix or reference?

Fixes #69442

Previous Behavior

On 3006.x and 3007.x, a minion whose sign_in() consistently returns
"retry" (master key not yet accepted, master AES rotation in flight,
multi-master probe against an unreachable peer, etc.) sleeps
acceptance_wait_time between attempts, doubles up to
acceptance_wait_time_max, and never logs an error. The minion appears
stuck with no operator-visible signal.
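
To make the wait pattern concrete, the sketch below lists the sleep durations such a loop would produce. The function name `backoff_schedule` is hypothetical, the sample values are illustrative rather than Salt defaults, and clamping the doubled value to the maximum is an assumption about what "doubles up to" means.

```python
def backoff_schedule(acceptance_wait_time, acceptance_wait_time_max, attempts):
    """Return the wait, in seconds, before each of the first `attempts` retries."""
    waits = []
    wait = acceptance_wait_time
    for _ in range(attempts):
        waits.append(wait)
        # Double the wait until it reaches the configured ceiling (assumed clamp).
        if wait < acceptance_wait_time_max:
            wait = min(wait * 2, acceptance_wait_time_max)
    return waits


# Illustrative values only: start at 10s, cap at 60s.
print(backoff_schedule(10, 60, 6))  # [10, 20, 40, 60, 60, 60]
```

Once the ceiling is reached the wait stays flat, so without a cap on the number of attempts the loop simply continues at that interval.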

New Behavior

After auth_tries consecutive "retry" responses, the loop terminates
with a SaltClientError:

salt.exceptions.SaltClientError: Failed to authenticate with the master after 7 attempts

which is then wrapped by salt.channel.client.AsyncPubChannel.connect()
into the user-visible "Unable to sign_in to master: ..." log line. This
matches the behaviour 3008.x has had since the auth_tries cap was
introduced.
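
As a rough illustration of how the inner error ends up inside the outer message, the sketch below wraps a failing authentication call. The `connect` function and the local exception class are hypothetical and only mirror the message format quoted above; the real `AsyncPubChannel.connect()` is asynchronous and does more than this.

```python
class SaltClientError(Exception):
    """Local stand-in for salt.exceptions.SaltClientError (illustration only)."""


def connect(authenticate):
    """Run the authentication step and prefix any failure for the operator."""
    try:
        return authenticate()
    except SaltClientError as exc:
        # Keep the original exception as the cause so the traceback shows both.
        raise SaltClientError(f"Unable to sign_in to master: {exc}") from exc
```

Under these assumptions, a capped failure would surface as "Unable to sign_in to master: Failed to authenticate with the master after 7 attempts".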

Merge requirements satisfied?

  • Docs (no documented behaviour changes; auth_tries is already
    documented and its default of 7 carries over)
  • Changelog (changelog/69442.fixed.md)
  • Tests written/updated
    (tests/pytests/unit/test_crypt.py::test_authenticate_caps_retry_loop_with_auth_tries_69442)

Commits signed with GPG?

No (matching the rest of 3006.x history; let me know if you want this
re-signed.)

The minion's AsyncAuth._authenticate() outer loop on 3006.x and 3007.x
keeps calling sign_in() forever whenever the master answers with the
"retry" sentinel (key not yet accepted, master AES rotation in flight,
multi-master probe). The minion sleeps acceptance_wait_time between
attempts, doubling up to acceptance_wait_time_max, and never surfaces
an error: no log, no traceback, just a stuck minion.

3008.x already caps this loop using the existing auth_tries option
(default 7); backport the same guard so the minion bails out of
_authenticate() with SaltClientError("Failed to authenticate with the
master after N attempts") once auth_tries iterations have been spent
returning "retry". auth_tries=0 keeps the old "loop forever" behavior
for operators who actually want it.

The synchronous SAuth.authenticate() path is intentionally left
unchanged: that is a separate code path used by salt-call and other
single-shot CLI flows, and its existing semantics are out of scope for
this fix.

Fixes saltstack#69442

Labels

test:full Run the full test suite
