Skip to content

fix(auth): don't delete cached credentials when token refresh fails#3443

Open
1fanwang wants to merge 1 commit into
flyteorg:masterfrom
1fanwang:keyring-no-premature-delete-on-refresh-failure
Open

fix(auth): don't delete cached credentials when token refresh fails#3443
1fanwang wants to merge 1 commit into
flyteorg:masterfrom
1fanwang:keyring-no-premature-delete-on-refresh-failure

Conversation

@1fanwang

@1fanwang 1fanwang commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Why are the changes needed?

The authenticators delete the cached credentials whenever a token refresh fails, then fall back to a full re-auth. But KeyringStore.store() overwrites in place, so the delete() is redundant — and harmful: a transient refresh failure (a network blip, an IdP hiccup) wipes the last-good tokens, including a still-valid refresh_token, forcing the user through an unnecessary browser re-login.

What changes were proposed in this pull request?

Drop the premature KeyringStore.delete(self._endpoint) from the refresh-failure paths in the PKCE, device-code, and GCP identity-aware-proxy authenticators. Each fall-through still calls store(), so a successful re-auth persists. At the device-flow poll site the surrounding try/except existed only to delete-and-re-raise, so it's removed and exceptions propagate unchanged.

How was this patch tested?

Added test_pkce_refresh_failure_keeps_cached_credentials and tightened test_device_flow_authenticator, both asserting delete isn't called when a refresh fails. Reverting the fix makes both fail with Expected 'delete' to not have been called. Called 1 times.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

The authenticators call KeyringStore.delete(endpoint) when a token refresh
fails, then immediately fall back to a full authorization flow that calls
KeyringStore.store(...). store() overwrites each key in place, so the delete is
redundant -- and harmful: a transient refresh failure wipes the last-good
cached tokens (including a still-valid refresh_token), forcing an unnecessary
browser re-login.

Remove the premature delete() from the PKCE and device-code refresh paths and
from the identity-aware-proxy authenticator. At the device-flow poll site the
surrounding try/except existed only to delete-and-re-raise, so it is dropped.

Signed-off-by: 1fanwang <1fannnw@gmail.com>
@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.47%. Comparing base (0e2463a) to head (ccf65aa).

❗ There is a different number of reports uploaded between BASE (0e2463a) and HEAD (ccf65aa). Click for more details.

HEAD has 64 uploads less than BASE
Flag BASE (0e2463a) HEAD (ccf65aa)
86 22
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3443      +/-   ##
==========================================
- Coverage   84.07%   78.47%   -5.60%     
==========================================
  Files         396      311      -85     
  Lines       31231    26573    -4658     
  Branches     3029     3029              
==========================================
- Hits        26257    20853    -5404     
- Misses       4099     4897     +798     
+ Partials      875      823      -52     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant