Skip to content

feat(core,client): restore OAuth error subclasses and transient-error classification#2242

Draft
felixweinberger wants to merge 5 commits into
mainfrom
fweinberger/oauth-error-compat
Draft

feat(core,client): restore OAuth error subclasses and transient-error classification#2242
felixweinberger wants to merge 5 commits into
mainfrom
fweinberger/oauth-error-compat

Conversation

@felixweinberger
Copy link
Copy Markdown
Contributor

@felixweinberger felixweinberger commented Jun 2, 2026

Restore OAuth error backwards compatibility: deprecated v1 subclasses, instanceof-compatible construction, and an isTransientOAuthError() helper that preserves v1 retry semantics for unknown error codes.

Motivation and Context

This fixes two silent backwards-compatibility breaks that v1→v2 migrations hit in OAuth error handling. Both were observed while migrating a large production MCP host application (client + server + custom transports) from SDK v1 to v2.

Break 1 — instanceof classification stops matching. v1 exposed a subclass per OAuth error code, and client code classified errors with instanceof:

// v1 — common pattern for token refresh handling
if (error instanceof InvalidGrantError) dropStoredTokens();   // re-auth needed
else if (error instanceof ServerError) scheduleRetry();       // transient

In v2 today these classes don't exist; every check silently stops matching (no compile error in loosely typed paths, and @modelcontextprotocol/server-legacy/auth classes are different classes, so importing them doesn't help against errors thrown by the v2 client). Token-refresh invalidation and retry classification degrade silently.

Break 2 — unknown error codes silently lose retry behavior. v1's parseErrorResponse collapsed unrecognized error codes into ServerError — making them transient/retryable. v2 preserves the raw code. That's strictly better data, but it means the "obvious" migration rewrite:

// before (v1)            // after — looks equivalent, ISN'T
instanceof ServerError  error.code === OAuthErrorCode.ServerError

silently drops retries for non-standard codes that real authorization servers return (e.g. invalid_refresh_token). A refresh loop that recovered in v1 dead-ends in v2.

This PR makes the v1 patterns keep working while keeping v2's better model (raw codes preserved, single base class, error.code as the primary API).

What changed

  • The 17 v1 subclasses + CustomOAuthError are exported again as thin @deprecated wrappers around OAuthError (constructor signature unchanged from v1: (message, errorUri?)).
  • OAuthError.fromResponse() / parseErrorResponse() and the client throw sites construct the matching subclass via a new oauthErrorFromCode(code, message, errorUri?), so instanceof works for SDK-produced errors. Unknown codes still produce a plain OAuthError with the raw code preserved.
  • New isTransientOAuthError(error): true for server_error, temporarily_unavailable, too_many_requests, and any code not in OAuthErrorCode — documented as the v1-parity replacement for instanceof ServerError retry checks.
  • Deprecated errorCode getter alias for code (v1 property name).
  • Deprecated OAUTH_ERRORS code→class map restored.
  • One exception: the OAuth InvalidRequestError could not keep its v1 name — v2 already exports a JSON-RPC InvalidRequestError interface from the protocol types — so it returns as OAuthInvalidRequestError (documented in both migration guides).
  • error.name stays 'OAuthError' for all subclasses (matching the v2 base class), so v2-style error.name checks are unaffected; this differs from v1, which used the subclass name, and is called out in the docs.

How Has This Been Tested?

  • New unit tests: subclass construction/codes, factory known/unknown codes, fromResponse subclass behavior, errorCode alias, OAUTH_ERRORS completeness against the enum, isTransientOAuthError truth table including the unknown-code case, and parseErrorResponse end-to-end (JSON error body → subclass; unparsable body → ServerError, matching v1).
  • Token-endpoint round-trip tests: refreshAuthorization driven with an injected fetchFn returning real error Responses, asserting the consumer-visible thrown class and transient classification through the full request path (known code → InvalidGrantError; unknown invalid_refresh_token → base OAuthError, transient; 503 temporarily_unavailableTemporarilyUnavailableError).
  • Mutation-checked: disabling the subclass factory, the unknown-code transient rule, the fromResponse wiring, or the unparsable-body fallback each makes at least one test fail.
  • auth()-level fallback tests: a refresh failure with an unknown token-endpoint code falls through to a fresh authorization flow (returns REDIRECT), while a known non-transient code (invalid_scope) still escalates to the caller — pinning the v1-parity semantics inside the SDK's own flow, not just in the helper.
  • oauthErrorFromCode is guarded against prototype-chain lookups (a server returning constructor, __proto__, or toString as the error code gets a plain OAuthError with the raw code preserved), with tests for each.
  • Full core (570) and client (375) test suites pass; typecheck and lint clean.
  • The patterns covered here are taken directly from a real v1→v2 migration of a production MCP host application, where both breaks were caught only by behavioral review — neither produces a type error.

Breaking Changes

None. All additions are backwards compatible; v2 code written against OAuthError/error.code is unaffected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

  • The subclasses are frozen compatibility surface: no new fields, removal targeted for v3 (same lifecycle as other deprecated v1 shims).
  • isTransientOAuthError deliberately classifies unknown codes as transient. That mirrors v1's collapse-to-ServerError behavior and errs toward retrying; consumers who want strict RFC-only transience can check the three codes directly.
  • server-legacy/auth error classes remain distinct types (frozen v1 copies for the legacy AS router); the migration guide now warns against mixing them with @modelcontextprotocol/client instanceof checks.

@felixweinberger felixweinberger requested a review from a team as a code owner June 2, 2026 09:26
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 2, 2026

🦋 Changeset detected

Latest commit: 3ea50ff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@modelcontextprotocol/core Minor
@modelcontextprotocol/client Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Jun 2, 2026

Open in StackBlitz

@modelcontextprotocol/client

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/client@2242

@modelcontextprotocol/codemod

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/codemod@2242

@modelcontextprotocol/server

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server@2242

@modelcontextprotocol/server-legacy

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server-legacy@2242

@modelcontextprotocol/express

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/express@2242

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/fastify@2242

@modelcontextprotocol/hono

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/hono@2242

@modelcontextprotocol/node

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/node@2242

commit: 3ea50ff

Comment on lines +351 to 366
/**
* Returns `true` when an OAuth error indicates a transient condition that may succeed on
* retry: `server_error`, `temporarily_unavailable`, `too_many_requests` — or any error code
* not defined in {@linkcode OAuthErrorCode}.
*
* Treating unknown codes as transient matches SDK 1.x, which collapsed unrecognized error
* codes into `ServerError`. Authorization servers in the wild return non-standard codes
* (e.g. `invalid_refresh_token`) for conditions that are not permanent; treating them as
* permanent failures would stop retry/refresh loops that previously recovered.
*/
export function isTransientOAuthError(error: unknown): boolean {
if (!(error instanceof OAuthError)) {
return false;
}
return TRANSIENT_OAUTH_ERROR_CODES.has(error.code) || !KNOWN_OAUTH_ERROR_CODES.has(error.code);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The SDK's own auth() refresh-failure fallback in packages/client/src/client/auth.ts (lines 780–788) still keys on error.code === OAuthErrorCode.ServerError, so an unknown token-endpoint code like invalid_refresh_token — which this PR's own tests assert is now thrown as a plain OAuthError with the raw code and classified transient — is re-thrown instead of falling through to a fresh authorization, exactly the "refresh loop that recovered in v1 dead-ends in v2" break (Break 2) the PR description says it fixes. Apply isTransientOAuthError(error) (or equivalent unknown-code handling) at that internal call site, or explicitly scope the internal flow out of the PR's v1-parity claim.

Extended reasoning...

What the bug is. The PR adds isTransientOAuthError() to restore v1 retry semantics for unknown OAuth error codes, but the SDK's own authInternal() refresh-failure handling is not updated to use it. At packages/client/src/client/auth.ts:780-788 the catch reads:

} catch (error) {
    // If this is a ServerError, or an unknown type, log it out and try to continue. Otherwise, escalate so we can fix things and retry.
    if (!(error instanceof OAuthError) || error.code === OAuthErrorCode.ServerError) {
        // Could not refresh OAuth tokens
    } else {
        // Refresh failed for another reason, re-throw
        throw error;
    }
}

The comment's intent ("or an unknown type … try to continue") held in SDK 1.x because parseErrorResponse collapsed unrecognized token-endpoint codes into ServerError. In v2 the raw code is preserved, so an unknown code matches neither swallow condition and is re-thrown.

Code path that triggers it. auth()authInternal() finds stored tokens with a refresh_token → calls refreshAuthorization() → token endpoint returns {"error": "invalid_refresh_token"}executeTokenRequest throws via parseErrorResponse → with this PR, oauthErrorFromCode('invalid_refresh_token', …) produces a plain OAuthError carrying the raw code (the new authErrorCompat.test.ts test "preserves unknown token-endpoint error codes and classifies them transient" asserts exactly this) → the catch at line 782 sees error instanceof OAuthError === true and code !== 'server_error', so it re-throws → the outer auth() wrapper at auth.ts:553-563 only recovers for InvalidClient/UnauthorizedClient/InvalidGrant, so the unknown-code error propagates to the caller and the auth flow dead-ends instead of falling through to startAuthorization()REDIRECT.

Step-by-step proof (v1 vs v2 with this PR). Suppose a client has stored tokens whose refresh token was rotated server-side, and the AS returns HTTP 400 {"error": "invalid_refresh_token"}:

  1. SDK 1.x: parseErrorResponse doesn't recognize the code → constructs ServerError → the equivalent catch (error instanceof ServerError) swallows it → authInternal continues to startAuthorization → returns REDIRECT → the user re-authorizes and the client recovers.
  2. v2 with this PR: oauthErrorFromCode('invalid_refresh_token', …) returns a plain OAuthError with code === 'invalid_refresh_token' → line 782: error instanceof OAuthError is true, code === OAuthErrorCode.ServerError is false → else branch re-throws → auth()'s outer catch matches none of invalid_client/unauthorized_client/invalid_grant → the error escapes auth()handleOAuthUnauthorized / the transport surfaces a failure instead of redirecting to re-authorization.
  3. Meanwhile isTransientOAuthError(error) would have returned true for this exact error — the PR's tests assert it — but nothing in the SDK's own flow consults it.

Why existing code doesn't prevent it. The new helper, the subclass factory, and the docs (the "UNKNOWN-CODE TRAP" section in migration-SKILL.md) all target userland migrations. The PR even modifies other throw sites in this same auth.ts file (switching them to oauthErrorFromCode), but the one internal consumer where v1's collapse-to-ServerError semantics actually mattered keeps the strict code === ServerError check, so the restored retry semantics never apply to the SDK's own orchestration.

Impact. Any MCP client using the SDK's auth() orchestrator against an authorization server that returns non-standard refresh-failure codes (e.g. invalid_refresh_token, observed from real-world AS implementations per the PR description) will hard-fail the auth flow once the refresh token goes stale, instead of transparently falling back to a new authorization — the precise silent regression ("Break 2") this PR is motivated by.

How to fix. Replace the condition at line 782 with the new classification, e.g. if (!(error instanceof OAuthError) || isTransientOAuthError(error) || !KNOWN_OAUTH_ERROR_CODES.has(error.code)) — or more simply if (!(error instanceof OAuthError) || isTransientOAuthError(error)), which swallows server_error/temporarily_unavailable/too_many_requests/unknown codes and still escalates invalid_grant/invalid_client/unauthorized_client to the outer auth() recovery logic. Alternatively, if keeping the internal flow strict is intentional, the PR description and migration docs should narrow the claim that v1 retry semantics are restored, since they currently are restored only for consumers who adopt the helper themselves.

Note on scope. Lines 780–788 are pre-existing v2 code not touched by this diff, so this is not a regression introduced by the PR; it is flagged because the PR adds the exact classification helper for this scenario, modifies sibling throw sites in the same file, and explicitly claims to fix this break — leaving the SDK's own call site as the one place the fix doesn't reach.

Comment thread docs/migration-SKILL.md
Comment on lines +195 to +208
non-standard codes (e.g. `invalid_refresh_token`). Use `isTransientOAuthError(error)` instead — it returns true for `server_error`, `temporarily_unavailable`, `too_many_requests`, AND any code not in `OAuthErrorCode` (v1 parity).

The variadic `.tool()`, `.prompt()`, `.resource()` methods are removed. Use the `register*` methods with a config object.
```typescript
// v1
if (error instanceof ServerError) {
scheduleRetry();
}

**IMPORTANT**: v2 requires schema objects implementing [Standard Schema](https://standardschema.dev/) — raw shapes like `{ name: z.string() }` are no longer supported. Wrap with `z.object()` (Zod v4), or use ArkType's `type({...})`, or Valibot. For raw JSON Schema, wrap with
`fromJsonSchema(schema)` from `@modelcontextprotocol/server` (validator defaults automatically; pass an explicit validator for custom configurations). Applies to `inputSchema`, `outputSchema`, and `argsSchema`.
// v2 — equivalent semantics
import { isTransientOAuthError } from '@modelcontextprotocol/client';
if (isTransientOAuthError(error)) {
scheduleRetry();
}
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The OAuth-section rewrite in docs/migration-SKILL.md accidentally deleted unrelated migration content: the ## 6. McpServer API Changes heading, the variadic .tool()/.prompt()/.resource() removal note, the "IMPORTANT: v2 requires Standard Schema objects (wrap with z.object())" paragraph, and the "Unchanged APIs" paragraph. The file's numbering now jumps from section 5 to section 7, the ### Tools/### Prompts/### Resources subsections are orphaned, and step 4 of section 15 still references the now-missing "section 6" — please restore the deleted heading and paragraphs alongside the new OAuth content.

Extended reasoning...

What happened: The diff hunk in docs/migration-SKILL.md (lines ~178–230) replaces the old OAuth-error table with the new compatibility-focused prose. But the deleted span extends past the OAuth section boundary: it also removes (1) the **Unchanged APIs** (only import paths changed)… paragraph that closed the error-class section, (2) the ## 6. McpServer API Changes heading, (3) the sentence "The variadic .tool(), .prompt(), .resource() methods are removed. Use the register* methods with a config object.", and (4) the **IMPORTANT**: v2 requires schema objects implementing Standard Schema… paragraph (raw shapes no longer supported, wrap with z.object(), fromJsonSchema guidance). None of this content is re-added anywhere in the new text — the added lines are exclusively about OAuth errors.\n\nHow it manifests in the post-PR file: The top-level numbering jumps from ## 5. Removed / Renamed Type Aliases and Symbols (line 82) directly to ## 7. Headers API (line 294) — there is no section 6 anywhere in the file. The ### Tools, ### Prompts, ### Resources, ### Schema Migration Quick Reference, and ### Removed core exports subsections that previously belonged under "McpServer API Changes" now sit orphaned directly after the OAuth-error subsection, structurally nested under section 5. Grepping the post-PR file for McpServer API Changes, variadic, or Unchanged APIs returns nothing, confirming the content was deleted rather than moved.\n\nStep-by-step proof:\n1. Pre-PR, the file's structure was: ### OAuth error consolidation (table + rewrite example) → **Unchanged APIs** paragraph → ## 6. McpServer API Changes → variadic-removal sentence → IMPORTANT/Standard-Schema paragraph → ### Tools → … → ## 7. Headers API.\n2. The diff hunk @@ -178,52 +178,34 @@ removes everything from the old OAuth table through the IMPORTANT paragraph and inserts only the new OAuth compatibility prose, ending right before ### Tools.\n3. Post-PR, line 527 (## 15. Migration Steps, step 4) still reads "Replace .tool() / .prompt() / .resource() calls with registerTool / registerPrompt / registerResource per section 6" — a dangling reference to a section that no longer exists.\n4. The PR description scopes the change to OAuth error compatibility; nothing in it claims to restructure the McpServer API section, so this is collateral damage from the rewrite, not an intentional doc change.\n\nWhy it matters: This file is the LLM-targeted migration skill (migrate-v1-to-v2). The deleted variadic-method-removal statement and the Standard-Schema/z.object() requirement are core migration directives that an LLM consuming the skill relies on; some of that guidance partially survives in step 5 of section 15 and in the Schema Migration Quick Reference table, but the section heading, the variadic-removal statement, and the Unchanged-APIs paragraph are simply gone, and the broken numbering/orphaned subsections degrade the skill's structure.\n\nHow to fix: Re-insert, between the new OAuth code example and the ### Tools subsection, the previously existing content: the **Unchanged APIs** paragraph, the ## 6. McpServer API Changes heading, the variadic-removal sentence, and the IMPORTANT Standard-Schema paragraph. That restores the section numbering (5 → 6 → 7), re-parents the ### Tools/### Prompts/### Resources subsections, and makes the "per section 6" reference in section 15 valid again.

Comment on lines +336 to +349
/**
* Creates an {@linkcode OAuthError} from an error code, returning the specific subclass for
* known codes so that `instanceof` checks (e.g. `error instanceof InvalidGrantError`) work.
*
* Unknown / non-standard codes produce a plain {@linkcode OAuthError} that preserves the
* raw code.
*/
export function oauthErrorFromCode(code: OAuthErrorCode | string, message: string, errorUri?: string): OAuthError {
const ErrorClass = OAUTH_ERRORS[code];
if (ErrorClass) {
return new ErrorClass(message, errorUri);
}
return new OAuthError(code, message, errorUri);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 OAUTH_ERRORS is a plain object literal, so the OAUTH_ERRORS[code] lookup in oauthErrorFromCode walks the prototype chain — and code comes from untrusted authorization-server responses via OAuthError.fromResponse / parseErrorResponse. A server returning {"error": "constructor"} makes the function return new Object(message) (a String wrapper that is neither an OAuthError nor an Error), while codes like "toString" or "proto" make new ErrorClass(...) throw. Guard the lookup with Object.hasOwn(OAUTH_ERRORS, code), or build the map with a null prototype / a Map.

Extended reasoning...

What the bug is. oauthErrorFromCode (packages/core/src/auth/errors.ts:343-349) does const ErrorClass = OAUTH_ERRORS[code]; if (ErrorClass) return new ErrorClass(message, errorUri);. OAUTH_ERRORS is declared as a plain object literal, so the bracket lookup is not restricted to the 17 keys defined in it — it also resolves inherited properties from Object.prototype (constructor, toString, hasOwnProperty, valueOf, __proto__, …). For those keys ErrorClass is truthy but is not one of the OAuth error subclasses.

The code path that triggers it. The code value is fully server-controlled: OAuthErrorResponseSchema only requires error: z.string(), and parseErrorResponseOAuthError.fromResponseoauthErrorFromCode passes response.error straight through. So any authorization-server error body reaches this lookup unfiltered. executeTokenRequest calls parseErrorResponse both on !response.ok and on the HTTP-200-with-error fallback, and registerClient does the same.

Step-by-step proof (verified empirically in Node):

  1. AS returns 400 with body {"error": "constructor", "error_description": "boom"}.
  2. parseErrorResponse parses it and calls OAuthError.fromResponse({ error: 'constructor', ... }).
  3. oauthErrorFromCode('constructor', 'boom') evaluates OAUTH_ERRORS['constructor'], which resolves via the prototype chain to the global Object constructor — truthy.
  4. new Object('boom') returns [String: 'boom'] — a String wrapper object. instanceof OAuthError is false; instanceof Error is false.
  5. executeTokenRequest then throws this non-Error value to the consumer. instanceof OAuthError checks, the auth() invalid_client/invalid_grant recovery branches, and isTransientOAuthError all silently stop matching, and the declared OAuthError return type of both oauthErrorFromCode and parseErrorResponse is violated.

For codes like 'toString' or '__proto__', the resolved value is not constructible, so new ErrorClass(...) throws TypeError: ... is not a constructor. Inside parseErrorResponse that TypeError happens to be masked by the surrounding try/catch into the generic ServerError fallback (wrong message, original error_description lost), but direct callers of OAuthError.fromResponse / oauthErrorFromCode get the raw TypeError.

Why nothing prevents it today. The schema validation accepts any string, and there is no hasOwnProperty/Object.hasOwn guard before constructing. This is also a (small) regression relative to the pre-PR code path: previously fromResponse always did new OAuthError(response.error, ...), which produced a proper OAuthError regardless of the code string.

Impact. Limited but real: it requires a pathological or hostile authorization server emitting an error code that happens to collide with an Object.prototype property name, and the worst case is a mis-typed thrown value / degraded error classification — no prototype pollution and no code execution (the lookup is read-only). That keeps this at nit/defensive-hardening level rather than blocking, but since it sits on an untrusted-input path and the SDK now constructs from this map for every server error, it's worth closing while the PR is fresh.

How to fix — any one of these one-liners:

  • Guard the lookup: const ErrorClass = Object.hasOwn(OAUTH_ERRORS, code) ? OAUTH_ERRORS[code] : undefined;
  • Give the map a null prototype: Object.assign(Object.create(null), { ... })
  • Use a Map<string, OAuthErrorSubclass> and .get(code).

@felixweinberger felixweinberger marked this pull request as draft June 2, 2026 17:21
The v1->v2 consolidation of the per-code OAuth error subclasses into a
single OAuthError class silently breaks two common consumer patterns:

- instanceof classification (e.g. catching InvalidGrantError to drop
  stored tokens) stops matching with no compile error in loosely typed
  code paths.
- retry classification keyed on instanceof ServerError loses unknown
  error codes: v1 collapsed unrecognized codes into ServerError (hence
  retryable), while v2 preserves the raw code, so a mechanical rewrite
  to code === OAuthErrorCode.ServerError silently drops retries for
  non-standard codes like invalid_refresh_token.

Re-add the subclasses as deprecated wrappers, construct them from
oauthErrorFromCode() so SDK-produced errors satisfy instanceof checks,
restore the deprecated OAUTH_ERRORS map and an errorCode alias, and add
isTransientOAuthError() encoding the v1 retry semantics including the
unknown-code case.

The OAuth InvalidRequestError could not keep its name (taken by the
JSON-RPC protocol type) and returns as OAuthInvalidRequestError.
parseErrorResponse and the client-metadata validation throw sites now
produce the specific deprecated subclass for known error codes, so
instanceof checks written against SDK 1.x keep working for errors
thrown by the SDK itself. Unparsable error bodies fall back to a
ServerError instance, matching 1.x.
Exercise refreshAuthorization with an injected fetchFn returning error
responses, asserting consumer-visible error classes and transient
classification through the full request path rather than only via
parseErrorResponse directly.
…d OAUTH_ERRORS lookup; restore deleted McpServer section in migration-SKILL.md
@felixweinberger felixweinberger force-pushed the fweinberger/oauth-error-compat branch from e3c3afd to 3ea50ff Compare June 2, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant