HDDS-15273. Add OIDC WebIdentity STS design #10338

Open
paf91 wants to merge 1 commit into apache:master from paf91:HDDS-15273-webidentity-design

paf91 commented May 23, 2026

What changes were proposed in this pull request?

This PR adds the design document for OIDC/WebIdentity support in Apache Ozone STS.

The design describes how Ozone STS can support an AssumeRoleWithWebIdentity flow, allowing an OIDC token issued by an external identity provider such as Keycloak to be exchanged for temporary S3 credentials.
This is a design-document-only PR. It does not introduce runtime code changes.
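
For readers unfamiliar with the flow, here is a minimal sketch of how a client might build the form body of such a request. It is an illustration only, not the implementation from this PR or PR #10266. `Action`, `RoleSessionName`, `WebIdentityToken`, and `DurationSeconds` appear in the design text; `RoleArn` and `Version=2011-06-15` are taken from the AWS STS API and are assumed here to be accepted by Ozone STS. The class name, method name, and sample values are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class WebIdentityRequestSketch {

  /** Builds an application/x-www-form-urlencoded body for AssumeRoleWithWebIdentity. */
  static String buildFormBody(String roleArn, String sessionName,
      String oidcJwt, Integer durationSeconds) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("Action", "AssumeRoleWithWebIdentity");
    // Assumption: the AWS STS API version string is accepted as-is.
    params.put("Version", "2011-06-15");
    // Assumption: the role is named with an AWS-style RoleArn parameter.
    params.put("RoleArn", roleArn);
    params.put("RoleSessionName", sessionName);
    params.put("WebIdentityToken", oidcJwt);
    if (durationSeconds != null) {
      // DurationSeconds is optional, so it is sent only when provided.
      params.put("DurationSeconds", String.valueOf(durationSeconds));
    }
    StringJoiner body = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
          + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return body.toString();
  }

  public static void main(String[] args) {
    // Hypothetical sample values; a real token would be a JWT issued by the IdP.
    System.out.println(buildFormBody(
        "arn:aws:iam::123456789012:role/example-role",
        "demo-session", "header.payload.signature", 3600));
  }
}
```

The request itself carries no SigV4 signature from the caller in the AWS version of this API; the JWT is the credential. The temporary credentials returned are then used for ordinary SigV4 requests with `x-amz-security-token`, as listed below.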

The implementation remains in PR #10266.

The design covers:

  • Keycloak/OIDC as the identity provider.
  • OM-authoritative JWT validation.
  • Ozone STS issuing temporary S3 credentials.
  • Normal AWS SigV4 requests with x-amz-security-token for subsequent S3 access.
  • Ranger or the configured Ozone authorizer as the authorization / policy decision point.
  • The boundary between authentication and authorization.
  • Why Keycloak roles/groups are identity attributes and not final bucket/object authorization decisions.
  • Ratis / raw JWT persistence considerations.
  • Backward compatibility with existing STS AssumeRole.
  • Security properties and non-goals.

This design does not propose replacing Kerberos daemon authentication, does not add OFS OIDC login, does not add CLI device-code login, and does not make Keycloak Authorization Services the Ozone policy engine.

This design PR is split from the implementation PR so the design can be reviewed independently and documentation edits do not require rerunning the full implementation CI.

The operator/runtime Keycloak/Ranger guide remains in the implementation PR for now because it is tied to implementation config and runtime behavior.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15273

How was this patch tested?

This is a design-document-only PR.

The patch was checked with:

`git diff --check upstream/master..HEAD`

Result:

clean

@adoroszlai
Contributor

@ChenSammi @errose28 @fmorg-git @Tejaskriya please take a look

title: OIDC AssumeRoleWithWebIdentity for Ozone STS
summary: Web identity support for Ozone STS using OIDC and Ranger authorization
date: 2026-05-13
status: proposed
Contributor

Please add the Jira and author fields, as in https://github.com/apache/ozone/pull/9223/changes. The Jira should be different from HDDS-13323 so that the two implementations are not mixed together.

credentials into `SignatureInfo.sessionToken`.
- `EndpointBase` and `S3STSEndpointBase` propagate the session token into
`S3Auth`.
- `OzoneManagerProtocolClientSideTranslatorPB` copies `S3Auth` into
Contributor

This statement was already listed above.

not call Keycloak, refresh JWKS, revalidate JWTs, or otherwise depend on current
external IdP state during Ratis apply or replay. Credential expiration is
computed by the leader before replication and stored as
`credentialExpirationEpochSeconds` so replay does not depend on the apply-time
Contributor

Curious why this needs a new field `credentialExpirationEpochSeconds` instead of the existing `expirationEpochSeconds`?
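
For context on the quoted passage, the idea of computing the expiration once on the leader and replaying it unchanged can be sketched as follows. This is a minimal sketch under stated assumptions, not the code under review: only the field name `credentialExpirationEpochSeconds` comes from the design text, while the class names, method names, and the use of `java.time.Clock` are hypothetical.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

final class ExpirationReplaySketch {

  /** Hypothetical replicated request; only the field name comes from the design text. */
  static final class IssueCredentialRequest {
    final String accessKeyId;
    final long credentialExpirationEpochSeconds;

    IssueCredentialRequest(String accessKeyId, long credentialExpirationEpochSeconds) {
      this.accessKeyId = accessKeyId;
      this.credentialExpirationEpochSeconds = credentialExpirationEpochSeconds;
    }
  }

  /** Leader side: reads the clock exactly once, before the request is replicated. */
  static IssueCredentialRequest prepareOnLeader(String accessKeyId,
      long durationSeconds, Clock leaderClock) {
    long expiry = leaderClock.instant().getEpochSecond() + durationSeconds;
    return new IssueCredentialRequest(accessKeyId, expiry);
  }

  /** Apply or replay side: depends only on the replicated request, never on a clock or the IdP. */
  static long expirationToStore(IssueCredentialRequest request) {
    return request.credentialExpirationEpochSeconds;
  }

  public static void main(String[] args) {
    // A fixed clock stands in for the leader's wall clock at submission time.
    Clock leaderClock = Clock.fixed(Instant.ofEpochSecond(1_700_000_000L), ZoneOffset.UTC);
    IssueCredentialRequest request = prepareOnLeader("example-key", 3600, leaderClock);
    // Replaying the same request any number of times yields the same stored expiration.
    System.out.println(expirationToStore(request));
    System.out.println(expirationToStore(request));
  }
}
```

The sketch does not depend on what the field is called, so it does not settle the naming question above. It only shows why an absolute timestamp, rather than a duration, has to be part of the replicated request.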


- only for the STS application path;
- only for `Action=AssumeRoleWithWebIdentity`;
- only when `ozone.sts.web.identity.enabled=true`;
Contributor

The current STS configuration flag is `ozone.s3g.sts.http.enabled`; should the new one be `ozone.s3g.sts.web.identity.enabled`?
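
Whichever key name is chosen, the three quoted conditions amount to a conjunction. A minimal sketch, assuming the flag value has already been read from configuration into a boolean: the enum, class, and method names are hypothetical, and the quoted excerpt may omit further conditions from the full list.

```java
final class WebIdentityGateSketch {

  /** Hypothetical marker for which application path received the request. */
  enum RequestPath { STS, S3 }

  /**
   * Returns true only when all three quoted conditions hold.
   * webIdentityEnabled is the already-resolved value of the feature flag.
   */
  static boolean acceptsWebIdentity(RequestPath path, String action,
      boolean webIdentityEnabled) {
    return path == RequestPath.STS
        && "AssumeRoleWithWebIdentity".equals(action)
        && webIdentityEnabled;
  }

  public static void main(String[] args) {
    System.out.println(acceptsWebIdentity(RequestPath.STS, "AssumeRoleWithWebIdentity", true));
    System.out.println(acceptsWebIdentity(RequestPath.STS, "AssumeRole", true));
    System.out.println(acceptsWebIdentity(RequestPath.S3, "AssumeRoleWithWebIdentity", true));
    System.out.println(acceptsWebIdentity(RequestPath.STS, "AssumeRoleWithWebIdentity", false));
  }
}
```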

- `RoleSessionName=<session>`
- `WebIdentityToken=<OIDC JWT>`
- `DurationSeconds=<optional>`
- `Policy=<optional, only if the existing STS AssumeRole session policy path is
Contributor

The current AssumeRole flow supports session policies and converts the IAM resources to Ozone objects, permissions and actions for Ranger to consume. The Ranger authorizer defines the embedded session policy format in the STS token - it is opaque to Ozone.


The common request-shape extension point is
`AssumeRoleWithWebIdentityRequest`, with
`IAccessAuthorizer.generateAssumeRoleWithWebIdentitySessionPolicy()` as the
Contributor

Curious why the current IAccessAuthorizer.generateAssumeRoleSessionPolicy() method can't be used once the JWT token is validated and a valid Ozone Kerberos user is identified?


- `authType=ASSUME_ROLE` for existing tokens, with `originalAccessKeyId`
preserved;
- `authType=WEB_IDENTITY` for new tokens, with effective user, groups, issuer,
Contributor

Who would the effective user be in the case of WEB_IDENTITY?

Errors should use STS/S3-compatible codes where possible:

- invalid or expired JWT: `InvalidIdentityToken`;
- disabled feature: `AccessDenied` or `InvalidAction`;
Contributor

I believe `FEATURE_NOT_ENABLED` is used for a disabled feature.

- disabled feature: `AccessDenied` or `InvalidAction`;
- unauthorized role assumption: `AccessDenied`;
- unsupported optional parameter: `InvalidParameterValue`;
- internal validation or revocation failures: fail closed.
Contributor

What response code would occur if the JWKS server is down/unresponsive?

- fake/Ranger authorizer allows `tomato-user` to assume a test role and denies
`denied-user`.

Full Ranger container testing is optional for the MVP. Unit and mock-layer tests
Contributor

I think it would be a good idea to have full Ranger container testing. For example, I think the `tomato-user` example won't work if that user isn't also in Ranger, correct?

adoroszlai requested a review from fapifta May 28, 2026 16:17