Grant required IAM roles to Compute Engine default SA when --managed-mldiagnostics is passed during xpk cluster create#1187

Merged
scaliby merged 1 commit into AI-Hypercomputer:main from rapatchi:permission_fix
Jun 11, 2026

Conversation

@rapatchi rapatchi commented Jun 9, 2026

Contributor

When provisioning clusters with --managed-mldiagnostics, XLA ML diagnostics requires roles/hypercomputecluster.editor, roles/storage.objectUser, and roles/logging.logWriter to be bound to the Compute Engine default service account.

This commit:

  1. Automatically resolves projectNumber and grants these 3 required IAM roles via gcloud projects add-iam-policy-binding during cluster create when --managed-mldiagnostics is enabled.
  2. Updates user documentation (permissions.md, clusters.md) and unit test coverage accordingly.
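The two gcloud steps described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper names (`default_compute_sa`, `describe_project_command`, `grant_role_commands`) are hypothetical, and the command strings are modelled on the cluster creation logs shown under Testing.

```python
# Hypothetical helpers; names are illustrative and not taken from the xpk codebase.
REQUIRED_ROLES = (
    "roles/hypercomputecluster.editor",
    "roles/storage.objectUser",
    "roles/logging.logWriter",
)


def default_compute_sa(project_number: str) -> str:
    """Email of the Compute Engine default service account for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def describe_project_command(project: str) -> str:
    """Command that prints only the numeric project number."""
    return f'gcloud projects describe {project} --format="value(projectNumber)"'


def grant_role_commands(project: str, project_number: str) -> list[str]:
    """One add-iam-policy-binding command per required role."""
    member = f"serviceAccount:{default_compute_sa(project_number)}"
    return [
        f"gcloud projects add-iam-policy-binding {project}"
        f' --member="{member}" --role="{role}" --condition=None'
        for role in REQUIRED_ROLES
    ]
```

Keeping command construction separate from execution makes the strings easy to unit test without calling gcloud.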

Issue

Without this change, these permissions have to be granted manually for ML diagnostics to work.

Testing

Have you performed any manual testing on your change?

Prior IAM Bindings:
image

Cluster Creation Logs:

(xpk_local_venv) rapatchi@rapatchi2:~/xpk_fork/xpk_sa$ xpk cluster create --cluster=maxtest-cluster1 --tpu-type=v5litepod-8 --project=rapatchiconsumer --zone=us-central1-b --num-nodes=2 --spot --managed-mldiagnostics
[XPK] Starting xpk v0.1.dev903+g2b0dc6334
...
[XPK] Task: `Get Project Number` is implemented by `gcloud projects describe rapatchiconsumer --format="value(projectNumber)"`
[XPK] Granting necessary roles to 641919595434-compute@developer.gserviceaccount.com
[XPK] Task: `Grant roles/hypercomputecluster.editor` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/hypercomputecluster.editor" --condition=None`
[XPK] Task: `Grant roles/hypercomputecluster.editor` succeeded.
[XPK] Task: `Grant roles/storage.objectUser` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/storage.objectUser" --condition=None`
[XPK] Task: `Grant roles/storage.objectUser` succeeded.
[XPK] Task: `Grant roles/logging.logWriter` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/logging.logWriter" --condition=None`
[XPK] Task: `Grant roles/logging.logWriter` succeeded.
[XPK] Task: `Determine server supported GKE versions for default gke version` is implemented by `gcloud container get-server-config --project=rapatchiconsumer --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"`
...

Post Creation:

image
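Since the screenshots are not preserved in text form, the same check could be done from the output of `gcloud projects get-iam-policy <project> --format=json`. The sketch below is one possible way to do that; the function name `missing_roles` is hypothetical, and it counts only unconditional bindings on the assumption that `--condition=None` produces those.

```python
import json


def missing_roles(policy_json: str, member: str, roles: list[str]) -> list[str]:
    """Return the roles from `roles` that `member` does not hold unconditionally."""
    policy = json.loads(policy_json)
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        # Skip conditional bindings: they do not grant the role unconditionally.
        if "condition" not in binding and member in binding.get("members", [])
    }
    return [role for role in roles if role not in granted]
```

An empty return value means every listed role is bound to the member.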

Have you verified use cases affected by goldens? Yes

Comment thread src/xpk/commands/cluster.py
Comment thread src/xpk/commands/cluster.py Outdated
Comment thread src/xpk/commands/managed_ml_diagnostics.py Outdated
Comment thread src/xpk/commands/managed_ml_diagnostics.py Outdated

@scaliby scaliby left a comment

Member

Thanks for addressing my feedback. LGTM!

@scaliby scaliby enabled auto-merge June 11, 2026 09:18
@scaliby scaliby disabled auto-merge June 11, 2026 09:18

scaliby commented Jun 11, 2026

Member

Please address the linter failures and resolve the merge conflicts.

…agnostics

When provisioning clusters with --managed-mldiagnostics, XLA ML diagnostics
requires roles/hypercomputecluster.editor, roles/storage.objectUser, and
roles/logging.logWriter to be bound to the Compute Engine default service account.

This commit:
1. Automatically resolves projectNumber and grants these 3 IAM roles via
   gcloud projects add-iam-policy-binding during cluster create when
   --managed-mldiagnostics is enabled.
2. Explicitly specifies --condition=None to ensure non-interactive command
   compatibility when existing IAM policies contain conditional bindings.
3. Updates user documentation (permissions.md, clusters.md) and unit
   test coverage accordingly.
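
As a rough illustration of item 2, a role grant could be run without any chance of blocking on a prompt along these lines. This is a sketch under assumptions, not the merged implementation: `grant_role` and its injectable `run` parameter are hypothetical, and closing stdin is an extra safeguard added here on top of the `--condition=None` flag that the commit describes.

```python
import subprocess


def grant_role(project: str, member: str, role: str, run=subprocess.run) -> bool:
    """Bind `role` to `member` on `project`; return True if gcloud exits with 0."""
    argv = [
        "gcloud", "projects", "add-iam-policy-binding", project,
        f"--member={member}",
        f"--role={role}",
        # Explicitly request an unconditional binding so gcloud does not need
        # to ask which condition to use when the policy has conditional bindings.
        "--condition=None",
    ]
    # stdin is closed so that an unexpected prompt fails fast instead of hanging.
    result = run(
        argv,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

Passing the runner as a parameter lets unit tests substitute a fake and assert on the argument list without invoking gcloud.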
@rapatchi rapatchi reopened this Jun 11, 2026
@rapatchi

Contributor Author

Addressed the linter fix and merge conflict. Will wait for presubmits to pass.

@scaliby scaliby added this pull request to the merge queue Jun 11, 2026
Merged via the queue into AI-Hypercomputer:main with commit 0eb2457 Jun 11, 2026
24 checks passed