
Fix collectives compilation error on multi-host TPU allocation#126

Open
simrankaurb wants to merge 1 commit into AI-Hypercomputer:chs from simrankaurb:single-host-compile-fix


@simrankaurb


This PR fixes the XLA compilation failure that occurs when a single-node collective benchmark runs on a multi-host allocation. It forces JAX to compile collectives with logical device IDs by temporarily mocking the process count as 1 during ahead-of-time (AOT) compilation.
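
The "temporarily mock the process count" idea can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the code in this PR: `mocked_process_count`, `fake_runtime`, and `compile_step` are hypothetical names, and the stand-in runtime object is used instead of JAX so the snippet runs anywhere. Which JAX attribute the PR actually patches is not stated in the description.

```python
import contextlib
import types
from unittest import mock


@contextlib.contextmanager
def mocked_process_count(runtime, count=1):
    """Temporarily make runtime.process_count() return `count`.

    Hypothetical helper: `runtime` is any object exposing a
    `process_count` callable. The original is restored on exit,
    even if the body raises.
    """
    with mock.patch.object(runtime, "process_count", return_value=count):
        yield


# Stand-in for a multi-host runtime reporting 4 processes
# (an assumption for illustration; this is not JAX itself).
fake_runtime = types.SimpleNamespace(process_count=lambda: 4)


def compile_step(runtime):
    """Hypothetical compile step that records the process count it observed."""
    return {"num_processes": runtime.process_count()}


# Only the compile step sees the mocked single-process view.
with mocked_process_count(fake_runtime):
    compiled = compile_step(fake_runtime)

print(compiled["num_processes"])     # value observed during "compilation"
print(fake_runtime.process_count())  # value after the context manager exits
```

In the setting described above, the body of the `with` block would presumably wrap a JAX AOT call such as `jax.jit(fn).lower(*args).compile()`. Scoping the mock to a context manager keeps the rest of the benchmark on the real multi-host process count.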

simrankaurb force-pushed the single-host-compile-fix branch 9 times, most recently from 06ef478 to cfacaf2 (June 9, 2026 17:54)
simrankaurb force-pushed the single-host-compile-fix branch from cfacaf2 to ad784d5 (June 9, 2026 18:00)