refactor Flux transformer to use scanned blocks, dynamic checkpointing, and decoupled projections by prishajain1 · Pull Request #417 · AI-Hypercomputer/maxdiffusion

prishajain1 · 2026-06-12T06:20:03Z

Overview

This PR refactors the Flux model architecture in MaxDiffusion to support scanned blocks (nn.scan) for double and single blocks, implements configurable gradient checkpointing (rematerialization) policies, and updates the weights loader to support loading pretrained checkpoints under the scanned format.
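For readers unfamiliar with the scanned layout, here is a minimal sketch of the idea. It uses plain `jax.lax.scan` rather than Flax's `nn.scan`, and a toy residual block rather than the Flux blocks. The sizes, the `block` function, and the random weights are all assumptions for illustration. It shows why a loader has to stack per-layer weights along axis 0: the scan consumes one slice of the leading axis per step.

```python
import jax
import jax.numpy as jnp

NUM_LAYERS, DIM = 4, 8  # hypothetical sizes, not taken from the PR


def block(x, w):
  """Toy residual block standing in for one transformer block."""
  return x + jnp.tanh(x @ w)


# Per-layer weights, as an unscanned checkpoint might provide them.
keys = jax.random.split(jax.random.PRNGKey(0), NUM_LAYERS)
per_layer = [jax.random.normal(k, (DIM, DIM)) * 0.1 for k in keys]

# Stack along axis 0 so the leading axis indexes the layer.
stacked = jnp.stack(per_layer, axis=0)  # shape (NUM_LAYERS, DIM, DIM)


def scan_body(carry, w):
  # carry is the running activation; w is one layer's slice of `stacked`.
  return block(carry, w), None


x = jnp.ones((2, DIM))
scanned_out, _ = jax.lax.scan(scan_body, x, stacked)

# Reference: the same computation as an unrolled Python loop.
looped_out = x
for w in per_layer:
  looped_out = block(looped_out, w)
```

The scanned version traces `block` once instead of `NUM_LAYERS` times, which is where the compile-time saving usually comes from.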

Key Changes

Decoupled Fused Projections: Decoupled the projection layers (implementing the MlpAndOutputBlock wrapper) to eliminate redundant recomputation of attention and projection outputs.
QKV Slicing Refactoring: Refactored the QKV projection slicing logic to use jnp.split across Flux transformer blocks for cleaner layout constraints.
Scanned Block Architecture: Migrated Flux Double and Single Transformer Blocks to use nn.scan to optimize compiler tracing and step execution speed on TPUs.
Dynamic Gradient Checkpointing: Added FLUX_OPTIMIZED to GradientCheckpointType so that block-specific rematerialization policies can be set in configuration files rather than hardcoded.
Stacked Weights Loading: Updated the weights loader (util.py) to slice, group, and stack PyTorch checkpoint weights along axis 0 to match the expected format of nn.scan layers.
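As an illustration of the QKV slicing item above, this is a small sketch of splitting a fused projection output with `jnp.split`. It assumes the fused tensor is laid out as [Q | K | V] along the last axis. The PR's actual layout is not shown in this thread, and the shapes below are hypothetical.

```python
import jax.numpy as jnp

# Hypothetical shapes for illustration only.
batch, seq, heads, head_dim = 2, 5, 4, 8
dim = heads * head_dim

# Stand-in for the output of a fused QKV projection: (batch, seq, 3 * dim).
fused = jnp.arange(batch * seq * 3 * dim, dtype=jnp.float32).reshape(
    batch, seq, 3 * dim
)

# Split into three equal chunks along the last axis (assumed order: Q, K, V).
q, k, v = jnp.split(fused, 3, axis=-1)

# Reshape the query into per-head form: (batch, seq, heads, head_dim).
q_heads = q.reshape(batch, seq, heads, head_dim)
```

Compared with manual index arithmetic, a single split makes the three-way partition explicit and keeps the chunk sizes tied to one axis.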

github-actions · 2026-06-12T06:20:12Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

github-actions · 2026-06-12T06:32:23Z

🤖 Hi @prishajain1, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-12T06:35:20Z

🤖 I'm sorry @prishajain1, but I was unable to process your request. Please see the logs for more details.

Perseus14 · 2026-06-12T16:11:02Z

+          names_which_can_be_saved=self.config.names_which_can_be_saved,
+          names_which_can_be_offloaded=self.config.names_which_can_be_offloaded,


Can we make these variable names a little more explicit? Maybe something like saved_transformer_layer_names or savable_transformer_layer_names, and offloaded_transformer_layer_names or offloadable_transformer_layer_names? I will leave it up to you.
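For context on what these two lists control, here is a hedged sketch of building a rematerialization policy from name lists with `jax.checkpoint_policies`. The keyword arguments in the diff match the signature of `save_and_offload_only_these_names`, but whether the PR calls that function is an assumption. The helper `make_policy`, its parameter names, and the `"attn_out"` tag are hypothetical.

```python
import jax
import jax.numpy as jnp
from jax.ad_checkpoint import checkpoint_name


def make_policy(savable_names, offloadable_names):
  """Builds a remat policy from two name lists (hypothetical helper)."""
  if offloadable_names:
    # Offloading needs a backend with host memory support (assumption:
    # "device" and "pinned_host" memory kinds are available).
    return jax.checkpoint_policies.save_and_offload_only_these_names(
        names_which_can_be_saved=list(savable_names),
        names_which_can_be_offloaded=list(offloadable_names),
        offload_src="device",
        offload_dst="pinned_host",
    )
  return jax.checkpoint_policies.save_only_these_names(*savable_names)


def block(x, w):
  # Tag an intermediate so the policy can refer to it by name.
  h = checkpoint_name(jnp.tanh(x @ w), "attn_out")  # hypothetical tag
  return jnp.sum(h**2)


policy = make_policy(savable_names=["attn_out"], offloadable_names=[])
remat_block = jax.checkpoint(block, policy=policy)

x = jnp.ones((2, 4))
w = jnp.full((4, 4), 0.1)
grad_remat = jax.grad(remat_block, argnums=1)(x, w)
grad_plain = jax.grad(block, argnums=1)(x, w)
```

The policy only changes which intermediates are kept for the backward pass, so the gradients match the unwrapped function.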

Perseus14 · 2026-06-12T16:15:45Z

+            names_which_can_be_saved=self.config.names_which_can_be_saved,
+            names_which_can_be_offloaded=self.config.names_which_can_be_offloaded,
        )


Same as above

Perseus14 · 2026-06-12T16:16:55Z

+names_which_can_be_saved: []
+names_which_can_be_offloaded: []


Same as above

Perseus14 · 2026-06-12T16:18:13Z

Can we rename the file to transformer_flux.py?

Perseus14 · 2026-06-12T16:19:43Z

-  r"""
-  A Transformer block following the MMDiT architecture, introduced in Stable Diffusion 3.
-
-  Reference: https://arxiv.org/abs/2403.03206
-
-  Parameters:
-      dim (`int`): The number of channels in the input and output.
-      num_attention_heads (`int`): The number of heads to use for multi-head attention.
-      attention_head_dim (`int`): The number of channels in each head.
-      context_pre_only (`bool`): Boolean to determine if we should add some blocks associated with the
-          processing of `context` conditions.
-  """
-


Let's keep the comment block. Update if required

Perseus14 · 2026-06-12T16:23:02Z

    vec_shape = (
        batch_size,
-        768,  # Sequence length of clip, how to get this programmatically?
+        768,


Same as above

Perseus14 · 2026-06-12T16:24:07Z

-  r"""
-  Norm layer adaptive layer norm zero (adaLN-Zero).
-
-  Parameters:
-      embedding_dim (`int`): The size of each embedding vector.
-      num_embeddings (`int`): The size of the embeddings dictionary.
-  """
-


Let's keep the comment block

Perseus14 · 2026-06-12T16:24:44Z

-  r"""
-  Norm layer adaptive layer norm zero (adaLN-Zero).
-
-  Parameters:
-      embedding_dim (`int`): The size of each embedding vector.
-      num_embeddings (`int`): The size of the embeddings dictionary.
-  """
-


Let's keep the comment block

Perseus14 · 2026-06-12T16:25:24Z

+
+      x = (x - mean) * inv_std * (1.0 + scale_msa) + shift_msa
    else:
-      raise ValueError(f"Unsupported `norm_type` ({self.norm_type}) provided. Supported ones are: 'layer_norm'.")


Is there a need to change this text comment?
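For reference, the added line in this hunk is the usual adaLN-style modulation of a normalized activation. A minimal sketch follows, assuming normalization over the last axis and an `eps` of 1e-6. Neither is confirmed by the diff, and `modulated_layer_norm` is a hypothetical name.

```python
import jax.numpy as jnp


def modulated_layer_norm(x, shift, scale, eps=1e-6):
  """Layer norm without learned affine, then scale/shift modulation."""
  mean = jnp.mean(x, axis=-1, keepdims=True)
  var = jnp.var(x, axis=-1, keepdims=True)
  inv_std = 1.0 / jnp.sqrt(var + eps)
  # Same form as the diff line: (x - mean) * inv_std * (1 + scale) + shift.
  return (x - mean) * inv_std * (1.0 + scale) + shift


# Hypothetical shapes: (batch, seq, dim) activations, per-sample modulation.
x = jnp.arange(2 * 3 * 6, dtype=jnp.float32).reshape(2, 3, 6)
zeros = jnp.zeros((2, 1, 6))
out = modulated_layer_norm(x, shift=zeros, scale=zeros)
```

With zero shift and scale the result reduces to a plain layer norm, which is a convenient sanity check.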

Perseus14 · 2026-06-12T16:26:12Z

+        names_which_can_be_saved=config.names_which_can_be_saved,
+        names_which_can_be_offloaded=config.names_which_can_be_offloaded,


Please change variable names

prishajain1 requested a review from entrpn as a code owner June 12, 2026 06:20

prishajain1 marked this pull request as draft June 12, 2026 06:20

prishajain1 force-pushed the prisha/flux_training branch from 6dfd5ea to 4696256 Compare June 12, 2026 06:22

Flux training: Implement scanned blocks, dynamic gradient checkpointing, and weight loading improvements

11ddfef

prishajain1 force-pushed the prisha/flux_training branch from 4696256 to 11ddfef Compare June 12, 2026 06:29

prishajain1 marked this pull request as ready for review June 12, 2026 06:31

prishajain1 added the gemini-review label Jun 12, 2026

prishajain1 removed the gemini-review label Jun 12, 2026

prishajain1 requested a review from Perseus14 June 12, 2026 08:59

Perseus14 requested changes Jun 12, 2026

2 participants