Fix wrong parent when nesting perf metrics #1066

Merged
copybara-service[bot] merged 1 commit into main from test_867702455
Feb 11, 2026

copybara-service[bot] commented Feb 9, 2026

Fix wrong parent when nesting perf metrics

refer_inference is currently nested under actor_training. Although it is logically triggered before actor_training, its completion callback may run after actor_training has started. If actor_training is an active group on the same device timeline at that point, the tracing infrastructure incorrectly nests refer_inference under actor_training. This change fixes the parent group at initiation time instead.
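
The failure mode can be illustrated with a minimal sketch, assuming a toy timeline that tracks a stack of active groups. `Span`, `Timeline`, and `nest_refer_inference` are hypothetical names made up for this illustration and are not the project's tracing API; the timestamps are copied from the log below.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Span:
    """Hypothetical span record; not the project's real data structure."""
    name: str
    begin: float
    end: float
    parent: Optional["Span"] = None
    children: List["Span"] = field(default_factory=list)


class Timeline:
    """Toy per-device timeline that keeps a stack of active groups."""

    def __init__(self) -> None:
        self.root = Span("root", 0.0, float("inf"))
        self._active = [self.root]

    def current_group(self) -> Span:
        # The innermost group that is active right now.
        return self._active[-1]

    def start_group(self, name: str, begin: float) -> Span:
        group = Span(name, begin, float("inf"), parent=self.current_group())
        group.parent.children.append(group)
        self._active.append(group)
        return group

    def record_span(self, name: str, begin: float, end: float,
                    parent: Optional[Span] = None) -> Span:
        # Without an explicit parent, fall back to whatever group is
        # active when the span is recorded (the behavior being fixed).
        if parent is None:
            parent = self.current_group()
        span = Span(name, begin, end, parent=parent)
        parent.children.append(span)
        return span


def nest_refer_inference(capture_parent_at_initiation: bool) -> str:
    """Returns the name of the group that refer_inference ends up under."""
    timeline = Timeline()
    timeline.start_group("micro_batch_steps", 1.362624)

    # refer_inference is initiated here, before actor_training starts.
    parent_at_initiation = (
        timeline.current_group() if capture_parent_at_initiation else None
    )

    # actor_training becomes the active group on the same timeline.
    timeline.start_group("actor_training", 50.802475)

    # The completion callback for refer_inference runs only now.
    span = timeline.record_span(
        "refer_inference", 35.659063, 50.838434, parent=parent_at_initiation
    )
    return span.parent.name


print(nest_refer_inference(capture_parent_at_initiation=False))
print(nest_refer_inference(capture_parent_at_initiation=True))
```

In this sketch, resolving the parent inside the completion callback attaches refer_inference to actor_training, while capturing it at initiation attaches it to micro_batch_steps, which matches the sibling layout in the log below.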

Verified by a manual launch.
https://screenshot.googleplex.com/5QmxLA7ZeTt8axQ

I0209 20:30:24.072806 131196942059584 export.py:419] Timeline: tpu0
I0209 20:30:24.072829 131196942059584 export.py:420] - root (0.000000, inf)
  - global_step (1.362577, 122.438728)
    - mini_batch_step (1.362613, 122.438719)
      - micro_batch_steps (1.362624, 122.438711)
        - rollout (1.392425, 35.577381)
        - actor_training (50.802475, 122.438482)
          - peft_train_step (122.164713, 122.437401)
        - refer_inference (35.659063, 50.838434)

PiperOrigin-RevId: 868486420