
feat: Improved metrics in ART #609

Open
vivekkalyan wants to merge 47 commits into main from feat/improved-metrics


@vivekkalyan (Collaborator) commented Mar 10, 2026

Summary

This PR implements the W&B metrics taxonomy RFC in ART so that run metrics land in stable top-level namespaces, which makes them easier to group into panels and compare in W&B.

The main behavior change is hierarchical cost logging. Users can log leaf metrics like costs/train/llm_judge/correctness or costs/train/tinker_train, and ART will automatically emit parent rollups like costs/train and costs/all, plus cumulative costs/cum/* metrics across steps.
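To illustrate the rollup behavior described above, here is a minimal sketch, not the code in this PR. The helper names `rollup_costs` and `accumulate` are hypothetical, the dollar amounts are made up, and emitting the intermediate `costs/train/llm_judge` parent is an assumption about how deep the rollups go.

```python
from collections import defaultdict


def rollup_costs(leaves: dict[str, float]) -> dict[str, float]:
    """Return leaf costs plus summed parent rollups and a costs/all total."""
    parents: defaultdict[str, float] = defaultdict(float)
    for key, value in leaves.items():
        parts = key.split("/")
        if parts[0] != "costs" or len(parts) < 2:
            raise ValueError(f"expected a costs/... key, got {key!r}")
        # Every proper prefix below the root is a parent, e.g.
        # costs/train/llm_judge and costs/train for the correctness leaf.
        for depth in range(2, len(parts)):
            parents["/".join(parts[:depth])] += value
        parents["costs/all"] += value
    # Assumes no leaf key collides with a computed parent key.
    return {**leaves, **parents}


def accumulate(cum: dict[str, float], step: dict[str, float]) -> dict[str, float]:
    """Add one step's costs/... values into running costs/cum/... totals."""
    for key, value in step.items():
        cum_key = "costs/cum/" + key.removeprefix("costs/")
        cum[cum_key] = cum.get(cum_key, 0.0) + value
    return cum


# Made-up spend for a single training step.
step_costs = rollup_costs({
    "costs/train/llm_judge/correctness": 0.25,
    "costs/train/tinker_train": 0.5,
})

running: dict[str, float] = {}
accumulate(running, step_costs)
accumulate(running, step_costs)  # a second step with identical spend
```

In this sketch the user only ever writes leaf keys; parents and cumulative totals are derived, so they cannot drift out of sync with the leaves.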

This PR also adds a MetricsBuilder API and a @track_api_cost(...) decorator so judge / external API spend can be logged into the same taxonomy with model-aware pricing.
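The decorator idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the `@track_api_cost(...)` implementation: `track_cost_sketch`, its `metric` and `sink` parameters, the `PRICING` table, and the `example-model` name are all hypothetical, and the prices are placeholders rather than real provider rates. The fake response mimics the `usage.prompt_tokens` / `usage.completion_tokens` shape of an OpenAI chat completion; cache-aware pricing is left out.

```python
import functools
from types import SimpleNamespace

# Placeholder USD prices per million tokens; not real provider pricing.
PRICING = {("openai", "example-model"): {"input": 1.0, "output": 2.0}}


def track_cost_sketch(provider: str, model_name: str, metric: str, sink: dict):
    """Hypothetical decorator: price a response's token usage into sink[metric]."""
    try:
        price = PRICING[(provider, model_name)]
    except KeyError:
        # Fail loudly instead of silently logging a zero cost.
        raise ValueError(f"no pricing registered for {provider}/{model_name}") from None

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)
            usage = response.usage
            cost = (
                usage.prompt_tokens * price["input"]
                + usage.completion_tokens * price["output"]
            ) / 1_000_000
            sink[metric] = sink.get(metric, 0.0) + cost
            return response  # the caller still gets the untouched response

        return wrapper

    return decorator


step_metrics: dict[str, float] = {}


@track_cost_sketch("openai", "example-model", "costs/train/llm_judge/correctness", step_metrics)
def judge(prompt: str):
    # Stand-in for a real API call; returns a fake response with token usage.
    return SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1000, completion_tokens=500))


judge("Is the answer correct?")
```

Resolving the price at decoration time means a missing pricing entry surfaces when the module is imported, before any API spend occurs.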

What changed

  • Canonicalized ART/W&B metric sections to reward/*, loss/*, throughput/*, costs/*, time/*, data/*, train/*, val/*, and test/*.
  • Routed backend-emitted metrics into those namespaces across local, serverless, unsloth, and tinker paths.
  • Added MetricsBuilder for user-owned metrics, hierarchical cost rollups, costs/cum/*, cumulative time/* and data/*, derived throughput metrics, exact data/cum/num_unique_scenarios, and persisted resume state.
  • Added automatic ART-owned metrics where available, including time/step_wall_s, time/step_trainer_s, time/step_actor_s, time/step_eval_s, data/step_*, train/*, and automatic local costs/gpu when pricing is known.
  • Added @track_api_cost(...) for OpenAI/Anthropic responses with explicit provider and model_name, cache-aware pricing, and hooks for custom extractors / registered pricing.
  • Updated W&B metric definitions and docs for the new taxonomy.
  • Removed support for legacy costs_* keys in favor of hierarchical costs/... paths.
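As a rough picture of the cumulative metrics and persisted resume state listed above, the sketch below keeps running totals, counts unique scenarios exactly with a set, and round-trips its state through JSON. The class `CumulativeState` and the keys `time/cum/wall_s` and `throughput/tokens_per_s` are hypothetical; only `data/cum/num_unique_scenarios` is a name taken from this PR, and the real `MetricsBuilder` may be structured quite differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CumulativeState:
    """Hypothetical running totals that survive a restart."""

    seconds: float = 0.0
    tokens: int = 0
    scenario_ids: set[str] = field(default_factory=set)

    def update(self, step_seconds: float, step_tokens: int, scenarios: list[str]) -> dict[str, float]:
        self.seconds += step_seconds
        self.tokens += step_tokens
        self.scenario_ids.update(scenarios)  # a set keeps the unique count exact
        metrics = {
            "time/cum/wall_s": self.seconds,
            "data/cum/num_unique_scenarios": len(self.scenario_ids),
        }
        if step_seconds > 0:  # derived throughput is undefined for a zero-length step
            metrics["throughput/tokens_per_s"] = step_tokens / step_seconds
        return metrics

    def to_json(self) -> str:
        return json.dumps({
            "seconds": self.seconds,
            "tokens": self.tokens,
            "scenario_ids": sorted(self.scenario_ids),
        })

    @classmethod
    def from_json(cls, text: str) -> "CumulativeState":
        raw = json.loads(text)
        return cls(raw["seconds"], raw["tokens"], set(raw["scenario_ids"]))


state = CumulativeState()
state.update(2.0, 100, ["a", "b"])

# Simulate a resume: serialize, restore, then continue with the next step.
restored = CumulativeState.from_json(state.to_json())
metrics = restored.update(2.0, 300, ["b", "c"])
```

Persisting the scenario IDs themselves, rather than just a count, is what lets the unique count stay exact after a resume.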

Notes

  • track_api_cost(...) requires explicit provider and model_name, and raises if pricing is missing.
  • ART still uses training_step as the default x-axis.
  • The x-axis can now be switched to metrics other than training_step (e.g. costs).
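One way to plot against a different x-axis in W&B is `define_metric` with a `step_metric`, sketched below. Whether ART does this internally is not stated in this PR; the helper `define_x_axis` is hypothetical, `costs/cum/all` is an assumed key name, and `RecordingRun` is only a stand-in so the example runs without W&B installed. With a real run, pass the object returned by `wandb.init(...)` instead.

```python
def define_x_axis(run, x_axis: str, sections: list[str]) -> None:
    """Plot each section's metrics against x_axis instead of the default step."""
    run.define_metric(x_axis)
    for section in sections:
        # A trailing /* applies the setting to every metric in the section.
        run.define_metric(f"{section}/*", step_metric=x_axis)


class RecordingRun:
    """Stand-in for a wandb run that just records define_metric calls."""

    def __init__(self):
        self.calls = []

    def define_metric(self, name, step_metric=None):
        self.calls.append((name, step_metric))


run = RecordingRun()
define_x_axis(run, "costs/cum/all", ["reward", "loss"])
```

For this to have an effect in W&B, the x-axis metric needs to be logged alongside the metrics that are plotted against it.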

@vivekkalyan marked this pull request as ready for review on March 10, 2026 at 23:25