
Add metric versionning #113

Merged
gabegma merged 8 commits into main from ggm/add-metric-versionning on May 13, 2026
Conversation


@gabegma commented May 12, 2026 (Collaborator)

This will help, when rerunning, to track whether all records have been computed with the latest version.

gabegma and others added 2 commits May 12, 2026 11:17
Add an optional `version` field to MetricScore and wire turn_taking to
populate it from a `version = "v0.1"` class variable at every output site
(main score, missed-turn early return, sub-metrics). This lets us tell,
across partial metric reruns, which computation logic produced a given
row — bump the class var when the algorithm changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
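The first commit's shape can be sketched as follows. This is a minimal, hypothetical sketch: the real MetricScore is a Pydantic model with more fields, and the metric class and record shape here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricScore:
    # Minimal stand-in for the real (Pydantic) MetricScore.
    value: float
    version: Optional[str] = None  # None for historical rows written before this change

class TurnTakingMetric:
    # Bump this class variable whenever the computation logic changes.
    version = "v0.1"

    def compute(self, record) -> MetricScore:
        # Every output site (main score, missed-turn early return,
        # sub-metrics) stamps the current class-level version.
        return MetricScore(value=1.0, version=self.version)

score = TurnTakingMetric().compute(record=None)
print(score.version)  # v0.1
```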
Generalize the turn_taking-only version stamp to all metrics, and add an
automatic per-judge prompt_hash so prompt edits are detectable even
without a manual version bump. Stamping happens centrally via a Pydantic
model_validator that reads two contextvars set by MetricsRunner before
each metric.compute() call — metric authors only declare
`version = "v0.1"` on the class and the rest is automatic at every
MetricScore call site (no per-site `version=self.version` plumbing).

The contextvar approach is per-asyncio-task, so concurrent metrics in
the same record don't bleed values into each other. On partial reruns,
metrics that aren't recomputed keep whatever version/prompt_hash was on
disk — the validator only fills when the field is unset, so deserialized
historical rows are preserved.
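A stdlib-only sketch of the stamping idea, using a dataclass `__post_init__` in place of the actual Pydantic model_validator (names and shapes are assumptions, not the PR's code):

```python
import asyncio
import contextvars
from dataclasses import dataclass
from typing import Optional

# Stand-in for the contextvar MetricsRunner would set before metric.compute().
_current_version: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "_current_version", default=None
)

@dataclass
class MetricScore:
    value: float
    version: Optional[str] = None

    def __post_init__(self) -> None:
        # Analogue of the Pydantic model_validator: fill only when the
        # field is unset, so deserialized historical rows keep whatever
        # version they already carry.
        if self.version is None:
            self.version = _current_version.get()

async def run_metric(version: str) -> MetricScore:
    _current_version.set(version)  # the runner stamps the contextvar
    await asyncio.sleep(0)         # yield, letting tasks interleave
    return MetricScore(value=1.0)  # version is picked up automatically

async def main():
    # Each asyncio task gets its own copy of the context, so concurrent
    # metrics on the same record don't bleed versions into each other.
    return await asyncio.gather(run_metric("v0.1"), run_metric("v0.2"))

a, b = asyncio.run(main())
print(a.version, b.version)  # v0.1 v0.2
# A row deserialized with an explicit version is preserved:
print(MetricScore(value=0.5, version="v0.0").version)  # v0.0
```

The per-task isolation comes for free: asyncio copies the current `contextvars.Context` into each task at creation, so a `set()` inside one task is invisible to its siblings.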

prompt_hash is the sha256[:12] of the *unrendered* template (so per-
record variable substitutions don't change the hash). PromptManager
gains `get_template(path)` to expose the raw YAML template; BaseMetric.
get_judge_prompt() pushes the hash into the contextvar each call.
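The hashing scheme can be sketched like this (the function name and template text are hypothetical; the PR takes sha256[:12] of the raw YAML template before any rendering):

```python
import hashlib

def prompt_hash(template: str) -> str:
    # Hash the *unrendered* template: per-record substitutions of
    # {transcript} etc. never change the hash; only prompt edits do.
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]

template = "Judge the following transcript for turn-taking:\n{transcript}"
print(prompt_hash(template))                  # stable 12-hex-char digest
print(prompt_hash(template + " Be strict."))  # any edit changes it
```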

Drift test (tests/unit/metrics/test_metric_signatures.py) compares each
concrete metric class's (version, source_hash, prompt_hash) against
tests/fixtures/metric_signatures.json. Authors run
`python scripts/regen_metric_signatures.py` to refresh the fixture after
a deliberate version bump or prompt edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
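The drift test could look roughly like this. This is a self-contained sketch: the real test iterates every concrete metric class and reads the committed tests/fixtures/metric_signatures.json from disk (and also checks a source_hash); the class and helper names here are illustrative.

```python
import hashlib
import json

# Hypothetical stand-in for a concrete metric class.
class TurnTakingMetric:
    version = "v0.1"
    judge_prompt = "Judge the transcript for turn-taking:\n{transcript}"

def prompt_hash(template: str) -> str:
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]

def signature(cls) -> dict:
    return {"version": cls.version, "prompt_hash": prompt_hash(cls.judge_prompt)}

# Simulates the committed fixture, which authors regenerate with
# scripts/regen_metric_signatures.py after a deliberate change.
fixture_json = json.dumps({"TurnTakingMetric": signature(TurnTakingMetric)})

def test_no_drift():
    fixture = json.loads(fixture_json)
    for cls in (TurnTakingMetric,):
        assert fixture[cls.__name__] == signature(cls), (
            f"{cls.__name__} changed: bump `version` and regenerate the fixture"
        )

test_no_drift()
```

The point of the fixture is to force the failure mode into review: an undeclared prompt or logic edit breaks CI until the author regenerates the signatures, making the change visible in the diff.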
Base automatically changed from ggm/bug-fixes-following-paper to main May 13, 2026 17:22
Comment thread src/eva/utils/prompt_manager.py
Comment thread src/eva/models/versioning.py Outdated
Comment thread tests/unit/metrics/test_metric_signatures.py

@JosephMarinier left a comment (Collaborator)


Cool! Thanks for taking care of this!

The implementation seems a bit complex, but I don't expect this code to change often, so I think it's fine.


gabegma commented May 13, 2026

> The implementation seems a bit complex, but I don't expect this code to change often, so I think it's fine.

I agree, it's somewhat ugly. Thanks for your cleanups!!

Comment thread src/eva/metrics/versioning.py
@gabegma gabegma added this pull request to the merge queue May 13, 2026
Merged via the queue into main with commit 598f7c1 May 13, 2026
1 check passed
@gabegma gabegma deleted the ggm/add-metric-versionning branch May 13, 2026 21:24
