Add metric versioning #113
Merged
Add an optional `version` field to MetricScore and wire turn_taking to populate it from a `version = "v0.1"` class variable at every output site (main score, missed-turn early return, sub-metrics). This lets us tell, across partial metric reruns, which computation logic produced a given row; bump the class var when the algorithm changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalize the turn_taking-only version stamp to all metrics, and add an automatic per-judge prompt_hash so prompt edits are detectable even without a manual version bump.

Stamping happens centrally via a Pydantic model_validator that reads two contextvars set by MetricsRunner before each metric.compute() call. Metric authors only declare `version = "v0.1"` on the class, and the rest is automatic at every MetricScore call site (no per-site `version=self.version` plumbing). The contextvar approach is per-asyncio-task, so concurrent metrics in the same record don't bleed values into each other.

On partial reruns, metrics that aren't recomputed keep whatever version/prompt_hash was on disk: the validator only fills a field when it is unset, so deserialized historical rows are preserved.

prompt_hash is the sha256[:12] of the *unrendered* template (so per-record variable substitutions don't change the hash). PromptManager gains `get_template(path)` to expose the raw YAML template; BaseMetric.get_judge_prompt() pushes the hash into the contextvar on each call.

Drift test (tests/unit/metrics/test_metric_signatures.py) compares each concrete metric class's (version, source_hash, prompt_hash) against tests/fixtures/metric_signatures.json. Authors run `python scripts/regen_metric_signatures.py` to refresh the fixture after a deliberate version bump or prompt edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
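For readers following the stamping flow described in this commit message, here is a minimal sketch of the idea rather than the PR's actual code. It assumes a plain dataclass with `__post_init__` as a stand-in for the Pydantic model_validator, and the names `_version_var`, `_prompt_hash_var`, `hash_template`, `ToyMetric`, `OtherToyMetric`, `run_metric`, and `run_all` are all hypothetical. Only the standard library is used.

```python
import asyncio
import hashlib
from contextvars import ContextVar
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical contextvar names; the PR's real ones are not shown in this conversation.
_version_var: ContextVar[Optional[str]] = ContextVar("metric_version", default=None)
_prompt_hash_var: ContextVar[Optional[str]] = ContextVar("prompt_hash", default=None)


def hash_template(template: str) -> str:
    """Return the first 12 hex characters of the SHA-256 of the raw template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


@dataclass
class MetricScore:
    name: str
    score: float
    version: Optional[str] = None
    prompt_hash: Optional[str] = None

    def __post_init__(self) -> None:
        # Stand-in for the Pydantic model_validator: fill only unset fields,
        # so rows loaded from disk keep their historical stamps.
        if self.version is None:
            self.version = _version_var.get()
        if self.prompt_hash is None:
            self.prompt_hash = _prompt_hash_var.get()


class ToyMetric:
    version = "v0.1"
    template = "Rate the reply to {question} from 1 to 5."  # unrendered template

    def get_judge_prompt(self, **variables: str) -> str:
        # Hash the template before substitution, so variables do not affect it.
        _prompt_hash_var.set(hash_template(self.template))
        return self.template.format(**variables)

    async def compute(self, question: str) -> MetricScore:
        self.get_judge_prompt(question=question)
        await asyncio.sleep(0)  # yield so the two tasks interleave
        # Note: no version= or prompt_hash= arguments at the call site.
        return MetricScore(name=type(self).__name__, score=1.0)


class OtherToyMetric(ToyMetric):
    version = "v0.2"
    template = "Is the answer to {question} polite?"


async def run_metric(metric: ToyMetric, question: str) -> MetricScore:
    # Runner side: set the version before compute(). asyncio.gather wraps each
    # coroutine in its own task, and each task runs in a copy of the context.
    _version_var.set(metric.version)
    return await metric.compute(question)


async def run_all() -> List[MetricScore]:
    return list(await asyncio.gather(
        run_metric(ToyMetric(), "q1"),
        run_metric(OtherToyMetric(), "q2"),
    ))
```

Calling `asyncio.run(run_all())` should return two scores stamped with their own class's version and template hash, while something like `MetricScore(name="old", score=0.5, version="v0.0")` constructed outside a run keeps its existing version, which mirrors the "only fill when unset" behavior described above.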
JosephMarinier
approved these changes
May 13, 2026
Collaborator
JosephMarinier
left a comment
Cool! Thanks for taking care of this!
The implementation seems a bit complex, but I don't expect this code to change often, so I think it's fine.
Collaborator
Author
I agree, it's somewhat ugly. Thanks for your cleanups!!
This will help when rerunning to track if all records have the latest version.