fix(devserver): warn instead of silently dropping duplicate eval_name (#366)#477
Open
srijanarya wants to merge 1 commit into
Open
fix(devserver): warn instead of silently dropping duplicate eval_name (#366)#477srijanarya wants to merge 1 commit into
srijanarya wants to merge 1 commit into
Conversation
…braintrustdata#366) When two Eval(...) share an eval_name, create_app's dict comprehension silently kept only the last one, so GET /list and the startup count disagreed and a user could lose an entire eval with no signal. Build the registry explicitly: warn on a duplicate eval_name, keep the first registration (mirroring duplicate-reporter handling in cli/eval.py), and report the post-dedup count so the startup log matches what /list serves. Adds a regression test asserting warn + first-wins + single-entry.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
create_appbuilt the evaluator registry with a dict comprehension:When two
Eval(...)share aneval_name, the later one silently overwrites the earlier. The dev server starts fine, butGET /listreturns only one of them — and the startup log (Loaded N evaluator(s)) counts the pre-dedup list, so it disagrees with what's actually served. An entire eval is dropped with no signal. This is the behavior reported in #366.Change
Build the registry explicitly instead:
UserWarningnaming the duplicatedeval_name, so the conflict is visible at load time.cli/eval.py.len(_all_evaluators)aftercreate_appruns, so the startup log matches exactly what/listserves.Test
Adds
test_create_app_warns_and_keeps_first_on_duplicate_eval_name: asserts theUserWarningfires, the first evaluator is the one kept, and exactly one entry remains. Skips cleanly when the devserver extras aren't installed.Closes #366.