
fix(devserver): warn instead of silently dropping duplicate eval_name (#366) #477

Open
srijanarya wants to merge 1 commit into braintrustdata:main from srijanarya:fix/devserver-duplicate-eval-name

@srijanarya

What

create_app built the evaluator registry with a dict comprehension:

_all_evaluators = {evaluator.eval_name: evaluator for evaluator in evaluators}

When two Eval(...) share an eval_name, the later one silently overwrites the earlier. The dev server starts fine, but GET /list returns only one of them, and the startup log (Loaded N evaluator(s)) counts the pre-dedup list, so it disagrees with what is actually served. An entire eval is dropped with no signal. This is the behavior reported in #366.

Change

Build the registry explicitly instead:

  • Warn, don't drop silently: emit a UserWarning naming the duplicated eval_name, so the conflict is visible at load time.
  • First wins: keep the first registration and skip the duplicate, mirroring how duplicate reporters are handled in cli/eval.py.
  • Honest count: report len(_all_evaluators) after create_app runs, so the startup log matches exactly what /list serves.

Test

Adds test_create_app_warns_and_keeps_first_on_duplicate_eval_name: asserts the UserWarning fires, the first evaluator is the one kept, and exactly one entry remains. Skips cleanly when the devserver extras aren't installed.

1 passed, 9 deselected

Closes #366.

…braintrustdata#366)

When two Eval(...) share an eval_name, create_app's dict comprehension silently kept only the last one, so GET /list and the startup count disagreed and a user could lose an entire eval with no signal.

Build the registry explicitly: warn on a duplicate eval_name, keep the first registration (mirroring duplicate-reporter handling in cli/eval.py), and report the post-dedup count so the startup log matches what /list serves.

Adds a regression test asserting warn + first-wins + single-entry.

Development

Successfully merging this pull request may close these issues.

braintrust eval --dev silently drops evaluators with duplicate eval_name