[fix] Resolve issues from null data in live evaluations#4462
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR refactors the live evaluation revision resolution to extract validation logic into reusable async helpers. New functions ChangesLive Evaluation Revision Resolution
🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #3138 by preventing live evaluations from running when referenced query/evaluator revision records exist but have data=None, which could otherwise lead to unfiltered trace queries or non-functional evaluator workflow invocations.
Changes:
- Added revision-resolution guard helpers (
_resolve_query_revisions,_resolve_evaluator_revisions) that skip revisions missing required identity ordata. - Updated
evaluate_live_queryto use the new guard helpers (removing the previous behavior that fabricated emptydataobjects). - Added unit tests covering the “skip null data” and “keep non-null data” behaviors for both query and evaluator revisions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
api/oss/src/core/evaluations/tasks/live.py |
Adds guard helpers and switches live evaluation to skip revisions with data=None rather than fabricating empty configs. |
api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py |
Introduces unit tests validating the new guard behavior for query and evaluator revisions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ( | ||
| not query_revision | ||
| or not query_revision.id | ||
| or not query_revision.slug | ||
| or not query_revision.data | ||
| ): | ||
| log.warn( | ||
| f"Query revision with ref {query_revision_ref.model_dump(mode='json')} not found!" | ||
| ) | ||
| continue |
| if ( | ||
| not evaluator_revision | ||
| or not evaluator_revision.id | ||
| or not evaluator_revision.slug | ||
| or not evaluator_revision.data | ||
| ): | ||
| log.warn( | ||
| f"Evaluator revision with ref {evaluator_revision_ref.model_dump(mode='json')} not found!" | ||
| ) | ||
| continue |
| revision_with_data = QueryRevision( | ||
| id=revision_id, | ||
| slug="my-query", | ||
| data=QueryRevisionData(), | ||
| ) |
| revision_with_data = EvaluatorRevision( | ||
| id=revision_id, | ||
| slug="my-evaluator", | ||
| data=EvaluatorRevisionData(), | ||
| ) |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/oss/src/core/evaluations/tasks/live.py (1)
567-578:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winKeyError when accessing references for skipped evaluator revisions.
The loop iterates over all
annotation_steps_keys, butevaluator_referencesonly contains keys for successfully resolved revisions (populated at lines 360-363). If a revision was skipped due to null data, accessingevaluator_references[annotation_step_key]at line 568 will raise aKeyErrorbefore the guard at line 580 can protect against it.🐛 Proposed fix: check for key presence or use `.get()` with guard
# run evaluator revisions -------------------------------------- for jdx in range(nof_annotations): annotation_step_key = annotation_steps_keys[jdx] annotation_step = annotation_steps[annotation_step_key] if annotation_step.origin in {"human", "custom"}: scenario_has_pending[idx] = True continue + # Skip if evaluator revision was not resolved (e.g., null data) + if annotation_step_key not in evaluator_revisions: + log.warn( + f"Skipping annotation step {annotation_step_key}: evaluator revision not resolved" + ) + scenario_has_errors[idx] += 1 + scenario_status[idx] = EvaluationStatus.ERRORS + continue + step_status = EvaluationStatus.SUCCESS references: Dict[str, Any] = { **evaluator_references[annotation_step_key], }
🧹 Nitpick comments (1)
api/oss/src/core/evaluations/tasks/live.py (1)
578-589: 💤 Low valueUnreachable guard:
evaluator_revisionfalsiness check is now dead code.Since
evaluator_revisionsis now populated only with successfully resolved revisions, accessingevaluator_revisions[annotation_step_key]will either return a valid revision or raiseKeyError. Theif not evaluator_revisioncheck at line 580 can never be true (assuming the KeyError issue above is fixed with a prior guard).Consider removing this dead code block once the KeyError fix is in place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5d5da984-8910-47e4-a251-4fd58af504c8
📒 Files selected for processing (2)
api/oss/src/core/evaluations/tasks/live.pyapi/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py
Resolves this issue.