Skip to content

[fix] Resolve issues from null data in live evaluations#4462

Open
junaway wants to merge 1 commit into
mainfrom
fix/wild-null-data-in-live-evaluations
Open

[fix] Resolve issues from null data in live evaluations#4462
junaway wants to merge 1 commit into
mainfrom
fix/wild-null-data-in-live-evaluations

Conversation

@junaway
Copy link
Copy Markdown
Contributor

@junaway junaway commented May 27, 2026

Resolves this issue.

Copilot AI review requested due to automatic review settings May 27, 2026 13:47
@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment May 27, 2026 1:48pm

Request Review

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 27, 2026
@dosubot dosubot Bot added bug Something isn't working Evaluation labels May 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved validation of query and evaluator revisions to ensure required data is present before processing.
    • Enhanced handling of missing revision data with warning logs; revisions without data are now gracefully omitted from processing.
  • Tests

    • Added comprehensive test coverage for revision resolution and data validation logic.

Walkthrough

This PR refactors the live evaluation revision resolution to extract validation logic into reusable async helpers. New functions _resolve_query_revisions and _resolve_evaluator_revisions validate fetched revisions and skip those with missing data, replacing inline loops and default-initialization behavior in evaluate_live_query.

Changes

Live Evaluation Revision Resolution

Layer / File(s) Summary
Revision resolution helpers and imports
api/oss/src/core/evaluations/tasks/live.py
Removes QueryRevisionData import and introduces _resolve_query_revisions and _resolve_evaluator_revisions async helpers that fetch revisions via services, validate required id, slug, and data fields, log warnings for missing data, and return only successfully resolved revisions.
Integration into evaluate_live_query
api/oss/src/core/evaluations/tasks/live.py
Updates evaluate_live_query to call the new resolution helpers instead of inline fetch loops, populates reference maps only for successfully resolved step keys, and removes default DTO instantiation for missing data.
Resolution helper test coverage
api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py
Adds tests for both query and evaluator resolution helpers verifying that revisions with null data are skipped and revisions with populated data are returned in the resolved map.

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: fixing issues caused by null data in live evaluations, which matches the refactoring work to handle missing data in revision resolution.
Description check ✅ Passed The description references the related GitHub issue, which is appropriately minimal but meaningfully connects the PR to a specific problem being resolved.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/wild-null-data-in-live-evaluations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #3138 by preventing live evaluations from running when referenced query/evaluator revision records exist but have data=None, which could otherwise lead to unfiltered trace queries or non-functional evaluator workflow invocations.

Changes:

  • Added revision-resolution guard helpers (_resolve_query_revisions, _resolve_evaluator_revisions) that skip revisions missing required identity or data.
  • Updated evaluate_live_query to use the new guard helpers (removing the previous behavior that fabricated empty data objects).
  • Added unit tests covering the “skip null data” and “keep non-null data” behaviors for both query and evaluator revisions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
api/oss/src/core/evaluations/tasks/live.py Adds guard helpers and switches live evaluation to skip revisions with data=None rather than fabricating empty configs.
api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py Introduces unit tests validating the new guard behavior for query and evaluator revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +185 to +194
if (
not query_revision
or not query_revision.id
or not query_revision.slug
or not query_revision.data
):
log.warn(
f"Query revision with ref {query_revision_ref.model_dump(mode='json')} not found!"
)
continue
Comment on lines +217 to +226
if (
not evaluator_revision
or not evaluator_revision.id
or not evaluator_revision.slug
or not evaluator_revision.data
):
log.warn(
f"Evaluator revision with ref {evaluator_revision_ref.model_dump(mode='json')} not found!"
)
continue
Comment on lines +52 to +56
revision_with_data = QueryRevision(
id=revision_id,
slug="my-query",
data=QueryRevisionData(),
)
Comment on lines +112 to +116
revision_with_data = EvaluatorRevision(
id=revision_id,
slug="my-evaluator",
data=EvaluatorRevisionData(),
)
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/oss/src/core/evaluations/tasks/live.py (1)

567-578: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

KeyError when accessing references for skipped evaluator revisions.

The loop iterates over all annotation_steps_keys, but evaluator_references only contains keys for successfully resolved revisions (populated at lines 360-363). If a revision was skipped due to null data, accessing evaluator_references[annotation_step_key] at line 568 will raise a KeyError before the guard at line 580 can protect against it.

🐛 Proposed fix: check for key presence or use `.get()` with guard
                 # run evaluator revisions --------------------------------------
                 for jdx in range(nof_annotations):
                     annotation_step_key = annotation_steps_keys[jdx]
                     annotation_step = annotation_steps[annotation_step_key]

                     if annotation_step.origin in {"human", "custom"}:
                         scenario_has_pending[idx] = True
                         continue

+                    # Skip if evaluator revision was not resolved (e.g., null data)
+                    if annotation_step_key not in evaluator_revisions:
+                        log.warn(
+                            f"Skipping annotation step {annotation_step_key}: evaluator revision not resolved"
+                        )
+                        scenario_has_errors[idx] += 1
+                        scenario_status[idx] = EvaluationStatus.ERRORS
+                        continue
+
                     step_status = EvaluationStatus.SUCCESS

                     references: Dict[str, Any] = {
                         **evaluator_references[annotation_step_key],
                     }
🧹 Nitpick comments (1)
api/oss/src/core/evaluations/tasks/live.py (1)

578-589: 💤 Low value

Unreachable guard: evaluator_revision falsiness check is now dead code.

Since evaluator_revisions is now populated only with successfully resolved revisions, accessing evaluator_revisions[annotation_step_key] will either return a valid revision or raise KeyError. The if not evaluator_revision check at line 580 can never be true (assuming the KeyError issue above is fixed with a prior guard).

Consider removing this dead code block once the KeyError fix is in place.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d5da984-8910-47e4-a251-4fd58af504c8

📥 Commits

Reviewing files that changed from the base of the PR and between fec4391 and c67ea85.

📒 Files selected for processing (2)
  • api/oss/src/core/evaluations/tasks/live.py
  • api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Evaluation size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Prevent live evaluations where query revision data is 'null'

3 participants