[fix] Resolve issues from null data in live evaluations by junaway · Pull Request #4462 · Agenta-AI/agenta

junaway · 2026-05-27T13:47:21Z

vercel · 2026-05-27T13:47:27Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 27, 2026 1:48pm

coderabbitai · 2026-05-27T13:47:45Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Improved validation of query and evaluator revisions to ensure required data is present before processing.
- Enhanced handling of missing revision data with warning logs; revisions without data are now gracefully omitted from processing.
Tests
- Added comprehensive test coverage for revision resolution and data validation logic.

Walkthrough

This PR refactors the live evaluation revision resolution to extract validation logic into reusable async helpers. New functions _resolve_query_revisions and _resolve_evaluator_revisions validate fetched revisions and skip those with missing data, replacing inline loops and default-initialization behavior in evaluate_live_query.

Changes

Live Evaluation Revision Resolution

Layer / File(s)	Summary
Revision resolution helpers and imports `api/oss/src/core/evaluations/tasks/live.py`	Removes `QueryRevisionData` import and introduces `_resolve_query_revisions` and `_resolve_evaluator_revisions` async helpers that fetch revisions via services, validate required `id`, `slug`, and `data` fields, log warnings for missing data, and return only successfully resolved revisions.
Integration into evaluate_live_query `api/oss/src/core/evaluations/tasks/live.py`	Updates `evaluate_live_query` to call the new resolution helpers instead of inline fetch loops, populates reference maps only for successfully resolved step keys, and removes default DTO instantiation for missing data.
Resolution helper test coverage `api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py`	Adds tests for both query and evaluator resolution helpers verifying that revisions with null data are skipped and revisions with populated data are returned in the resolved map.

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: fixing issues caused by null data in live evaluations, which matches the refactoring work to handle missing data in revision resolution.
Description check	✅ Passed	The description references the related GitHub issue, which is appropriately minimal but meaningfully connects the PR to a specific problem being resolved.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/wild-null-data-in-live-evaluations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR addresses issue #3138 by preventing live evaluations from running when referenced query/evaluator revision records exist but have data=None, which could otherwise lead to unfiltered trace queries or non-functional evaluator workflow invocations.

Changes:

Added revision-resolution guard helpers (_resolve_query_revisions, _resolve_evaluator_revisions) that skip revisions missing required identity or data.
Updated evaluate_live_query to use the new guard helpers (removing the previous behavior that fabricated empty data objects).
Added unit tests covering the “skip null data” and “keep non-null data” behaviors for both query and evaluator revisions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`api/oss/src/core/evaluations/tasks/live.py`	Adds guard helpers and switches live evaluation to skip revisions with `data=None` rather than fabricating empty configs.
`api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py`	Introduces unit tests validating the new guard behavior for query and evaluator revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        if (
+            not query_revision
+            or not query_revision.id
+            or not query_revision.slug
+            or not query_revision.data
+        ):
+            log.warn(
+                f"Query revision with ref {query_revision_ref.model_dump(mode='json')} not found!"
+            )
+            continue


+        if (
+            not evaluator_revision
+            or not evaluator_revision.id
+            or not evaluator_revision.slug
+            or not evaluator_revision.data
+        ):
+            log.warn(
+                f"Evaluator revision with ref {evaluator_revision_ref.model_dump(mode='json')} not found!"
+            )
+            continue


+    revision_with_data = QueryRevision(
+        id=revision_id,
+        slug="my-query",
+        data=QueryRevisionData(),
+    )


+    revision_with_data = EvaluatorRevision(
+        id=revision_id,
+        slug="my-evaluator",
+        data=EvaluatorRevisionData(),
+    )


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

api/oss/src/core/evaluations/tasks/live.py (1)

567-578: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

KeyError when accessing references for skipped evaluator revisions.

The loop iterates over all annotation_steps_keys, but evaluator_references only contains keys for successfully resolved revisions (populated at lines 360-363). If a revision was skipped due to null data, accessing evaluator_references[annotation_step_key] at line 568 will raise a KeyError before the guard at line 580 can protect against it.

🐛 Proposed fix: check for key presence or use `.get()` with guard

                 # run evaluator revisions --------------------------------------
                 for jdx in range(nof_annotations):
                     annotation_step_key = annotation_steps_keys[jdx]
                     annotation_step = annotation_steps[annotation_step_key]

                     if annotation_step.origin in {"human", "custom"}:
                         scenario_has_pending[idx] = True
                         continue

+                    # Skip if evaluator revision was not resolved (e.g., null data)
+                    if annotation_step_key not in evaluator_revisions:
+                        log.warn(
+                            f"Skipping annotation step {annotation_step_key}: evaluator revision not resolved"
+                        )
+                        scenario_has_errors[idx] += 1
+                        scenario_status[idx] = EvaluationStatus.ERRORS
+                        continue
+
                     step_status = EvaluationStatus.SUCCESS

                     references: Dict[str, Any] = {
                         **evaluator_references[annotation_step_key],
                     }

🧹 Nitpick comments (1)

api/oss/src/core/evaluations/tasks/live.py (1)

578-589: 💤 Low value

Unreachable guard: evaluator_revision falsiness check is now dead code.

Since evaluator_revisions is now populated only with successfully resolved revisions, accessing evaluator_revisions[annotation_step_key] will either return a valid revision or raise KeyError. The if not evaluator_revision check at line 580 can never be true (assuming the KeyError issue above is fixed with a prior guard).

Consider removing this dead code block once the KeyError fix is in place.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d5da984-8910-47e4-a251-4fd58af504c8

📥 Commits

Reviewing files that changed from the base of the PR and between fec4391 and c67ea85.

📒 Files selected for processing (2)

api/oss/src/core/evaluations/tasks/live.py
api/oss/tests/pytest/unit/evaluations/test_live_evaluation_guards.py

[fix] Resolve issues from null data in live evaluations

c67ea85

Copilot AI review requested due to automatic review settings May 27, 2026 13:47

dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 27, 2026

Copilot started reviewing on behalf of junaway May 27, 2026 13:47 View session

dosubot Bot added bug Something isn't working Evaluation labels May 27, 2026

vercel Bot deployed to Preview May 27, 2026 13:48 View deployment

Copilot AI reviewed May 27, 2026

View reviewed changes

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix] Resolve issues from null data in live evaluations#4462

[fix] Resolve issues from null data in live evaluations#4462
junaway wants to merge 1 commit into
mainfrom
fix/wild-null-data-in-live-evaluations

junaway commented May 27, 2026

Uh oh!

vercel Bot commented May 27, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

junaway commented May 27, 2026

Uh oh!

vercel Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vercel Bot commented May 27, 2026 •

edited

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading