56 changes: 56 additions & 0 deletions src/pages/docs/release-notes.mdx
@@ -3,6 +3,62 @@ title: "Future AGI Release Notes: Features, Fixes, and Updates"
description: "Latest Future AGI release notes covering new features, improvements, and bug fixes across datasets, evaluations, simulation, and observability products."
---

## Week of 2026-05-21

<div class="mb-12 pb-8 border-b border-[var(--color-border-subtle)] last:border-b-0">

<div class="mt-6 mb-3 text-lg font-semibold">Features</div>

- **Composite Evals Now Work Across Traces and Sessions:** You can now run composite evaluations against traces and sessions, not just individual spans. This lets you measure performance across multi-step conversations and grouped interactions in a single evaluation run.

- **Custom Evals Run With Partial Inputs:** Custom evaluations now proceed even when some input fields are missing from your trace data, returning results with a warning indicator instead of failing entirely. System evaluators continue to require all inputs before running.

- **Custom Variable Paths in Eval Task Mappings:** When configuring an eval task, you can now type any custom attribute path from your trace data rather than choosing from a fixed list. This gives you full flexibility when mapping trace fields to eval inputs, including deeply nested attributes.

- **Dynamic API Columns Support Nested Field Paths:** When configuring a dynamic API column, you can now use dot notation to reference nested fields in the API response (for example, result.score.value). This lets you pull specific values from deeply structured API responses without needing to flatten them first.
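
To illustrate the dot-notation lookup described for dynamic API columns, here is a minimal sketch of how a path such as `result.score.value` can be resolved against a nested response. It is not the product's implementation: the `resolve_path` helper, its `default` parameter, and the sample `response` payload are all hypothetical and exist only for this example.

```python
from typing import Any


def resolve_path(payload: Any, path: str, default: Any = None) -> Any:
    """Walk a dot-separated path through nested dictionaries.

    Hypothetical helper for illustration only. Returns `default` as soon
    as any segment of the path is missing or the current value is not a
    dictionary.
    """
    current = payload
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


# Made-up API response used only to demonstrate the lookup.
response = {"result": {"score": {"value": 0.87, "scale": "0-1"}}}

print(resolve_path(response, "result.score.value"))
print(resolve_path(response, "result.missing.value", default="n/a"))
```

Returning a default for a missing segment, rather than raising, is a design choice made for this sketch; how the product reports an unresolved path is not stated in these notes.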

<div class="mt-6 mb-3 text-lg font-semibold">Bugs/Improvements</div>

- **Error Feed Clusters Are Easier to Triage:** Grouped errors in the Error Feed now show concise, descriptive titles and accurate severity levels, making it much faster to understand and prioritize issues at a glance.

- **Login Errors Now Show Specific Reasons:** When sign-in fails, you now see a clear message explaining why (for example: too many attempts, account inactive, or IP access restrictions) instead of a generic error. This makes it faster to understand and resolve access issues.

- **Model Selector Is Now Available for System LLM Evaluations:** When configuring a system evaluator that uses an LLM, you can now select which model to use. The model field was previously disabled for system evaluators.

- **Evaluation Save Button Is Disabled Until All Required Fields Are Filled:** The save button for evaluations is now disabled until all required fields, including the evaluation name, are completed. The interface also provides clearer feedback when the name exceeds the allowed length.

- **Voice Recordings Now Load Correctly from Error Feed:** Voice traces opened directly from the Error Feed were showing no recording even when one existed. Recordings now load correctly when viewing a voice trace from the Error Feed.

- **Errored Evals Now Show a Clear Error Indicator:** Evaluations that encountered an error were previously shown as a blank dash in the trace and voice drawers, making it hard to tell that something had gone wrong. They now show a clear Error indicator so the status is immediately visible.

- **Filter Picker Returns Correct Results for Matching Metric Names:** In some cases, when two metrics shared the same name across different namespaces, the filter property picker returned incorrect or missing results. The picker now returns the correct metric in all cases.

- **Eval Results Now Load for All Trace Evaluations:** In some cases, evaluation results for trace-level evaluations were not appearing in the details panel even after the eval completed successfully. These results now display correctly.

- **Eval Versions Save and Restore Correctly:** In some cases, saving a new version of an evaluation template or restoring an older one could result in incorrect configuration being applied. Versioning now works reliably, and context settings persist correctly across edits.

- **Eval Creation Saves Correctly When Adding Multiple Evals:** In some cases, the Save & Add button in the eval picker stayed disabled even after completing all required steps, affecting the simulation, Observe, and dataset flows. The button now enables correctly once all steps are done, so you can save and continue adding evaluations without getting stuck.

- **Large Traces Are Now Evaluated Completely:** Previously, evaluation prompts for very large traces were cut off well before the end of the trace content, causing evaluations to run on incomplete context. The limit has been significantly increased so that large traces are fully considered during evaluation.

- **Agent Scenario Cards Show Readable Descriptions:** In some cases, scenario cards generated from replay sessions were displaying internal configuration text instead of a readable description. Cards now show a concise, human-readable description of the scenario.

- **Output Type Selector Explains When Locked:** When the output type field cannot be changed for a particular evaluation type, the interface now shows a note explaining why. The restriction is no longer silent.

- **Error Feed Linear Integration Works Reliably:** Creating a Linear ticket from the Error Feed now works reliably. Tickets include a direct link back to the trace and the root causes identified by deep analysis. Deep analysis results now appear within seconds instead of requiring a page refresh, and switching between traces updates the metadata panel immediately.

- **Demo Datasets Load Correctly for All New Accounts:** New accounts were sometimes set up with an empty demo dataset due to an internal configuration issue. Demo datasets now load with the correct sample data for all new registrations, and existing accounts that were affected have been restored.

- **System Evaluators Work Correctly in Open-Source Mode:** When running in open-source mode, system evaluators were failing with an error indicating the feature was unavailable. System evaluators now work correctly in open-source deployments.

- **Task List Filter Chips Display Correct Values:** In some cases, active filter chips in the task list were displaying 'undefined' instead of the actual filter value. Filter chips now show the correct value at all times.

- **Tag Input in Trace Detail Now Responds to Clicks on the Enter Icon:** In the trace detail view, clicking the Enter icon in the tag input field now correctly adds the tag. Previously, only the keyboard Enter key worked.

- **Adding Evals with Number Inputs Now Works Reliably:** In some cases, adding an evaluation that included a number input field caused unexpected errors. These errors no longer occur.

</div>

## Week of 2026-05-13

<div class="mb-12 pb-8 border-b border-[var(--color-border-subtle)] last:border-b-0">