
feat: smart LLM grader retry — reuse content and ask LLM to fix structure #911

@christso

Description


Problem

When an LLM grader returns a response that doesn't match the expected schema, the current retry logic simply re-sends the same request from scratch. After 3 failed attempts, the evaluator is skipped entirely. This wastes tokens and loses useful content that was already generated.

Current behavior

⚠ LLM grader "rubrics" failed after 3 attempts
  (Failed to parse evaluator response after 3 attempts: [{"code":"invalid_type","expected":"number","received":"undefined","path":["score"]}])

Each retry is a full re-request. The LLM may have returned good evaluation content with only a slightly malformed structure (e.g., score as a string instead of a number, or a missing required field, as in the error above).
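For reference, the current behavior amounts to roughly the loop below. This is a minimal sketch, not AgentV's code: callLlm, parseGraderResponse, and gradeWithPlainRetries are hypothetical names, and a hand-written structural check stands in for the real Zod schema so the example has no dependencies.

```typescript
// Hypothetical sketch of the CURRENT retry behavior (not AgentV's actual code).
// Every retry re-sends the original grading prompt, and the last invalid
// response is thrown away once attempts run out.

interface GraderResult {
  score: number;
  passed: boolean;
  reasoning: string;
}

type ParseOutcome =
  | { ok: true; value: GraderResult }
  | { ok: false; error: string };

// Stand-in for the Zod schema { score: number, passed: boolean, reasoning: string }.
function parseGraderResponse(raw: string): ParseOutcome {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "response is not a JSON object" };
  }
  const { score, passed, reasoning } = data as Record<string, unknown>;
  if (typeof score !== "number") return { ok: false, error: "score: expected number" };
  if (typeof passed !== "boolean") return { ok: false, error: "passed: expected boolean" };
  if (typeof reasoning !== "string") return { ok: false, error: "reasoning: expected string" };
  return { ok: true, value: { score, passed, reasoning } };
}

async function gradeWithPlainRetries(
  prompt: string,
  callLlm: (prompt: string) => Promise<string>, // hypothetical LLM client
  maxAttempts = 3,
): Promise<GraderResult | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLlm(prompt); // full re-request on every attempt
    const parsed = parseGraderResponse(raw);
    if (parsed.ok) return parsed.value;
  }
  return undefined; // evaluator skipped; already-generated content is lost
}
```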

Proposal

After exhausting standard retries, add a "fix structure" step:

  1. Standard retries (1-3): re-send the original request
  2. Fix structure retry: take the last invalid response and send a new request:
    The following evaluation response has the correct content but invalid structure.
    Fix it to match this schema: { score: number, passed: boolean, reasoning: string }
    
    Invalid response:
    <paste last response>
    
  3. Parse the fixed response

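This should be significantly cheaper than a full re-evaluation, because the repair request carries only the previous output and the target schema rather than the full grading context, and it recovers content that was already generated correctly.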
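The three steps could be wired together roughly as follows. This is a sketch under stated assumptions: callLlm, parseGraderResponse, buildFixStructurePrompt, and gradeWithStructureFix are hypothetical names, and a hand-written structural check stands in for the real Zod schema.

```typescript
// Hypothetical sketch of the proposed "fix structure" flow (not AgentV's API).

interface GraderResult {
  score: number;
  passed: boolean;
  reasoning: string;
}

type ParseOutcome =
  | { ok: true; value: GraderResult }
  | { ok: false; error: string };

// Stand-in for the Zod schema used by the grader.
function parseGraderResponse(raw: string): ParseOutcome {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "response is not a JSON object" };
  }
  const { score, passed, reasoning } = data as Record<string, unknown>;
  if (typeof score !== "number") return { ok: false, error: "score: expected number" };
  if (typeof passed !== "boolean") return { ok: false, error: "passed: expected boolean" };
  if (typeof reasoning !== "string") return { ok: false, error: "reasoning: expected string" };
  return { ok: true, value: { score, passed, reasoning } };
}

// Step 2: repair prompt built from the template in the proposal.
function buildFixStructurePrompt(invalidResponse: string): string {
  return [
    "The following evaluation response has the correct content but invalid structure.",
    "Fix it to match this schema: { score: number, passed: boolean, reasoning: string }",
    "",
    "Invalid response:",
    invalidResponse,
  ].join("\n");
}

async function gradeWithStructureFix(
  prompt: string,
  callLlm: (prompt: string) => Promise<string>, // hypothetical LLM client
  maxAttempts = 3,
): Promise<GraderResult | undefined> {
  let lastRaw: string | undefined;
  // Step 1: standard retries re-send the original request.
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastRaw = await callLlm(prompt);
    const parsed = parseGraderResponse(lastRaw);
    if (parsed.ok) return parsed.value;
  }
  if (lastRaw === undefined) return undefined; // no attempts were made
  // Step 2: one repair request carrying only the invalid output and the schema.
  const fixed = await callLlm(buildFixStructurePrompt(lastRaw));
  // Step 3: parse the repaired response; skip the evaluator if it is still invalid.
  const parsed = parseGraderResponse(fixed);
  return parsed.ok ? parsed.value : undefined;
}
```

Because the fix step runs only after the standard retries fail, the happy path is unchanged and the worst case costs one extra LLM call.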

Prior art

  • Instructor (Python): automatically retries with validation errors appended to the prompt
  • LangChain OutputFixingParser: sends malformed output + error to LLM for repair
  • Zod + AI SDK: generateObject validates output against a Zod schema and accepts an experimental_repairText hook for repairing invalid output
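Instructor and OutputFixingParser both feed the validation error back to the model, which the template in the proposal does not. A hypothetical variant of the repair prompt that lists each schema violation might look like the sketch below; the issue shape loosely mirrors the Zod error in the log, and collectIssues and buildRepairPromptWithErrors are illustrative names, not an existing API.

```typescript
// Hypothetical variant of the repair prompt that includes validation errors,
// in the spirit of Instructor's re-asks and LangChain's OutputFixingParser.

interface SchemaIssue {
  path: string;
  expected: string;
  received: string;
}

// Expected type for each required field of the grader schema.
const EXPECTED_FIELDS: Record<string, string> = {
  score: "number",
  passed: "boolean",
  reasoning: "string",
};

function describeType(value: unknown): string {
  if (value === undefined) return "undefined";
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Collect every violation instead of stopping at the first one.
function collectIssues(raw: string): SchemaIssue[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [{ path: "(root)", expected: "JSON object", received: "unparseable text" }];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return [{ path: "(root)", expected: "object", received: describeType(data) }];
  }
  const obj = data as Record<string, unknown>;
  const issues: SchemaIssue[] = [];
  for (const [field, expected] of Object.entries(EXPECTED_FIELDS)) {
    const received = describeType(obj[field]);
    if (received !== expected) issues.push({ path: field, expected, received });
  }
  return issues;
}

function buildRepairPromptWithErrors(invalidResponse: string, issues: SchemaIssue[]): string {
  const errorLines = issues.map(
    (i) => `- ${i.path}: expected ${i.expected}, received ${i.received}`,
  );
  return [
    "The following evaluation response has the correct content but invalid structure.",
    "Fix it to match this schema: { score: number, passed: boolean, reasoning: string }",
    "",
    "Validation errors:",
    ...errorLines,
    "",
    "Invalid response:",
    invalidResponse,
  ].join("\n");
}
```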

Additional context

Also observed: the workspace cwd is not preserved on CLI provider retries. The mock_agent's first attempt uses the workspace path, but retries fall back to the eval directory. This is a separate bug (#911).
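For the cwd bug, one possible shape of a fix is to resolve the working directory once, before the retry loop, and pass the same value to every attempt. This is a hypothetical sketch; runWithRetries and AttemptOptions are illustrative names, not AgentV's provider API.

```typescript
// Hypothetical sketch: capture cwd once, outside the retry loop, so every
// attempt (including retries) runs in the same workspace directory.

interface AttemptOptions {
  cwd: string;
}

async function runWithRetries<T>(
  attempt: (opts: AttemptOptions) => Promise<T>,
  cwd: string, // resolved workspace path, computed by the caller once
  maxAttempts = 3,
): Promise<T> {
  const opts: AttemptOptions = Object.freeze({ cwd }); // same options object for every attempt
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await attempt(opts);
    } catch (err) {
      lastError = err; // retry with the unchanged cwd
    }
  }
  throw lastError;
}
```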

Metadata

Labels

core: Anything pertaining to core functionality of AgentV
enhancement: New feature or request

Projects

Status

Ready
