feat: smart LLM grader retry — reuse content and ask LLM to fix structure #911
Description
Problem
When an LLM grader returns a response that doesn't match the expected schema, the current retry logic simply re-sends the same request from scratch. After 3 failed attempts, the evaluator is skipped entirely. This wastes tokens and loses useful content that was already generated.
Current behavior
```
⚠ LLM grader "rubrics" failed after 3 attempts
(Failed to parse evaluator response after 3 attempts: [{"code":"invalid_type","expected":"number","received":"undefined","path":["score"]}])
```
Each retry is a full re-request. The LLM may have returned good evaluation content but with a slightly malformed structure (e.g., score as string instead of number, missing a required field).
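To make the failure mode concrete, here is a minimal sketch of the kind of validation that rejects these responses. The function and field names are illustrative (modeled on the Zod-style error in the log above), not the actual evaluator code:

```typescript
// Issue shape mirrors the Zod-style error seen in the log.
type Issue = { code: string; expected: string; received: string; path: string[] };

// Hypothetical validator for the grader result schema
// { score: number, passed: boolean, reasoning: string }.
function validateGraderResult(raw: Record<string, unknown>): Issue[] {
  const issues: Issue[] = [];
  const expect = (key: string, type: string) => {
    const actual = raw[key] === undefined ? "undefined" : typeof raw[key];
    if (actual !== type) {
      issues.push({ code: "invalid_type", expected: type, received: actual, path: [key] });
    }
  };
  expect("score", "number");
  expect("passed", "boolean");
  expect("reasoning", "string");
  return issues;
}

// Good content, bad structure: score came back as a string.
const issues = validateGraderResult({
  score: "0.8",
  passed: true,
  reasoning: "Meets all rubric criteria.",
});
console.log(JSON.stringify(issues));
// [{"code":"invalid_type","expected":"number","received":"string","path":["score"]}]
```

The evaluation content (`reasoning`, the intended score) is all there; only the type of one field is wrong, which is exactly the case a repair step can recover cheaply.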
Proposal
After exhausting standard retries, add a "fix structure" step:
- Standard retries (1-3): re-send the original request
- Fix-structure retry: take the last invalid response and send a new request:

  > The following evaluation response has the correct content but invalid structure. Fix it to match this schema: `{ score: number, passed: boolean, reasoning: string }`
  > Invalid response: <paste last response>

- Parse the fixed response
This is significantly cheaper than a full re-evaluation and recovers content that was already generated correctly.
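The flow above can be sketched roughly as follows. `callLLM`, `tryParse`, and the prompt wording are assumptions for illustration, not the real evaluator API:

```typescript
type GraderResult = { score: number; passed: boolean; reasoning: string };

// Hypothetical parse helper: returns null on malformed JSON or wrong types.
function tryParse(raw: string): GraderResult | null {
  try {
    const obj = JSON.parse(raw);
    if (
      typeof obj.score === "number" &&
      typeof obj.passed === "boolean" &&
      typeof obj.reasoning === "string"
    ) {
      return obj;
    }
  } catch {
    // fall through to null
  }
  return null;
}

// Sketch of the proposed retry loop: standard retries first, then one
// cheap "fix structure" request that reuses the last invalid response.
async function gradeWithRepair(
  callLLM: (prompt: string) => Promise<string>,
  prompt: string,
  maxRetries = 3,
): Promise<GraderResult> {
  let lastRaw = "";
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    lastRaw = await callLLM(prompt);
    const parsed = tryParse(lastRaw);
    if (parsed) return parsed;
  }
  const fixPrompt =
    "The following evaluation response has the correct content but invalid structure. " +
    "Fix it to match this schema: { score: number, passed: boolean, reasoning: string }\n" +
    "Invalid response:\n" +
    lastRaw;
  const fixed = tryParse(await callLLM(fixPrompt));
  if (fixed) return fixed;
  throw new Error(
    `Failed to parse evaluator response after ${maxRetries} attempts + 1 repair attempt`,
  );
}
```

The repair prompt is short and the model only needs to reshape existing text, so the extra request costs far fewer tokens than a fourth full evaluation.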
Prior art
- Instructor (Python): automatically retries with validation errors appended to the prompt
- LangChain OutputFixingParser: sends malformed output + error to LLM for repair
- Zod + AI SDK: `generateObject` with automatic schema repair
Additional context
Also observed: workspace cwd is not preserved on CLI provider retries — the mock_agent's first attempt uses the workspace path but retries fall back to the eval directory. This is a separate bug (#911).