feat: smart LLM grader retry — reuse content and ask LLM to fix structure

## Problem

When an LLM grader returns a response that doesn't match the expected schema, the current retry logic simply re-sends the same request from scratch. After 3 failed attempts, the evaluator is skipped entirely. This wastes tokens and loses useful content that was already generated.

## Current behavior

```
⚠ LLM grader "rubrics" failed after 3 attempts
  (Failed to parse evaluator response after 3 attempts: [{"code":"invalid_type","expected":"number","received":"undefined","path":["score"]}])
```

Each retry is a full re-request. The LLM may have returned good evaluation content with only a slightly malformed structure, e.g. `score` returned as a string instead of a number, or a required field missing (as in the error above, where `score` is `undefined`).

## Proposal

After exhausting standard retries, add a "fix structure" step:

1. **Standard retries** (1-3): re-send the original request
2. **Fix structure retry**: take the last invalid response and send a new request:
   ```
   The following evaluation response has the correct content but invalid structure.
   Fix it to match this schema: { score: number, passed: boolean, reasoning: string }
   
   Invalid response:
   <paste last response>
   ```
3. Parse the fixed response

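This should be cheaper than a full re-evaluation, because the repair request carries only the schema and the last response rather than the full evaluation prompt, and it recovers content that was already generated correctly.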
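The three steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `GraderResult`, `CallLlm`, `parseGraderResult`, `buildFixPrompt`, and `gradeWithRepair` are hypothetical names, the schema check is hand-written to keep the example dependency-free (the real grader appears to validate with Zod), and the LLM call is injected as a plain function.

```typescript
// Hypothetical shape, taken from the schema quoted in the proposal.
interface GraderResult {
  score: number;
  passed: boolean;
  reasoning: string;
}

// Hypothetical stand-in for whatever sends a prompt to the grader LLM.
type CallLlm = (prompt: string) => Promise<string>;

const SCHEMA_HINT = "{ score: number, passed: boolean, reasoning: string }";

// Returns the parsed result, or null if the text is not JSON or misses the schema.
function parseGraderResult(raw: string): GraderResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { score, passed, reasoning } = data as Record<string, unknown>;
  if (
    typeof score !== "number" ||
    typeof passed !== "boolean" ||
    typeof reasoning !== "string"
  ) {
    return null;
  }
  return { score, passed, reasoning };
}

// Builds the "fix structure" prompt from the last invalid response.
function buildFixPrompt(invalidResponse: string): string {
  return [
    "The following evaluation response has the correct content but invalid structure.",
    `Fix it to match this schema: ${SCHEMA_HINT}`,
    "",
    "Invalid response:",
    invalidResponse,
  ].join("\n");
}

async function gradeWithRepair(
  callLlm: CallLlm,
  originalPrompt: string,
  maxAttempts = 3,
): Promise<GraderResult | null> {
  let lastRaw = "";
  // Step 1: standard retries re-send the original request.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    lastRaw = await callLlm(originalPrompt);
    const parsed = parseGraderResult(lastRaw);
    if (parsed) return parsed;
  }
  // Step 2: one repair request that reuses the last invalid response.
  const fixedRaw = await callLlm(buildFixPrompt(lastRaw));
  // Step 3: parse the fixed response; null means the evaluator is still skipped.
  return parseGraderResult(fixedRaw);
}
```

In this sketch the repair step runs at most once, so the worst case is `maxAttempts + 1` LLM calls. Whether the repair request should go to the same model or a cheaper one is left open.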

## Prior art

- **Instructor** (Python): automatically retries with validation errors appended to the prompt
- **LangChain OutputFixingParser**: sends malformed output + error to LLM for repair
- **Zod + AI SDK**: `generateObject` validates output against a Zod schema and accepts an opt-in `experimental_repairText` function for repairing malformed output

## Additional context

Also observed: workspace cwd is not preserved on CLI provider retries — the mock_agent's first attempt uses the workspace path but retries fall back to the eval directory. This is a separate bug (#911).
