Skip to content

bug(pipeline): llm_grader_results/ not pre-created, risks silent zero-score regression #1152

@christso

Description

@christso

Objective

pipeline input writes <test-id>/llm_graders/<name>.json and <test-id>/code_graders/<name>.json, but no parent llm_grader_results/ directory exists until a grader subagent creates one during Phase 2. In contrast, code_grader_results/ is created lazily by its writer — see apps/cli/src/commands/pipeline/grade.ts:296 and apps/cli/src/commands/pipeline/run.ts:363 (both mkdir before writing).

No such pipeline-side writer exists for llm_grader_results/ — the grader subagent is the writer. So the responsibility for ensuring the parent directory exists shifts to the subagent, which may not think to mkdir -p first.

Failure chain when subagent skips mkdir

  1. Grader subagent calls writeFile('<bench-dir>/<test-id>/llm_grader_results/<name>.json', ...) — throws ENOENT.
  2. The throw is captured inside the subagent's own output; never surfaced to the orchestrator as a pipeline error.
  3. The result file never lands on disk.
  4. pipeline bench tries to read llm_grader_results/ at bench.ts:80-111 — the try/catch swallows the missing directory silently.
  5. pass_rate=0 is reported with no grader results merged.

This is the same user-visible symptom #1148 fixed, via a different mechanism. We reproduced step (1) during PR #1151 e2e; the rest of the chain is from reading the code.

Design latitude

  1. Pre-create in pipeline input — add a symmetric mkdir(llmResultsDir, { recursive: true }) next to the existing mkdir(llmGradersDir, ...) at apps/cli/src/commands/pipeline/input.ts:244. Removes the requirement on subagents.
  2. Document the requirement in grader.md Step 9 — explicitly tell the subagent to mkdir -p the parent dir before writing.

Option 1 is more robust (doesn't rely on subagent discipline) and simpler to enforce. Option 2 is lower-risk if there's a reason not to pre-create.

Acceptance signals

  • A fresh pipeline input produces an empty llm_grader_results/ directory next to llm_graders/ for every test with LLM graders (if Option 1), OR grader.md Step 9 has an unambiguous mkdir -p instruction (if Option 2).
  • A grader subagent that writes <bench-dir>/<test-id>/llm_grader_results/<name>.json without manually creating the parent dir does not produce a silent zero-score.
  • Regression test (if Option 1): an assertion in apps/cli/test/commands/eval/pipeline/input.test.ts that llm_grader_results/ exists after pipeline input on a fixture with at least one llm-grader assertion.

Non-goals

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions