CS-10675: Implement modular validation pipeline with test step by habdelra · Pull Request #4375 · cardstack/boxel

habdelra · 2026-04-09T23:17:25Z

Summary

Add a modular ValidationPipeline that runs all validation steps concurrently (Promise.allSettled) after every inner-loop agent turn
Implement TestValidationStep wrapping executeTestRunFromRealm() — reads back the completed TestRun card for detailed failure data (passedCount, failedCount, individual test failures)
NoOpStepRunner placeholders for parse, lint, evaluate, instantiate — child tickets CS-10713, CS-10714, CS-10715, CS-10716
Each step implements formatForContext() producing LLM-friendly markdown with step-specific failure shapes via details field (POJOs, not cards)
Max iterations + failing validation now blocks the issue with the reason and failure context in the description (persisted to realm via RealmIssueStore.updateIssue)
Extract fetchRealmFilenames() from realm-operations.ts for test file discovery
Updated phase-2-plan.md with validation architecture documentation

Architecture

src/validators/
  validation-pipeline.ts  — ValidationStepRunner interface + ValidationPipeline + createDefaultPipeline()
  test-step.ts            — TestValidationStep (wraps executeTestRunFromRealm)
  noop-step.ts            — NoOpStepRunner (placeholder for unimplemented steps)

Adding a new validation step = one new file + one line in createDefaultPipeline(). No pipeline code changes needed.

Try it out

Issue-loop smoke test (no server needed)

cd packages/software-factory
pnpm smoke:issue-loop

Runs 7 scenarios (22 assertions) including the new ValidationPipeline integration scenario that exercises the pipeline with all NoOp steps wired into the issue loop.

Validation pipeline smoke test with real tests (requires `mise run dev-all`)

cd packages/software-factory

MATRIX_URL=http://localhost:8008 \
MATRIX_USERNAME=your-username \
MATRIX_PASSWORD=your-password \
pnpm smoke:test-realm -- \
  --target-realm-url http://localhost:4201/your-username/smoke-test-realm/

This exercises the full ValidationPipeline with a real TestValidationStep against a live realm:

Creates the realm + auth
Writes a HelloCard definition + passing test + deliberately failing test
Creates a ValidationPipeline via createDefaultPipeline(), runs validate(targetRealmUrl), verifies:
- Pipeline reports FAILED (deliberately failing test)
- Test step has details.passedCount > 0 and details.failedCount > 0
- All NoOp steps (parse, lint, evaluate, instantiate) passed
- formatForContext() produces readable markdown showing test failure details

What to expect in the Boxel app:

Navigate to your smoke-test-realm workspace — you'll see the HelloCard definition, co-located tests, and a TestRun card in Test Runs/ showing the validation pipeline's test execution results with pass/fail details.

Linear tickets

CS-10675 — Main ticket (this PR)
CS-10713 — Implement parse validation step
CS-10714 — Implement lint validation step
CS-10715 — Implement module evaluation validation step
CS-10716 — Implement card instantiation validation step

Test plan

pnpm lint passes (all packages)
pnpm test in software-factory — 450 tests pass (23 new)
pnpm smoke:issue-loop — 7 scenarios, 22 assertions pass
pnpm smoke:test-realm — validation pipeline with real realm (requires dev-all)
Existing issue-loop tests updated for new max-iterations→blocked behavior

🤖 Generated with Claude Code

Integrate findings from CS-10672 implementation directly into the relevant plan sections rather than appending notes: - Selection algorithm uses backlog (not ready) to match darkfactory.gts - Priority enum includes critical - Issue lifecycle uses backlog → in_progress → done/blocked/review - Exhausted issues tracked via exclusion set to prevent re-picking - RealmIssueStore uses absolute darkfactory module URL (env-sensitive) - Boxel linksToMany uses dotted-key format for blockedBy relationships - Loop outcome disambiguation table for edge cases - Migration path references actual file organization and CS-10708 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a modular validation pipeline that runs after every inner-loop agent turn. The pipeline runs all steps concurrently via Promise.allSettled() and aggregates results. Each step is a separate module implementing the ValidationStepRunner interface — adding a new step is one file plus one line in createDefaultPipeline(). Key changes: - ValidationPipeline class implementing the Validator interface with concurrent step execution and per-step exception capture - TestValidationStep wrapping executeTestRunFromRealm() with detailed failure data read back from the completed TestRun card - NoOpStepRunner placeholders for parse, lint, evaluate, instantiate (child tickets CS-10713 through CS-10716) - Step-specific failure shapes via details field on ValidationStepResult - formatForContext() on each step for LLM-friendly failure output - Max iterations + failing validation now blocks the issue with the reason and failure context in the issue description - Extract fetchRealmFilenames() from pullRealmFiles() for reuse - Optional updateIssue() on IssueStore for max-iteration blocking - Updated phase-2-plan.md with validation architecture documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f961346e81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

packages/software-factory/src/validators/test-step.ts

packages/software-factory/src/issue-loop.ts

Copilot

Pull request overview

Implements a modular, concurrent validation pipeline for the software-factory issue loop, including a concrete test-validation step that executes realm tests and reads back detailed TestRun results. It also updates the issue-loop behavior so max-iteration exhaustion can block issues when validation is still failing, and extracts _mtimes filename fetching for reuse.

Changes:

Add ValidationPipeline + ValidationStepRunner abstraction with a default 5-step pipeline (4 placeholders + test step).
Implement TestValidationStep that wraps executeTestRunFromRealm() and extracts detailed failures from the TestRun card into ValidationStepResult.details.
Update issue loop behavior/tests/smoke tests and refactor realm operations by extracting fetchRealmFilenames() from pullRealmFiles().

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
packages/software-factory/src/validators/validation-pipeline.ts	New concurrent validation pipeline + default step composition.
packages/software-factory/src/validators/test-step.ts	New test validation step that runs tests and reads back detailed TestRun info.
packages/software-factory/src/validators/noop-step.ts	Placeholder step runner for unimplemented validation stages.
packages/software-factory/src/issue-loop.ts	Integrates validator formatting and blocks issues on max-iterations + failing validation.
packages/software-factory/src/issue-scheduler.ts	Extends `IssueStore` with optional `updateIssue` to support blocking updates.
packages/software-factory/src/realm-operations.ts	Extracts `fetchRealmFilenames()` and refactors `pullRealmFiles()` to reuse it.
packages/software-factory/src/factory-agent-types.ts	Extends `ValidationStepResult` with `details` for step-specific structured context.
packages/software-factory/tests/validation-pipeline.test.ts	Adds unit tests for pipeline aggregation/formatting and NoOp step behavior.
packages/software-factory/tests/test-step.test.ts	Adds unit tests for `TestValidationStep` behavior and formatting.
packages/software-factory/tests/issue-loop.test.ts	Updates/adds tests for new “block on max iterations + failing validation” behavior.
packages/software-factory/tests/index.ts	Registers new test files in the test suite entrypoint.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts	Extends smoke test to exercise the full validation pipeline against a live realm.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts	Adds a scenario validating pipeline integration without a server.
packages/software-factory/realm/tsconfig.json	Adds a strict TS config for realm-local TS/GTS compilation.
packages/software-factory/docs/phase-2-plan.md	Documents validation architecture and updated scheduling/loop semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/src/issue-loop.ts

packages/software-factory/src/issue-scheduler.ts

packages/software-factory/tests/validation-pipeline.test.ts

packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts

…es is standalone pullRealmFiles is dead code (only used in its own tests) and doesn't need to be refactored. fetchRealmFilenames stands on its own for the test validation step's file discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ersistence warning - Fix test-step error mapping: fall back to handle.errorMessage when details.failures is empty (infrastructure errors with readable TestRun) - Remove incorrect "no-op" log suffix in smoke test - Improve createDefaultPipeline test to actually verify 5 steps in order - Add warning log when IssueStore lacks updateIssue for max-iteration blocking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… lint - Make IssueStore.updateIssue required (not optional) - Implement updateIssue on RealmIssueStore: read card, mutate status/description, write back to realm - Update all MockIssueStore implementations to include updateIssue - Fix prettier formatting and non-null assertion lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update orchestrator pseudocode to reflect max-iterations→blocked behavior - Document IssueStore.updateIssue and orchestrator realm writes - Update Issue Lifecycle section with max-iteration blocking - Clean up --jwt reference in boxel-cli auth section - Fix NoOpStepRunner to declare parameter names (TS strict mode) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The validation pipeline's TestValidationStep already exercises executeTestRunFromRealm internally, so the separate direct call was redundant. The smoke test now goes straight from writing test files to running the ValidationPipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra and others added 3 commits April 9, 2026 18:24

glint support for software factory realm gts

97cd9f2

habdelra requested a review from Copilot April 9, 2026 23:21

Copilot started reviewing on behalf of habdelra April 9, 2026 23:22 View session

chatgpt-codex-connector bot reviewed Apr 9, 2026

View reviewed changes

packages/software-factory/src/validators/test-step.ts Outdated Show resolved Hide resolved

packages/software-factory/src/issue-loop.ts Outdated Show resolved Hide resolved

Copilot AI reviewed Apr 9, 2026

View reviewed changes

habdelra and others added 4 commits April 9, 2026 19:28

habdelra requested a review from a team April 10, 2026 00:06

habdelra and others added 2 commits April 9, 2026 20:11

Fix setup-logger import path in smoke-test-realm.ts

ef78431

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CS-10675: Implement modular validation pipeline with test step#4375

CS-10675: Implement modular validation pipeline with test step#4375
habdelra wants to merge 9 commits intomainfrom
cs-10675-validation-pipeline

habdelra commented Apr 9, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

habdelra commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

Try it out

Issue-loop smoke test (no server needed)

Validation pipeline smoke test with real tests (requires mise run dev-all)

Linear tickets

Test plan

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

habdelra commented Apr 9, 2026 •

edited

Loading

Validation pipeline smoke test with real tests (requires `mise run dev-all`)