Skip to content

CS-10675: Implement modular validation pipeline with test step#4375

Open
habdelra wants to merge 9 commits intomainfrom
cs-10675-validation-pipeline
Open

CS-10675: Implement modular validation pipeline with test step#4375
habdelra wants to merge 9 commits intomainfrom
cs-10675-validation-pipeline

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented Apr 9, 2026

Summary

  • Add a modular ValidationPipeline that runs all validation steps concurrently (Promise.allSettled) after every inner-loop agent turn
  • Implement TestValidationStep wrapping executeTestRunFromRealm() — reads back the completed TestRun card for detailed failure data (passedCount, failedCount, individual test failures)
  • NoOpStepRunner placeholders for parse, lint, evaluate, instantiate — child tickets CS-10713, CS-10714, CS-10715, CS-10716
  • Each step implements formatForContext() producing LLM-friendly markdown with step-specific failure shapes via details field (POJOs, not cards)
  • Max iterations + failing validation now blocks the issue with the reason and failure context in the description (persisted to realm via RealmIssueStore.updateIssue)
  • Extract fetchRealmFilenames() from realm-operations.ts for test file discovery
  • Updated phase-2-plan.md with validation architecture documentation

Architecture

src/validators/
  validation-pipeline.ts  — ValidationStepRunner interface + ValidationPipeline + createDefaultPipeline()
  test-step.ts            — TestValidationStep (wraps executeTestRunFromRealm)
  noop-step.ts            — NoOpStepRunner (placeholder for unimplemented steps)

Adding a new validation step = one new file + one line in createDefaultPipeline(). No pipeline code changes needed.

Try it out

Issue-loop smoke test (no server needed)

cd packages/software-factory
pnpm smoke:issue-loop

Runs 7 scenarios (22 assertions) including the new ValidationPipeline integration scenario that exercises the pipeline with all NoOp steps wired into the issue loop.

Validation pipeline smoke test with real tests (requires mise run dev-all)

cd packages/software-factory

MATRIX_URL=http://localhost:8008 \
MATRIX_USERNAME=your-username \
MATRIX_PASSWORD=your-password \
pnpm smoke:test-realm -- \
  --target-realm-url http://localhost:4201/your-username/smoke-test-realm/

This exercises the full ValidationPipeline with a real TestValidationStep against a live realm:

  1. Creates the realm + auth
  2. Writes a HelloCard definition + passing test + deliberately failing test
  3. Creates a ValidationPipeline via createDefaultPipeline(), runs validate(targetRealmUrl), verifies:
    • Pipeline reports FAILED (deliberately failing test)
    • Test step has details.passedCount > 0 and details.failedCount > 0
    • All NoOp steps (parse, lint, evaluate, instantiate) passed
    • formatForContext() produces readable markdown showing test failure details

What to expect in the Boxel app:

Navigate to your smoke-test-realm workspace — you'll see the HelloCard definition, co-located tests, and a TestRun card in Test Runs/ showing the validation pipeline's test execution results with pass/fail details.

Linear tickets

  • CS-10675 — Main ticket (this PR)
  • CS-10713 — Implement parse validation step
  • CS-10714 — Implement lint validation step
  • CS-10715 — Implement module evaluation validation step
  • CS-10716 — Implement card instantiation validation step

Test plan

  • pnpm lint passes (all packages)
  • pnpm test in software-factory — 450 tests pass (23 new)
  • pnpm smoke:issue-loop — 7 scenarios, 22 assertions pass
  • pnpm smoke:test-realm — validation pipeline with real realm (requires dev-all)
  • Existing issue-loop tests updated for new max-iterations→blocked behavior

🤖 Generated with Claude Code

habdelra and others added 3 commits April 9, 2026 18:24
Integrate findings from CS-10672 implementation directly into the
relevant plan sections rather than appending notes:

- Selection algorithm uses backlog (not ready) to match darkfactory.gts
- Priority enum includes critical
- Issue lifecycle uses backlog → in_progress → done/blocked/review
- Exhausted issues tracked via exclusion set to prevent re-picking
- RealmIssueStore uses absolute darkfactory module URL (env-sensitive)
- Boxel linksToMany uses dotted-key format for blockedBy relationships
- Loop outcome disambiguation table for edge cases
- Migration path references actual file organization and CS-10708

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a modular validation pipeline that runs after every inner-loop agent
turn. The pipeline runs all steps concurrently via Promise.allSettled()
and aggregates results. Each step is a separate module implementing the
ValidationStepRunner interface — adding a new step is one file plus one
line in createDefaultPipeline().

Key changes:
- ValidationPipeline class implementing the Validator interface with
  concurrent step execution and per-step exception capture
- TestValidationStep wrapping executeTestRunFromRealm() with detailed
  failure data read back from the completed TestRun card
- NoOpStepRunner placeholders for parse, lint, evaluate, instantiate
  (child tickets CS-10713 through CS-10716)
- Step-specific failure shapes via details field on ValidationStepResult
- formatForContext() on each step for LLM-friendly failure output
- Max iterations + failing validation now blocks the issue with the
  reason and failure context in the issue description
- Extract fetchRealmFilenames() from pullRealmFiles() for reuse
- Optional updateIssue() on IssueStore for max-iteration blocking
- Updated phase-2-plan.md with validation architecture documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f961346e81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements a modular, concurrent validation pipeline for the software-factory issue loop, including a concrete test-validation step that executes realm tests and reads back detailed TestRun results. It also updates the issue-loop behavior so max-iteration exhaustion can block issues when validation is still failing, and extracts _mtimes filename fetching for reuse.

Changes:

  • Add ValidationPipeline + ValidationStepRunner abstraction with a default 5-step pipeline (4 placeholders + test step).
  • Implement TestValidationStep that wraps executeTestRunFromRealm() and extracts detailed failures from the TestRun card into ValidationStepResult.details.
  • Update issue loop behavior/tests/smoke tests and refactor realm operations by extracting fetchRealmFilenames() from pullRealmFiles().

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/software-factory/src/validators/validation-pipeline.ts New concurrent validation pipeline + default step composition.
packages/software-factory/src/validators/test-step.ts New test validation step that runs tests and reads back detailed TestRun info.
packages/software-factory/src/validators/noop-step.ts Placeholder step runner for unimplemented validation stages.
packages/software-factory/src/issue-loop.ts Integrates validator formatting and blocks issues on max-iterations + failing validation.
packages/software-factory/src/issue-scheduler.ts Extends IssueStore with optional updateIssue to support blocking updates.
packages/software-factory/src/realm-operations.ts Extracts fetchRealmFilenames() and refactors pullRealmFiles() to reuse it.
packages/software-factory/src/factory-agent-types.ts Extends ValidationStepResult with details for step-specific structured context.
packages/software-factory/tests/validation-pipeline.test.ts Adds unit tests for pipeline aggregation/formatting and NoOp step behavior.
packages/software-factory/tests/test-step.test.ts Adds unit tests for TestValidationStep behavior and formatting.
packages/software-factory/tests/issue-loop.test.ts Updates/adds tests for new “block on max iterations + failing validation” behavior.
packages/software-factory/tests/index.ts Registers new test files in the test suite entrypoint.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts Extends smoke test to exercise the full validation pipeline against a live realm.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts Adds a scenario validating pipeline integration without a server.
packages/software-factory/realm/tsconfig.json Adds a strict TS config for realm-local TS/GTS compilation.
packages/software-factory/docs/phase-2-plan.md Documents validation architecture and updated scheduling/loop semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

habdelra and others added 4 commits April 9, 2026 19:28
…es is standalone

pullRealmFiles is dead code (only used in its own tests) and doesn't
need to be refactored. fetchRealmFilenames stands on its own for the
test validation step's file discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ersistence warning

- Fix test-step error mapping: fall back to handle.errorMessage when
  details.failures is empty (infrastructure errors with readable TestRun)
- Remove incorrect "no-op" log suffix in smoke test
- Improve createDefaultPipeline test to actually verify 5 steps in order
- Add warning log when IssueStore lacks updateIssue for max-iteration blocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lint

- Make IssueStore.updateIssue required (not optional)
- Implement updateIssue on RealmIssueStore: read card, mutate
  status/description, write back to realm
- Update all MockIssueStore implementations to include updateIssue
- Fix prettier formatting and non-null assertion lint errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update orchestrator pseudocode to reflect max-iterations→blocked behavior
- Document IssueStore.updateIssue and orchestrator realm writes
- Update Issue Lifecycle section with max-iteration blocking
- Clean up --jwt reference in boxel-cli auth section
- Fix NoOpStepRunner to declare parameter names (TS strict mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team April 10, 2026 00:06
habdelra and others added 2 commits April 9, 2026 20:11
The import was ../setup-logger (resolving to scripts/setup-logger which
doesn't exist). Fixed to ../../src/setup-logger matching all other
smoke test scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The validation pipeline's TestValidationStep already exercises
executeTestRunFromRealm internally, so the separate direct call
was redundant. The smoke test now goes straight from writing test
files to running the ValidationPipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants