
Commit 3395cf6

leogdion and claude committed
Add test/validation mode section to skit-analyze plan
Adds comprehensive test mode specifications including:
- Test case structure and organization
- Test configuration formats (manifest.json, test-config.json)
- Four validation strategies (structural, content, build, functional)
- Implementation components overview
- Usage examples and CLI flags
- Benefits of automated testing for prompt validation

This enables validation of Claude's code generation against known-good outputs, preventing regressions and ensuring quality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent a25ba38 commit 3395cf6

1 file changed

Lines changed: 118 additions & 0 deletions

Docs/skit-analyze-plan.md

@@ -815,6 +815,124 @@ Include full API request/response, intermediate parsing steps, file collection d
- Generated code compiles
- Feature actually works as expected

## 8. Test/Validation Mode

To ensure the prompt generates correct code changes reliably, we need a test mode that validates Claude's responses against known-good outputs.

### Overview

Test mode (`--test` flag) runs the tool against a suite of test cases with predefined inputs and expected outputs, then validates that Claude's generated code matches expectations.

### Test Case Structure

```
test-cases/
├── subscript-basic/
│   ├── input/
│   │   ├── dsl.swift              # Input DSL code
│   │   └── expected.swift         # Expected Swift output
│   ├── expected-changes/
│   │   ├── manifest.json          # List of expected file changes
│   │   └── Declarations/
│   │       └── Subscript.swift    # Expected new/updated file content
│   └── test-config.json           # Test metadata (optional)
├── defer-statement/
│   └── ...
└── generic-function/
    └── ...
```

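A discoverer for the directory layout above could be sketched roughly as follows. The names `DiscoveredTestCase` and `discoverTestCases` are hypothetical, and the rule that a directory counts as a test case only when it contains `expected-changes/manifest.json` is an assumption; the plan does not spell out the discovery rule.

```swift
import Foundation

/// Hypothetical record for one test case directory found on disk.
struct DiscoveredTestCase {
    let name: String        // Directory name, e.g. "subscript-basic"
    let directory: URL
    let manifestURL: URL
    let configURL: URL?     // nil when the optional test-config.json is absent
}

/// Scans the immediate children of `root` and keeps those that look like test cases.
/// Assumption: a test case is any child directory holding expected-changes/manifest.json.
func discoverTestCases(
    in root: URL,
    fileManager: FileManager = .default
) throws -> [DiscoveredTestCase] {
    let entries = try fileManager.contentsOfDirectory(
        at: root,
        includingPropertiesForKeys: [.isDirectoryKey],
        options: [.skipsHiddenFiles]
    )
    let cases = entries.compactMap { entry -> DiscoveredTestCase? in
        let manifest = entry
            .appendingPathComponent("expected-changes")
            .appendingPathComponent("manifest.json")
        guard fileManager.fileExists(atPath: manifest.path) else { return nil }
        let config = entry.appendingPathComponent("test-config.json")
        return DiscoveredTestCase(
            name: entry.lastPathComponent,
            directory: entry,
            manifestURL: manifest,
            configURL: fileManager.fileExists(atPath: config.path) ? config : nil
        )
    }
    // Sort by name so runs are deterministic regardless of file system order.
    return cases.sorted { $0.name < $1.name }
}
```

Sorting by name is a design choice for reproducible output, not something the plan requires.
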
### Test Configuration Format

**manifest.json** - Describes expected changes:
```json
{
  "description": "Add subscript support to SyntaxKit",
  "expectedNewFiles": [
    "Declarations/Subscript.swift",
    "Declarations/SubscriptParameter.swift"
  ],
  "expectedUpdatedFiles": [
    "Core/CodeBlock.swift"
  ],
  "minimumNewFiles": 1,
  "minimumUpdatedFiles": 0,
  "validationStrategy": "structural",
  "buildRequired": true
}
```

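One way the manifest might be loaded is through `Codable`. This is a minimal sketch: `TestManifest`, `ValidationStrategy`, and `loadManifest` are hypothetical names, and treating every key as required is an assumption. Only the JSON keys and the four strategy strings come from the plan.

```swift
import Foundation

/// The four strategies named in the plan; raw values match the JSON strings.
enum ValidationStrategy: String, Codable {
    case structural, content, build, functional
}

/// Hypothetical model mirroring the manifest.json keys.
/// Assumption: all keys are required; the plan does not say which are optional.
struct TestManifest: Codable {
    let description: String
    let expectedNewFiles: [String]
    let expectedUpdatedFiles: [String]
    let minimumNewFiles: Int
    let minimumUpdatedFiles: Int
    let validationStrategy: ValidationStrategy
    let buildRequired: Bool
}

/// Decodes a manifest; throws if a key is missing or the strategy string is unknown.
func loadManifest(from data: Data) throws -> TestManifest {
    try JSONDecoder().decode(TestManifest.self, from: data)
}

// Usage with the example manifest from the plan.
let sampleManifestJSON = """
{
  "description": "Add subscript support to SyntaxKit",
  "expectedNewFiles": ["Declarations/Subscript.swift", "Declarations/SubscriptParameter.swift"],
  "expectedUpdatedFiles": ["Core/CodeBlock.swift"],
  "minimumNewFiles": 1,
  "minimumUpdatedFiles": 0,
  "validationStrategy": "structural",
  "buildRequired": true
}
"""
let manifest = try loadManifest(from: Data(sampleManifestJSON.utf8))
```

Modeling the strategy as an enum means a typo such as `"structual"` fails at decode time rather than silently skipping validation.
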
**test-config.json** - Test metadata (optional):
```json
{
  "name": "Subscript Basic Support",
  "description": "Tests basic subscript declaration generation",
  "tags": ["declaration", "subscript", "basic"],
  "priority": "high",
  "model": "claude-opus-4-6",
  "timeout": 120
}
```

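Because `test-config.json` is optional, a loader has to tolerate its absence. The sketch below assumes a hypothetical `TestConfig` type in which every field is optional, and a hypothetical `loadConfig(at:)` helper that returns an empty value when the file does not exist. The plan does not define fallback values, so none are invented here.

```swift
import Foundation

/// Hypothetical model for test-config.json.
/// Every field is optional because the file itself is optional.
struct TestConfig: Codable {
    var name: String?
    var description: String?
    var tags: [String]?
    var priority: String?
    var model: String?
    var timeout: Int?   // The plan gives a bare number; no unit is assumed here.
}

/// Returns an empty config when the file is absent, otherwise decodes it.
/// Malformed JSON still throws, so a broken file is not silently ignored.
func loadConfig(at url: URL) throws -> TestConfig {
    guard FileManager.default.fileExists(atPath: url.path) else {
        return TestConfig()
    }
    return try JSONDecoder().decode(TestConfig.self, from: Data(contentsOf: url))
}
```
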
### Validation Strategies

1. **Structural Validation** (`structural`):
   - Verifies expected files were created/modified
   - Checks file paths match expectations
   - Does not validate exact content

2. **Content Validation** (`content`):
   - Compares generated file content against expected content
   - Allows for minor whitespace/formatting differences
   - Validates key code structures are present

3. **Build Validation** (`build`):
   - Runs `swift build` on generated library
   - Verifies no compilation errors
   - Does not compare against expected content

4. **Functional Validation** (`functional`):
   - Runs test suite against generated library
   - Verifies generated code produces correct output
   - Requires test files in test case

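The first two strategies can be illustrated with a small sketch. All names here (`StructuralResult`, `validateStructure`, `normalizedLines`, `contentMatches`) are hypothetical. Two assumptions are baked in: a structural check passes only when every expected path is present and the file counts meet the manifest minimums, and "minor whitespace differences" means indentation, repeated spaces or tabs, and blank lines. The plan leaves both points open, so this is one possible reading rather than the specified behavior.

```swift
/// Outcome of a structural check: which expected paths never appeared.
struct StructuralResult {
    let missingNewFiles: [String]
    let missingUpdatedFiles: [String]
    let passed: Bool
}

/// Structural validation: compares paths and counts only, never file contents.
/// Assumption: all expected paths must exist AND the minimum counts must be met.
func validateStructure(
    expectedNew: [String], expectedUpdated: [String],
    minimumNew: Int, minimumUpdated: Int,
    actualNew: Set<String>, actualUpdated: Set<String>
) -> StructuralResult {
    let missingNew = expectedNew.filter { !actualNew.contains($0) }
    let missingUpdated = expectedUpdated.filter { !actualUpdated.contains($0) }
    let passed = missingNew.isEmpty && missingUpdated.isEmpty
        && actualNew.count >= minimumNew
        && actualUpdated.count >= minimumUpdated
    return StructuralResult(
        missingNewFiles: missingNew,
        missingUpdatedFiles: missingUpdated,
        passed: passed
    )
}

/// Splits source into lines, collapses runs of spaces and tabs to one space,
/// strips leading and trailing whitespace, and drops blank lines.
func normalizedLines(_ source: String) -> [String] {
    source
        .split(whereSeparator: \.isNewline)
        .map { line in
            line.split(whereSeparator: { $0 == " " || $0 == "\t" })
                .joined(separator: " ")
        }
        .filter { !$0.isEmpty }
}

/// Content validation: equal after whitespace normalization.
func contentMatches(generated: String, expected: String) -> Bool {
    normalizedLines(generated) == normalizedLines(expected)
}
```

Note that this normalization would not treat `a+b` and `a + b` as equal; checking that "key code structures are present" would need a syntax-aware comparison, which is outside this sketch.
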
### Implementation Components

See the full implementation plan in the main plan file for detailed component specifications, including:
- Updated AnalyzerConfiguration with test mode flags
- TestRunner for orchestrating test execution
- TestCaseDiscoverer for finding and loading test cases
- TestValidator for validating results against expectations
- TestModels for test data structures

### Usage

```bash
# Run all test cases
skit-analyze --test

# Run with verbose output
skit-analyze --test --verbose

# Stop on first failure
skit-analyze --test --test-stop-on-fail

# Run only tests matching "subscript"
skit-analyze --test --test-filter=subscript

# Run tests from custom path
skit-analyze --test --test-cases=custom-tests/
```

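How `--test-filter` and `--test-stop-on-fail` might interact inside a runner is sketched below. `TestRunOptions`, `TestRunSummary`, and `runTests` are hypothetical names, and the sketch assumes the filter is a case-sensitive substring match on the test case directory name; the plan only says "matching". The `execute` closure stands in for whatever actually runs one test case.

```swift
import Foundation

/// Hypothetical options corresponding to --test-filter and --test-stop-on-fail.
struct TestRunOptions {
    var filter: String? = nil
    var stopOnFail = false
}

/// Names of test cases grouped by outcome.
struct TestRunSummary {
    var passed: [String] = []
    var failed: [String] = []
    var skipped: [String] = []   // Selected but never run because of stop-on-fail
}

/// Runs the selected test cases in order and reports the outcome of each.
/// Assumption: the filter is a case-sensitive substring match on the name.
func runTests(
    named names: [String],
    options: TestRunOptions,
    execute: (String) -> Bool
) -> TestRunSummary {
    var summary = TestRunSummary()
    let selected = names.filter { name in
        guard let filter = options.filter else { return true }
        return name.contains(filter)
    }
    for (index, name) in selected.enumerated() {
        if execute(name) {
            summary.passed.append(name)
        } else {
            summary.failed.append(name)
            if options.stopOnFail {
                // Record what was not run so the report stays complete.
                summary.skipped = Array(selected[(index + 1)...])
                break
            }
        }
    }
    return summary
}
```

Tracking skipped cases separately keeps a stop-on-fail run from looking like a run that simply had fewer tests.
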
### Benefits

1. **Prompt Validation**: Ensures the Claude prompt generates correct code
2. **Regression Prevention**: Catches when prompt changes break existing features
3. **Quality Assurance**: Validates generated code quality before deployment
4. **Documentation**: Test cases serve as examples of expected behavior
5. **Confidence**: Developers can iterate on prompts with confidence
6. **Debugging**: Failed tests pinpoint exactly what's wrong with generated code

## Future Enhancements (Not in Scope)

- **Validation Mode**: Run `swift build` on generated code and retry if compilation fails
