feat: add workflow extraction Pydantic models, WAA adapter, and matching pipeline #142

Merged
abrichr merged 1 commit into main from feat/workflow-extraction-models
Mar 19, 2026

@abrichr commented Mar 19, 2026

Summary

  • Add openadapt_evals/workflow/ package implementing Priority 1 of the workflow extraction pipeline (see docs/design/workflow_extraction_pipeline.md)
  • Pydantic models in models.py: RecordingSource, ActionType, NormalizedAction, RecordingSession, TranscriptEntry, EpisodeTranscript, WorkflowStep, Workflow, WorkflowInstance, CanonicalWorkflow, WorkflowLibrary -- all with computed fields, content hashing, and to_demo_text() for DemoController compatibility
  • WAA recording adapter in adapters/waa.py: WAARecordingAdapter.from_meta_json() parses WAA meta.json + step PNGs into normalized RecordingSession objects with action type classification
  • Cosine similarity matching in pipeline/match.py: match_workflow_to_canonical(), create_canonical_from_workflow(), and add_instance_to_canonical(), using a 0.85 similarity threshold
  • 31 tests in tests/test_workflow_models.py using four synthetic data families (A: settings toggles, B: spreadsheet data entry, C: document formatting, D: file archiving) validating models, adapter parsing, content hash determinism, demo text format, and matching behavior
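
For reviewers unfamiliar with content hashing, here is a minimal sketch of what "content hash determinism" means. It is not the code in models.py: it uses stdlib dataclasses instead of Pydantic, and `StepSketch`, `WorkflowSketch`, their fields, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical stand-in for one workflow step (not the PR's WorkflowStep)."""

    action_type: str
    target: str


@dataclass(frozen=True)
class WorkflowSketch:
    """Hypothetical stand-in for a workflow (not the PR's Workflow model)."""

    name: str
    steps: Tuple[StepSketch, ...] = ()

    def content_hash(self) -> str:
        # Sorted keys and fixed separators give the same bytes for the same
        # content, so the digest does not depend on field or dict ordering.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two independently built objects with equal content hash identically.
first = WorkflowSketch("toggle setting", (StepSketch("click", "Settings"),))
second = WorkflowSketch("toggle setting", (StepSketch("click", "Settings"),))
changed = WorkflowSketch("toggle setting", (StepSketch("type", "Settings"),))
```

In this sketch `first.content_hash()` equals `second.content_hash()`, while `changed.content_hash()` differs, which is the property the determinism tests are described as checking.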

Test plan

  • All 31 tests pass (0.29s)
  • Verified real WAA recording parsing (0e763496 font change task)
  • Verified four families form exactly 4 canonical groups with correct instance counts [1, 2, 3, 3]
  • Verified that similar workflows match (cosine similarity > 0.85) and dissimilar workflows stay separate
  • CI test check must pass
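
The grouping behavior checked above can be illustrated with a small stdlib-only sketch. It is not the code in pipeline/match.py: the function names, the greedy first-member-as-representative strategy, and the strict `>` comparison are assumptions, and the PR does not say how workflows are turned into vectors, so the inputs here are plain lists of floats.

```python
import math
from typing import List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.85  # value stated in the PR summary


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_into_canonicals(
    embeddings: Sequence[Sequence[float]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[List[int]]:
    """Greedily assign each vector to the most similar existing group.

    Hypothetical helper: the first member of a group acts as its
    representative, and a new group is opened when no representative
    scores above the threshold.
    """
    representatives: List[Sequence[float]] = []
    groups: List[List[int]] = []
    for index, embedding in enumerate(embeddings):
        best_group: Optional[int] = None
        best_score = threshold
        for group_index, representative in enumerate(representatives):
            score = cosine_similarity(embedding, representative)
            if score > best_score:
                best_group, best_score = group_index, score
        if best_group is None:
            representatives.append(embedding)
            groups.append([index])
        else:
            groups[best_group].append(index)
    return groups


# Toy vectors: two near-duplicates per direction, plus one outlier.
toy = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0], [0.0, 0.98, 0.1], [0.0, 0.0, 1.0]]
toy_groups = group_into_canonicals(toy)
```

With these toy vectors the near-duplicates land in the same group and the outlier forms its own, which mirrors the "similar workflows match, dissimilar workflows stay separate" check.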

🤖 Generated with Claude Code

…ing pipeline

Implement Priority 1 of the workflow extraction pipeline:
- Pydantic models for RecordingSession, Workflow, CanonicalWorkflow, WorkflowLibrary
- WAARecordingAdapter to parse WAA meta.json recordings into normalized sessions
- Cosine similarity matching for grouping workflows into canonical workflows
- 31 tests with synthetic data families (settings toggles, spreadsheet entry,
  document formatting, file archiving) validating models, adapter, and matching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr merged commit fea40b6 into main on Mar 19, 2026
1 check passed