PolicyEngine · anth-volk · Jun 2, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,20 @@
+# Copilot Instructions
+
+Follow the repository's canonical AI-facing engineering guidance in `AGENTS.md`
+and `docs/engineering/skills/`.
+
+For pull requests, read `docs/engineering/skills/github-prs.md` before opening,
+replacing, or sharing a PR. Every PR must include a changelog fragment, and lint
+plus formatting must run before committing.
+
+For tests, read `docs/engineering/skills/testing.md` before adding, moving, or
+reviewing test files.
+
+For UK model formulas, parameters, reforms, enums, and program metadata, read
+`docs/engineering/skills/model-structure.md`.
+
+For dataset source handling, local H5 files, default datasets, and upstream
+materialization behavior, read `docs/engineering/skills/dataset-sources.md`.
+
+For documentation-sensitive changes, read
+`docs/engineering/skills/documentation-review.md`.
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,39 @@
+# Agent Instructions
+
+These instructions apply repository-wide.
+
+## Canonical Guidance
+
+Canonical AI-facing engineering guidance lives under
+`docs/engineering/skills/`. Use those files as the source of truth across
+Codex, Claude, Copilot, and other AI tools.
+
+Tool-specific instruction files such as `CLAUDE.md` and
+`.github/copilot-instructions.md` should stay thin and point here instead of
+duplicating repository rules.
+
+## Required Skill Lookup
+
+Before opening, replacing, or sharing a pull request, read
+`docs/engineering/skills/github-prs.md`.
+
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md`.
+
+When adding or changing variables, parameters, reforms, enums, or program
+metadata, read `docs/engineering/skills/model-structure.md`.
+
+When changing simulation dataset loading, default datasets, H5 inputs, or
+interactions with upstream dataset materializers, read
+`docs/engineering/skills/dataset-sources.md`.
+
+When changing public behavior, documentation, metadata surfaces, or generated
+documentation, read `docs/engineering/skills/documentation-review.md`.
+
+## Non-Negotiable PR Requirements
+
+Every pull request must include a Towncrier changelog fragment under
+`changelog.d/`. See `docs/engineering/skills/github-prs.md` for the naming and
+type rules.
+
+Run lint and formatting before committing. The default command is `make format`.
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,77 +1,39 @@
-# Development Guidelines for PolicyEngine UK
+# Claude Instructions
 
-## Git workflow
-- **Always branch from `main`**: `git checkout main && git pull origin main && git checkout -b your-branch`
-- Default branch is `main` (not `master`)
+These instructions apply repository-wide.
 
-## Build commands
-- Install: `make install` or `pip install -e ".[dev]" --config-settings editable_mode=compat`
-- Format code: `make format` or `ruff format .`
-- Run all tests: `make test`
-- Run single test: `pytest policyengine_uk/tests/path/to/test_file.py::test_function -v`
-- Generate documentation: `make documentation`
-- Update changelog: `make changelog`
+## Canonical Guidance
 
-## Code Standards
-- **Formatting**: Use Ruff formatter with 88-character line length
-- **Imports**: Group imports by stdlib, third-party, local with each group alphabetized
-- **Naming**: Use snake_case for variables/functions, CamelCase for classes
-- **Type Hints**: Use Python type hints where possible
-- **Error Handling**: Handle exceptions gracefully with appropriate error messages
-- **Testing**: Write tests for all new functionality
-- **Documentation**: Document all public functions with docstrings
-- **Versioning**: Follow SemVer (increment patch for fixes, minor for features, major for breaking changes)
-- **Pull Requests**: Must include a changelog.d/ file describing the changes (GitHub Actions will automatically run `make changelog` to update the changelog)
+Repository-wide AI-facing engineering guidance lives in `AGENTS.md`.
+Canonical skills live under `docs/engineering/skills/`.
 
-## Program registry (programs.yaml)
-- `policyengine_uk/programs.yaml` is the single source of truth for program coverage metadata
-- Served via the `/uk/metadata` API and consumed by the model coverage page
-- **When adding a new program**: add an entry with `id`, `name`, `full_name`, `category`, `agency`, `status`, `coverage`, `variable`, `parameter_prefix`
-- **When extending year coverage**: update `verified_years` after verifying parameters and tests cover the new year
-- **When adding devolved programs**: use the appropriate agency (e.g., `Revenue Scotland`, `Social Security Scotland`)
-- **Status values**: `complete`, `partial`, `in_progress`
+Use those files as the source of truth. This file is a Claude adapter and should
+stay thin; do not duplicate detailed testing, CI, formatting, pull request,
+dataset, or model-structure rules here.
 
-## Repository Structure
-- **parameters/**: YAML files that define tax rates, thresholds, and other policy parameters
-  - Organized by government department (e.g., gov/hmrc/, gov/dwp/, gov/dfe/)
-  - Parameter labels should be lowercase and concise
-  - Each parameter file should include relevant legislative references
-- **variables/**: Python files that define the formulas for calculating tax and benefit amounts
-  - Organized to match the same structure as parameters/
-  - Comments should include relevant regulatory citations and calculation logic
-- **tests/**: Test cases for validating correct implementation of policies
-- For government departments, use the correct department for the policy (e.g., education policies under DfE, not DWP)
+## Required Skill Lookup
 
-## Lessons Learned from Variable Refactoring (2025-05-31)
+Before opening, replacing, or sharing a PR, read
+`docs/engineering/skills/github-prs.md`.
 
-### Issue: Refactoring Script Bug
-A refactoring script was used to split multi-variable Python files into single-variable files. The script had a systematic bug: when a file contained multiple Variable classes and one had the same name as the file, that variable was dropped entirely.
+When adding, moving, or reviewing tests, read
+`docs/engineering/skills/testing.md`.
 
-### Variables Dropped During Refactoring
-21 variables were dropped, including:
-- Wrapper variables: `jsa_income`, `esa_income`, `jsa_contrib`, `esa_contrib`, `afcs`, `bsp`, `iidb`
-- Calculated variables: `benefit_cap`, `carers_allowance`, `income_support`, `maternity_allowance`, `sda`, `tax_credits`
-- Core variables: `child_benefit`, `allowances`, `marriage_allowance`, `stamp_duty_land_tax`, `vat`, `land_transaction_tax`, `attendance_allowance`, `council_tax_benefit`, `business_rates`, `tax`, `total_wealth`, `private_school_vat`, `bi_phaseout`
+When modifying variables, parameters, reforms, enums, program metadata, or UK
+model structure, read `docs/engineering/skills/model-structure.md`.
 
-### Enum Recovery Errors
-During recovery, enums were recreated based on assumptions rather than checking original code, leading to incorrect values:
-- **TenureType**: Missing `OWNED_OUTRIGHT` and `OWNED_WITH_MORTGAGE` (consolidated into `OWNER_OCCUPIED`)
-- **AccommodationType**: Wrong labels (e.g., "House - detached" vs "Detached house")
-- **EducationType**: Wrong capitalization ("Lower secondary" vs "Lower Secondary")
-- **FamilyType**: Shortened labels (missing ", with children" suffix)
-- **EmploymentStatus**: Complete restructure (missing FT_/PT_ prefixes)
-- **MinimumWageCategory**: Wrong format ("18-20" vs "18 to 20", "Over 24" vs "25 or over")
-- **CouncilTaxBand**: Added incorrect "Band " prefix
-- **StatePensionType**: Wrong capitalization ("Basic" vs "basic")
+When modifying simulation dataset loading, default datasets, H5 inputs, or
+upstream materialized dataset paths, read
+`docs/engineering/skills/dataset-sources.md`.
 
-### Best Practices for Refactoring Recovery
-1. **Always check original code** - Never rely on assumptions or context clues
-2. **Use git history systematically** - `git show <commit>:path/to/file` to see original content
-3. **Verify enum values exactly** - Even small differences in labels can break functionality
-4. **Test incrementally** - Run tests after each fix to ensure progress
-5. **Document the recovery process** - Track what was fixed for future reference
+When changing public behavior, metadata, docs, or generated docs, read
+`docs/engineering/skills/documentation-review.md`.
 
-### OpenFisca-Specific Patterns
-- Use `.possible_values` to access enum values, not direct imports
-- Import helper functions like `find_freeze_start` when needed
-- Functions like `ceil` should be `np.ceil` in OpenFisca context
+## Safety Boundaries
+
+Do not fabricate policy sources, validation metrics, or performance claims. If a
+result has not been computed from code or cited from a source, mark it as
+pending.
+
+Do not upload, promote, publish, or overwrite data artifacts unless the user
+explicitly asks for that operation.
diff --git a/changelog.d/1748.added.md b/changelog.d/1748.added.md
@@ -0,0 +1 @@
+- Added a model-agnostic AI engineering harness that routes tool-specific agent guidance into shared docs.
diff --git a/docs/engineering/skills/README.md b/docs/engineering/skills/README.md
@@ -0,0 +1,24 @@
+# Engineering Skills
+
+This directory is the canonical source for AI-facing engineering rules.
+
+Tool-specific instruction files such as `AGENTS.md`, `CLAUDE.md`, and
+`.github/copilot-instructions.md` should point here instead of duplicating
+implementation-specific guidance. When a rule changes, update the skill here
+first, then keep adapters thin.
+
+Current skills:
+
+- `dataset-sources.md`: default datasets, HuggingFace dataset URLs, local H5
+  file paths, and upstream dataset materialization behavior.
+- `documentation-review.md`: model-neutral review checks for public behavior,
+  docs, metadata, and generated documentation.
+- `github-prs.md`: issue-first PR workflow, required Towncrier changelog
+  fragments, draft PR expectations, and pre-commit lint/format rules.
+- `model-structure.md`: UK variables, parameters, reforms, enums, program
+  registry metadata, and refactoring recovery lessons.
+- `testing.md`: UK test layout, focused test selection, slow microsimulation
+  boundaries, and expected commands.
+
+Use the narrowest skill that matches the task. When a task spans multiple
+areas, read each relevant skill before editing.
diff --git a/docs/engineering/skills/dataset-sources.md b/docs/engineering/skills/dataset-sources.md
@@ -0,0 +1,57 @@
+# Dataset Sources
+
+Use this skill when changing simulation dataset loading, default datasets, H5
+inputs, or interactions with upstream dataset materializers.
+
+## Supported Source Shapes
+
+`policyengine_uk.Simulation` and `Microsimulation` support several data inputs:
+
+- explicit situations for single-household calculator-style simulations;
+- pandas DataFrames;
+- `policyengine_core.data.Dataset` instances;
+- UK single-year and multi-year dataset schema objects;
+- HuggingFace dataset URLs using `hf://...`;
+- local H5 file paths that have already been materialized by an upstream
+  resolver.
+
+Do not treat every string as a remote URL. Local file paths are valid dataset
+sources when an upstream package has downloaded or materialized a remote dataset
+to disk.
+
+## Remote Datasets
+
+UK directly supports `hf://` dataset URLs through the existing HuggingFace
+download path.
+
+Other remote schemes, including `gs://`, should be materialized before being
+passed into UK unless UK has explicit scheme support. The `.py` bundle/runtime
+can resolve a manifest entry, materialize a remote dataset source, and then pass
+the resulting local H5 path into `policyengine_uk`.
+
+When debugging dataset failures, distinguish these stages:
+
+1. manifest or caller chooses the dataset reference;
+2. upstream resolver materializes remote references such as GCS to a local path;
+3. UK loads the local H5 path and validates whether it is a single-year,
+   multi-year, or core dataset file.
+
+Errors at those stages require different fixes.
+
+## Defaults
+
+`policyengine_uk` does not silently choose a default dataset when no situation is
+provided. Callers must pass `dataset=...` or opt into a default through the
+configured environment variable.
+
+Keep default dataset behavior explicit so calculator-style simulations,
+microsimulations, and API workers do not accidentally diverge.
+
+## Tests
+
+For dataset source routing, prefer stubbed tests that prove dispatch behavior
+without downloading or opening private survey files.
+
+Use real H5 files only when the test specifically needs schema or data behavior.
+If the artifact is private, large, or credential-dependent, skip cleanly or keep
+the verification outside the ordinary PR test suite.
diff --git a/docs/engineering/skills/documentation-review.md b/docs/engineering/skills/documentation-review.md
@@ -0,0 +1,55 @@
+# Documentation Review
+
+Use this skill when reviewing changes to public behavior, metadata surfaces,
+documentation, generated docs, or repository guidance.
+
+## Review Goal
+
+Documentation review is a harness check, not copyediting. Confirm that durable
+documentation still describes the code paths a maintainer or AI agent would use
+to understand, validate, or modify the model.
+
+Keep durable facts in source documentation. Put PR-specific confidence, impact,
+or reviewer notes in the PR description or review comments instead.
+
+## Trigger Areas
+
+Run documentation review when a PR changes:
+
+- public simulation behavior;
+- variable or parameter behavior that changes user-visible policy results;
+- `policyengine_uk/programs.yaml`;
+- dataset source handling or default dataset behavior;
+- generated or published docs under `docs/`;
+- API-facing metadata consumed by downstream applications;
+- AI-facing engineering guidance under `AGENTS.md`, `CLAUDE.md`,
+  `.github/copilot-instructions.md`, or `docs/engineering/skills/`.
+
+## Checks
+
+- Changed public behavior has a durable documentation surface when needed.
+- Program metadata remains consistent with variables, parameters, and tested
+  year coverage.
+- Dataset behavior docs distinguish manifest selection, remote materialization,
+  and UK H5 loading.
+- Generated docs are refreshed when source changes require it.
+- Changelog fragments describe the change clearly and use the correct
+  Towncrier type.
+- PR descriptions list commands run and any skipped tests or known gaps.
+
+## Commands
+
+Run formatting before committing:
+
+```bash
+make format
+```
+
+Run documentation build checks when docs behavior changes materially:
+
+```bash
+make documentation
+```
+
+If a documentation command is not run because the change is guidance-only or the
+local environment cannot build the docs, state that explicitly.
diff --git a/docs/engineering/skills/github-prs.md b/docs/engineering/skills/github-prs.md
@@ -0,0 +1,81 @@
+# GitHub Pull Requests
+
+Use this skill before opening, replacing, updating, or sharing a pull request.
+
+## Branch And Issue Flow
+
+- Branch from an up-to-date `main`.
+- Open or identify a GitHub issue for the work before opening a PR.
+- Put `Fixes #ISSUE_NUMBER` as the first line of the PR description.
+- Open PRs as drafts unless a maintainer explicitly asks for a ready-for-review
+  PR.
+- Do not add `[codex]`, `[claude]`, `[copilot]`, or other agent labels to PR
+  titles. Use a plain descriptive title.
+
+## Required Changelog Fragment
+
+Every PR must include a Towncrier changelog fragment under `changelog.d/`.
+
+Preferred naming:
+
+```text
+changelog.d/<issue-number>.<type>.md
+```
+
+Use one of the configured Towncrier types from `pyproject.toml`:
+
+- `breaking`
+- `added`
+- `changed`
+- `fixed`
+- `removed`
+
+Examples:
+
+```text
+changelog.d/1748.added.md
+changelog.d/1748.fixed.md
+```
+
+Write the fragment as a short bullet describing the user-visible or maintainer
+visible change:
+
+```markdown
+- Added model-agnostic AI engineering guidance for UK model work.
+```
+
+The versioning workflow runs on pushes to `main` that touch `changelog.d/**` or
+`pyproject.toml`. It runs `.github/bump_version.py`, then `towncrier build`.
+Do not manually edit `CHANGELOG.md` or bump `pyproject.toml` in an ordinary PR
+unless the user or maintainer explicitly asks. The type controls the inferred
+version bump: `breaking` is major, `added` and `removed` are minor, and other
+types are patch.
+
+## Required Lint And Format
+
+Run lint and formatting before committing. The default command is:
+
+```bash
+make format
+```
+
+`make format` runs `ruff format .` and `ruff check .`.
+
+If the full command is blocked by local environment constraints, run the exact
+equivalent on the touched files and state that exception in the PR or handoff:
+
+```bash
+ruff format <changed-files>
+ruff check <changed-files>
+```
+
+Do not knowingly commit unformatted code or known lint failures.
+
+## Test Expectations
+
+Run focused tests that cover the changed behavior. Use `make test` for broad
+model changes, shared behavior, or changes where a narrower command would miss
+important regressions. See `testing.md` for details.
+
+When tests are skipped because they are slow, data-dependent, or require
+credentials, say so explicitly in the PR description or handoff.
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1 @@
		- Added a model-agnostic AI engineering harness that routes tool-specific agent guidance into shared docs.