diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 000000000..c93d81860 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,20 @@ +# Copilot Instructions + +Follow the repository's canonical AI-facing engineering guidance in `AGENTS.md` +and `docs/engineering/skills/`. + +For pull requests, read `docs/engineering/skills/github-prs.md` before opening, +replacing, or sharing a PR. Every PR must include a changelog fragment, and lint +plus formatting must run before committing. + +For tests, read `docs/engineering/skills/testing.md` before adding, moving, or +reviewing test files. + +For UK model formulas, parameters, reforms, enums, and program metadata, read +`docs/engineering/skills/model-structure.md`. + +For dataset source handling, local H5 files, default datasets, and upstream +materialization behavior, read `docs/engineering/skills/dataset-sources.md`. + +For documentation-sensitive changes, read +`docs/engineering/skills/documentation-review.md`. diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..ec6fb2726 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,39 @@ +# Agent Instructions + +These instructions apply repository-wide. + +## Canonical Guidance + +Canonical AI-facing engineering guidance lives under +`docs/engineering/skills/`. Use those files as the source of truth across +Codex, Claude, Copilot, and other AI tools. + +Tool-specific instruction files such as `CLAUDE.md` and +`.github/copilot-instructions.md` should stay thin and point here instead of +duplicating repository rules. + +## Required Skill Lookup + +Before opening, replacing, or sharing a pull request, read +`docs/engineering/skills/github-prs.md`. + +When adding, moving, or reviewing tests, read +`docs/engineering/skills/testing.md`. + +When adding or changing variables, parameters, reforms, enums, or program +metadata, read `docs/engineering/skills/model-structure.md`. + +When changing simulation dataset loading, default datasets, H5 inputs, or +interactions with upstream dataset materializers, read +`docs/engineering/skills/dataset-sources.md`. + +When changing public behavior, documentation, metadata surfaces, or generated +documentation, read `docs/engineering/skills/documentation-review.md`. + +## Non-Negotiable PR Requirements + +Every pull request must include a Towncrier changelog fragment under +`changelog.d/`. See `docs/engineering/skills/github-prs.md` for the naming and +type rules. + +Run lint and formatting before committing. The default command is `make format`. diff --git a/CLAUDE.md b/CLAUDE.md index c7bf8cea7..d8438f402 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,77 +1,39 @@ -# Development Guidelines for PolicyEngine UK +# Claude Instructions -## Git workflow -- **Always branch from `main`**: `git checkout main && git pull origin main && git checkout -b your-branch` -- Default branch is `main` (not `master`) +These instructions apply repository-wide. -## Build commands -- Install: `make install` or `pip install -e ".[dev]" --config-settings editable_mode=compat` -- Format code: `make format` or `ruff format .` -- Run all tests: `make test` -- Run single test: `pytest policyengine_uk/tests/path/to/test_file.py::test_function -v` -- Generate documentation: `make documentation` -- Update changelog: `make changelog` +## Canonical Guidance -## Code Standards -- **Formatting**: Use Ruff formatter with 88-character line length -- **Imports**: Group imports by stdlib, third-party, local with each group alphabetized -- **Naming**: Use snake_case for variables/functions, CamelCase for classes -- **Type Hints**: Use Python type hints where possible -- **Error Handling**: Handle exceptions gracefully with appropriate error messages -- **Testing**: Write tests for all new functionality -- **Documentation**: Document all public functions with docstrings -- **Versioning**: Follow SemVer (increment patch for fixes, minor for features, major for breaking changes) -- **Pull Requests**: Must include a changelog.d/ file describing the changes (GitHub Actions will automatically run `make changelog` to update the changelog) +Repository-wide AI-facing engineering guidance lives in `AGENTS.md`. +Canonical skills live under `docs/engineering/skills/`. -## Program registry (programs.yaml) -- `policyengine_uk/programs.yaml` is the single source of truth for program coverage metadata -- Served via the `/uk/metadata` API and consumed by the model coverage page -- **When adding a new program**: add an entry with `id`, `name`, `full_name`, `category`, `agency`, `status`, `coverage`, `variable`, `parameter_prefix` -- **When extending year coverage**: update `verified_years` after verifying parameters and tests cover the new year -- **When adding devolved programs**: use the appropriate agency (e.g., `Revenue Scotland`, `Social Security Scotland`) -- **Status values**: `complete`, `partial`, `in_progress` +Use those files as the source of truth. This file is a Claude adapter and should +stay thin; do not duplicate detailed testing, CI, formatting, pull request, +dataset, or model-structure rules here. -## Repository Structure -- **parameters/**: YAML files that define tax rates, thresholds, and other policy parameters - - Organized by government department (e.g., gov/hmrc/, gov/dwp/, gov/dfe/) - - Parameter labels should be lowercase and concise - - Each parameter file should include relevant legislative references -- **variables/**: Python files that define the formulas for calculating tax and benefit amounts - - Organized to match the same structure as parameters/ - - Comments should include relevant regulatory citations and calculation logic -- **tests/**: Test cases for validating correct implementation of policies -- For government departments, use the correct department for the policy (e.g., education policies under DfE, not DWP) +## Required Skill Lookup -## Lessons Learned from Variable Refactoring (2025-05-31) +Before opening, replacing, or sharing a PR, read +`docs/engineering/skills/github-prs.md`. -### Issue: Refactoring Script Bug -A refactoring script was used to split multi-variable Python files into single-variable files. The script had a systematic bug: when a file contained multiple Variable classes and one had the same name as the file, that variable was dropped entirely. +When adding, moving, or reviewing tests, read +`docs/engineering/skills/testing.md`. -### Variables Dropped During Refactoring -21 variables were dropped, including: -- Wrapper variables: `jsa_income`, `esa_income`, `jsa_contrib`, `esa_contrib`, `afcs`, `bsp`, `iidb` -- Calculated variables: `benefit_cap`, `carers_allowance`, `income_support`, `maternity_allowance`, `sda`, `tax_credits` -- Core variables: `child_benefit`, `allowances`, `marriage_allowance`, `stamp_duty_land_tax`, `vat`, `land_transaction_tax`, `attendance_allowance`, `council_tax_benefit`, `business_rates`, `tax`, `total_wealth`, `private_school_vat`, `bi_phaseout` +When modifying variables, parameters, reforms, enums, program metadata, or UK +model structure, read `docs/engineering/skills/model-structure.md`. -### Enum Recovery Errors -During recovery, enums were recreated based on assumptions rather than checking original code, leading to incorrect values: -- **TenureType**: Missing `OWNED_OUTRIGHT` and `OWNED_WITH_MORTGAGE` (consolidated into `OWNER_OCCUPIED`) -- **AccommodationType**: Wrong labels (e.g., "House - detached" vs "Detached house") -- **EducationType**: Wrong capitalization ("Lower secondary" vs "Lower Secondary") -- **FamilyType**: Shortened labels (missing ", with children" suffix) -- **EmploymentStatus**: Complete restructure (missing FT_/PT_ prefixes) -- **MinimumWageCategory**: Wrong format ("18-20" vs "18 to 20", "Over 24" vs "25 or over") -- **CouncilTaxBand**: Added incorrect "Band " prefix -- **StatePensionType**: Wrong capitalization ("Basic" vs "basic") +When modifying simulation dataset loading, default datasets, H5 inputs, or +upstream materialized dataset paths, read +`docs/engineering/skills/dataset-sources.md`. -### Best Practices for Refactoring Recovery -1. **Always check original code** - Never rely on assumptions or context clues -2. **Use git history systematically** - `git show :path/to/file` to see original content -3. **Verify enum values exactly** - Even small differences in labels can break functionality -4. **Test incrementally** - Run tests after each fix to ensure progress -5. **Document the recovery process** - Track what was fixed for future reference +When changing public behavior, metadata, docs, or generated docs, read +`docs/engineering/skills/documentation-review.md`. -### OpenFisca-Specific Patterns -- Use `.possible_values` to access enum values, not direct imports -- Import helper functions like `find_freeze_start` when needed -- Functions like `ceil` should be `np.ceil` in OpenFisca context \ No newline at end of file +## Safety Boundaries + +Do not fabricate policy sources, validation metrics, or performance claims. If a +result has not been computed from code or cited from a source, mark it as +pending. + +Do not upload, promote, publish, or overwrite data artifacts unless the user +explicitly asks for that operation. diff --git a/changelog.d/1748.added.md b/changelog.d/1748.added.md new file mode 100644 index 000000000..a862690eb --- /dev/null +++ b/changelog.d/1748.added.md @@ -0,0 +1 @@ +- Added a model-agnostic AI engineering harness that routes tool-specific agent guidance into shared docs. diff --git a/docs/engineering/skills/README.md b/docs/engineering/skills/README.md new file mode 100644 index 000000000..4e24b6eea --- /dev/null +++ b/docs/engineering/skills/README.md @@ -0,0 +1,24 @@ +# Engineering Skills + +This directory is the canonical source for AI-facing engineering rules. + +Tool-specific instruction files such as `AGENTS.md`, `CLAUDE.md`, and +`.github/copilot-instructions.md` should point here instead of duplicating +implementation-specific guidance. When a rule changes, update the skill here +first, then keep adapters thin. + +Current skills: + +- `dataset-sources.md`: default datasets, HuggingFace dataset URLs, local H5 + file paths, and upstream dataset materialization behavior. +- `documentation-review.md`: model-neutral review checks for public behavior, + docs, metadata, and generated documentation. +- `github-prs.md`: issue-first PR workflow, required Towncrier changelog + fragments, draft PR expectations, and pre-commit lint/format rules. +- `model-structure.md`: UK variables, parameters, reforms, enums, program + registry metadata, and refactoring recovery lessons. +- `testing.md`: UK test layout, focused test selection, slow microsimulation + boundaries, and expected commands. + +Use the narrowest skill that matches the task. When a task spans multiple +areas, read each relevant skill before editing. diff --git a/docs/engineering/skills/dataset-sources.md b/docs/engineering/skills/dataset-sources.md new file mode 100644 index 000000000..df36813c5 --- /dev/null +++ b/docs/engineering/skills/dataset-sources.md @@ -0,0 +1,57 @@ +# Dataset Sources + +Use this skill when changing simulation dataset loading, default datasets, H5 +inputs, or interactions with upstream dataset materializers. + +## Supported Source Shapes + +`policyengine_uk.Simulation` and `Microsimulation` support several data inputs: + +- explicit situations for single-household calculator-style simulations; +- pandas DataFrames; +- `policyengine_core.data.Dataset` instances; +- UK single-year and multi-year dataset schema objects; +- HuggingFace dataset URLs using `hf://...`; +- local H5 file paths that have already been materialized by an upstream + resolver. + +Do not treat every string as a remote URL. Local file paths are valid dataset +sources when an upstream package has downloaded or materialized a remote dataset +to disk. + +## Remote Datasets + +UK directly supports `hf://` dataset URLs through the existing HuggingFace +download path. + +Other remote schemes, including `gs://`, should be materialized before being +passed into UK unless UK has explicit scheme support. The `.py` bundle/runtime +can resolve a manifest entry, materialize a remote dataset source, and then pass +the resulting local H5 path into `policyengine_uk`. + +When debugging dataset failures, distinguish these stages: + +1. manifest or caller chooses the dataset reference; +2. upstream resolver materializes remote references such as GCS to a local path; +3. UK loads the local H5 path and validates whether it is a single-year, + multi-year, or core dataset file. + +Errors at those stages require different fixes. + +## Defaults + +`policyengine_uk` does not silently choose a default dataset when no situation is +provided. Callers must pass `dataset=...` or opt into a default through the +configured environment variable. + +Keep default dataset behavior explicit so calculator-style simulations, +microsimulations, and API workers do not accidentally diverge. + +## Tests + +For dataset source routing, prefer stubbed tests that prove dispatch behavior +without downloading or opening private survey files. + +Use real H5 files only when the test specifically needs schema or data behavior. +If the artifact is private, large, or credential-dependent, skip cleanly or keep +the verification outside the ordinary PR test suite. diff --git a/docs/engineering/skills/documentation-review.md b/docs/engineering/skills/documentation-review.md new file mode 100644 index 000000000..77daf1a13 --- /dev/null +++ b/docs/engineering/skills/documentation-review.md @@ -0,0 +1,55 @@ +# Documentation Review + +Use this skill when reviewing changes to public behavior, metadata surfaces, +documentation, generated docs, or repository guidance. + +## Review Goal + +Documentation review is a harness check, not copyediting. Confirm that durable +documentation still describes the code paths a maintainer or AI agent would use +to understand, validate, or modify the model. + +Keep durable facts in source documentation. Put PR-specific confidence, impact, +or reviewer notes in the PR description or review comments instead. + +## Trigger Areas + +Run documentation review when a PR changes: + +- public simulation behavior; +- variable or parameter behavior that changes user-visible policy results; +- `policyengine_uk/programs.yaml`; +- dataset source handling or default dataset behavior; +- generated or published docs under `docs/`; +- API-facing metadata consumed by downstream applications; +- AI-facing engineering guidance under `AGENTS.md`, `CLAUDE.md`, + `.github/copilot-instructions.md`, or `docs/engineering/skills/`. + +## Checks + +- Changed public behavior has a durable documentation surface when needed. +- Program metadata remains consistent with variables, parameters, and tested + year coverage. +- Dataset behavior docs distinguish manifest selection, remote materialization, + and UK H5 loading. +- Generated docs are refreshed when source changes require it. +- Changelog fragments describe the change clearly and use the correct + Towncrier type. +- PR descriptions list commands run and any skipped tests or known gaps. + +## Commands + +Run formatting before committing: + +```bash +make format +``` + +Run documentation build checks when docs behavior changes materially: + +```bash +make documentation +``` + +If a documentation command is not run because the change is guidance-only or the +local environment cannot build the docs, state that explicitly. diff --git a/docs/engineering/skills/github-prs.md b/docs/engineering/skills/github-prs.md new file mode 100644 index 000000000..2a4a0ce90 --- /dev/null +++ b/docs/engineering/skills/github-prs.md @@ -0,0 +1,81 @@ +# GitHub Pull Requests + +Use this skill before opening, replacing, updating, or sharing a pull request. + +## Branch And Issue Flow + +- Branch from an up-to-date `main`. +- Open or identify a GitHub issue for the work before opening a PR. +- Put `Fixes #ISSUE_NUMBER` as the first line of the PR description. +- Open PRs as drafts unless a maintainer explicitly asks for a ready-for-review + PR. +- Do not add `[codex]`, `[claude]`, `[copilot]`, or other agent labels to PR + titles. Use a plain descriptive title. + +## Required Changelog Fragment + +Every PR must include a Towncrier changelog fragment under `changelog.d/`. + +Preferred naming: + +```text +changelog.d/..md +``` + +Use one of the configured Towncrier types from `pyproject.toml`: + +- `breaking` +- `added` +- `changed` +- `fixed` +- `removed` + +Examples: + +```text +changelog.d/1748.added.md +changelog.d/1748.fixed.md +``` + +Write the fragment as a short bullet describing the user-visible or maintainer +visible change: + +```markdown +- Added model-agnostic AI engineering guidance for UK model work. +``` + +The versioning workflow runs on pushes to `main` that touch `changelog.d/**` or +`pyproject.toml`. It runs `.github/bump_version.py`, then `towncrier build`. +Do not manually edit `CHANGELOG.md` or bump `pyproject.toml` in an ordinary PR +unless the user or maintainer explicitly asks. The type controls the inferred +version bump: `breaking` is major, `added` and `removed` are minor, and other +types are patch. + +## Required Lint And Format + +Run lint and formatting before committing. The default command is: + +```bash +make format +``` + +`make format` runs `ruff format .` and `ruff check .`. + +If the full command is blocked by local environment constraints, run the exact +equivalent on the touched files and state that exception in the PR or handoff: + +```bash +ruff format +ruff check +``` + +Do not knowingly commit unformatted code or known lint failures. + +## Test Expectations + +Run focused tests that cover the changed behavior. Use `make test` for broad +model changes, shared behavior, or changes where a narrower command would miss +important regressions. See `testing.md` for details. + +When tests are skipped because they are slow, data-dependent, or require +credentials, say so explicitly in the PR description or handoff. diff --git a/docs/engineering/skills/model-structure.md b/docs/engineering/skills/model-structure.md new file mode 100644 index 000000000..0ec4753ca --- /dev/null +++ b/docs/engineering/skills/model-structure.md @@ -0,0 +1,73 @@ +# UK Model Structure + +Use this skill when adding or changing variables, parameters, reforms, enums, +program metadata, or model structure. + +## Core Layout + +- Parameters live under `policyengine_uk/parameters/`, usually organized by + government department such as `gov/hmrc/`, `gov/dwp/`, or `gov/dfe/`. +- Variables live under `policyengine_uk/variables/`, following the same policy + structure as parameters where practical. +- Policy YAML tests live under `policyengine_uk/tests/policy/`. +- Program coverage metadata lives in `policyengine_uk/programs.yaml`. + +For government departments, use the correct department for the policy. For +devolved programs, use the appropriate devolved agency path, such as +`gov/social_security_scotland/`, `gov/revenue_scotland/`, `gov/senedd/`, or +`gov/welsh_revenue_authority/`. + +## Variables And Parameters + +- Match the variable file name to the class name, for example `pip_dl.py` + defines `class pip_dl(Variable)`. +- Use vectorized formula helpers such as `where(...)`, `max_(...)`, and + `min_(...)`; do not use Python scalar `if`, `max`, or `min` inside formulas. +- Use `np.ceil` and other NumPy functions where OpenFisca formulas need array + behavior. +- Cite durable policy sources in `reference` fields when available. +- Parameter labels should be concise, lowercase, and stable. +- Add or update focused tests alongside variables and parameters. + +## Program Registry + +`policyengine_uk/programs.yaml` is the single source of truth for UK program +coverage metadata. It is served through the `/uk/metadata` API and consumed by +model coverage surfaces. + +When adding a program, include: + +- `id` +- `name` +- `full_name` +- `category` +- `agency` +- `status` +- `coverage` +- `variable` +- `parameter_prefix` + +When extending year coverage, update `verified_years` only after verifying that +parameters and tests cover the new year. Status values include `complete`, +`partial`, and `in_progress`. + +## Refactoring Recovery Lessons + +During the 2025-05-31 variable refactor recovery, a script that split +multi-variable Python files into single-variable files dropped classes when one +class had the same name as the original file. Recovery was made harder when enum +values were recreated from assumptions instead of source history. + +When recovering or refactoring model code: + +1. Check original code with `git show :path/to/file`. +2. Verify enum values exactly, including spelling, capitalization, and labels. +3. Use `.possible_values` to access enum values rather than importing enum + classes directly when formula context expects possible values. +4. Test incrementally after each fix. +5. Document recovery steps when they affect durable model behavior. + +Known enum recovery pitfalls included `TenureType`, `AccommodationType`, +`EducationType`, `FamilyType`, `EmploymentStatus`, `MinimumWageCategory`, +`CouncilTaxBand`, and `StatePensionType`. Treat enum labels as compatibility +surface, not cosmetic text. diff --git a/docs/engineering/skills/testing.md b/docs/engineering/skills/testing.md new file mode 100644 index 000000000..d2dc56b41 --- /dev/null +++ b/docs/engineering/skills/testing.md @@ -0,0 +1,60 @@ +# Testing + +Use this skill whenever adding, moving, or reviewing tests. + +## Test Layout + +- Policy YAML tests live under `policyengine_uk/tests/policy/` and are run with + `policyengine-core test`. +- Python tests live under `policyengine_uk/tests/`. +- Full microsimulation tests live under `policyengine_uk/tests/microsimulation/` + and should use the `microsimulation` marker when they exercise expensive + survey-wide behavior. + +## Choosing The Right Test + +- For a variable formula or parameter rule, prefer a small YAML policy test + first. +- For Python helper behavior, routing logic, error handling, or regression tests + that do not need the full dataset, use focused Python tests. +- For simulation behavior that depends on survey-wide data, add the smallest + microsimulation test that proves the behavior and mark it appropriately. +- Avoid adding slow dataset-dependent tests when a stubbed or fixture-scale test + proves the same contract. +- Do not use real network credentials, HuggingFace downloads, GCS downloads, or + private H5 files in unit-style tests. Mock those seams or skip cleanly when the + external artifact is unavailable. + +## Commands + +Run lint and formatting before committing: + +```bash +make format +``` + +Run all tests when the change has broad model risk: + +```bash +make test +``` + +Run a focused Python test: + +```bash +uv run pytest policyengine_uk/tests/path/to/test_file.py::test_name -v +``` + +Run a focused policy YAML test: + +```bash +uv run policyengine-core test policyengine_uk/tests/policy/path/to/test.yaml -c policyengine_uk +``` + +Run microsimulation tests explicitly: + +```bash +make test-microsimulation +``` + +When a command is not run, record why.