diff --git a/.github/agents/GPT5.0-Beastmode1.0.agent.md b/.github/agents/Beastmode.agent.md
similarity index 93%
rename from .github/agents/GPT5.0-Beastmode1.0.agent.md
rename to .github/agents/Beastmode.agent.md
index e01da3f..979c998 100644
--- a/.github/agents/GPT5.0-Beastmode1.0.agent.md
+++ b/.github/agents/Beastmode.agent.md
@@ -1,9 +1,8 @@
---
-description: Beast Mode for GPT-5
-model: GPT-5
+description: Beast Mode (Generic) to encourage the agent to keep going until the user's query is completely resolved.
---
-# GPT-5 Beast Mode
+# Beast Mode
You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user.
@@ -111,7 +110,7 @@ Whenever you need to use, recommend, or implement a third-party API, dependency,
## Memory
-You have a memory that stores information about the user and their preferences. This memory is used to provide a more personalized experience. You can access and update this memory as needed. The memory is stored in a file called `.github/instructions/memory.instructions.md`. If the file is empty, you'll need to create it.
+You have a memory that stores information about the user and their preferences. This memory is used to provide a more personalised experience. You can access and update this memory as needed. The memory is stored in a file called `.github/instructions/memory.instructions.md`. If the file is empty, you'll need to create it.
When creating a new memory file, you MUST include the following front matter at the top of the file:
@@ -123,19 +122,19 @@ applyTo: '**'
If the user asks you to remember something or add something to your memory, you can do so by updating the memory file.
If you think that you need to remember a fact for later, add that to the memory file as well.
-Be judicious about what you choose to add to your memory knowing that this takes time and also reduces the size of the context window.
+Be judicious about what you choose to add to the memory file knowing that this takes time and also reduces the size of the context window.
## Writing Prompts
-If you are asked to write a prompt, you should always generate the prompt in markdown format.
+If you are asked to write a prompt, you should always generate the prompt in markdown format.
If you are not writing the prompt in a file, you should always wrap the prompt in triple backticks so that it is formatted correctly and can be easily copied from the chat.
## Git
-If the user tells you to stage and commit, you may do so.
-You are NEVER allowed to stage and commit files, or to attempt to push to a remote.
+If the user explicitly tells you to stage and commit, you may do so.
+You are NEVER allowed to push to a remote.
-## Summarize
+## Summarise
-If the user tells you to summarize, they want you to summarize the chat history and place it in the memory file. You want to be as concise as possible here. You may use a format that only you can understand if it helps reduce the size that the memory file takes up.
+If the user tells you to summarise, they want you to summarise the chat history and place it in the memory file. You want to be as concise as possible here. You may use a format that only you can understand if it helps reduce the size that the memory file takes up.
diff --git a/.github/agents/GPT4.1-Beastmode3.1.agent.md b/.github/agents/GPT4.1-Beastmode3.1.agent.md
deleted file mode 100644
index cff7ba0..0000000
--- a/.github/agents/GPT4.1-Beastmode3.1.agent.md
+++ /dev/null
@@ -1,132 +0,0 @@
----
-description: Beast Mode for GPT-4.1
-model: GPT-4.1
----
-
-# 4.1 Beast Mode (VS Code v1.102)
-
-You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user.
-
-Your thinking should be thorough and so it's fine if it's very long. However, avoid unnecessary repetition and verbosity. You should be concise, but thorough.
-
-You MUST iterate and keep going until the problem is solved.
-
-You have everything you need to resolve this problem. I want you to fully solve this autonomously before coming back to me.
-
-Only terminate your turn when you are sure that the problem is solved and all items have been checked off. Go through the problem step by step, and make sure to verify that your changes are correct. NEVER end your turn without having truly and completely solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
-
-THE PROBLEM CAN NOT BE SOLVED WITHOUT EXTENSIVE INTERNET RESEARCH.
-
-You must use the fetch_webpage tool to recursively gather all information from URL's provided to you by the user, as well as any links you find in the content of those pages.
-
-Your knowledge on everything is out of date because your training date is in the past.
-
-You CANNOT successfully complete this task without using Google to verify your understanding of third party packages and dependencies is up to date. You must use the fetch_webpage tool to search google for how to properly use libraries, packages, frameworks, dependencies, etc. every single time you install or implement one. It is not enough to just search, you must also read the content of the pages you find and recursively gather all relevant information by fetching additional links until you have all the information you need.
-
-Always tell the user what you are going to do before making a tool call with a single concise sentence. This will help them understand what you are doing and why.
-
-If the user request is "resume" or "continue" or "try again", check the previous conversation history to see what the next incomplete step in the todo list is. Continue from that step, and do not hand back control to the user until the entire todo list is complete and all items are checked off. Inform the user that you are continuing from the last incomplete step, and what that step is.
-
-Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Use the sequential thinking tool if available. Your solution must be perfect. If not, continue working on it. At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
-
-You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
-
-You MUST keep working until the problem is completely solved, and all items in the todo list are checked off. Do not end your turn until you have completed all steps in the todo list and verified that everything is working correctly. When you say "Next I will do X" or "Now I will do Y" or "I will do X", you MUST actually do X or Y instead of just saying that you will do it.
-
-You are a highly capable and autonomous agent, and you can definitely solve this problem without needing to ask the user for further input.
-
-## Workflow
-
-1. Fetch any URL's provided by the user using the `fetch_webpage` tool.
-2. Understand the problem deeply. Carefully read the issue and think critically about what is required. Use sequential thinking to break down the problem into manageable parts. Consider the following:
-   - What is the expected behavior?
-   - What are the edge cases?
-   - What are the potential pitfalls?
-   - How does this fit into the larger context of the codebase?
-   - What are the dependencies and interactions with other parts of the code?
-3. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
-4. Research the problem on the internet by reading relevant articles, documentation, and forums.
-5. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps. Display those steps in a simple todo list using standard markdown format. Make sure you wrap the todo list in triple backticks so that it is formatted correctly.
-6. Implement the fix incrementally. Make small, testable code changes.
-7. Debug as needed. Use debugging techniques to isolate and resolve issues.
-8. Test frequently. Run tests after each change to verify correctness.
-9. Iterate until the root cause is fixed and all tests pass.
-10. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, and remember there are hidden tests that must also pass before the solution is truly complete.
-
-Refer to the detailed sections below for more information on each step.
-
-### 1. Fetch Provided URLs
-
-- If the user provides a URL, use the `functions.fetch_webpage` tool to retrieve the content of the provided URL.
-- After fetching, review the content returned by the fetch tool.
-- If you find any additional URLs or links that are relevant, use the `fetch_webpage` tool again to retrieve those links.
-- Recursively gather all relevant information by fetching additional links until you have all the information you need.
-
-### 2. Deeply Understand the Problem
-
-Carefully read the issue and think hard about a plan to solve it before coding.
-
-### 3. Codebase Investigation
-
-- Explore relevant files and directories.
-- Search for key functions, classes, or variables related to the issue.
-- Read and understand relevant code snippets.
-- Identify the root cause of the problem.
-- Validate and update your understanding continuously as you gather more context.
-
-### 4. Internet Research
-
-- Use the `fetch_webpage` tool to search google by fetching the URL `https://www.google.com/search?q=your+search+query`.
-- After fetching, review the content returned by the fetch tool.
-- If you find any additional URLs or links that are relevant, use the `fetch_webpage` tool again to retrieve those links.
-- Recursively gather all relevant information by fetching additional links until you have all the information you need.
-
-### 5. Develop a Detailed Plan
-
-- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
-- Create a todo list in markdown format to track your progress.
-- Each time you complete a step, check it off using `[x]` syntax.
-- Each time you check off a step, display the updated todo list to the user.
-- Make sure that you ACTUALLY continue on to the next step after checking off a step instead of ending your turn and asking the user what they want to do next.
-
-### 6. Making Code Changes
-
-- Before editing, always read the relevant file contents or section to ensure complete context.
-- Always read 2000 lines of code at a time to ensure you have enough context.
-- If a patch is not applied correctly, attempt to reapply it.
-- Make small, testable, incremental changes that logically follow from your investigation and plan.
-
-### 7. Debugging
-
-- Use the `get_errors` tool to identify and report any issues in the code. This tool replaces the previously used `#problems` tool.
-- Make code changes only if you have high confidence they can solve the problem
-- When debugging, try to determine the root cause rather than addressing symptoms
-- Debug for as long as needed to identify the root cause and identify a fix
-- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening
-- To test hypotheses, you can also add test statements or functions
-- Revisit your assumptions if unexpected behavior occurs.
-
-## How to create a Todo List
-
-Use the following format to create a todo list:
-
-```markdown
-- [ ] Step 1: Description of the first step
-- [ ] Step 2: Description of the second step
-- [ ] Step 3: Description of the third step
-```
-
-Do not ever use HTML tags or any other formatting for the todo list, as it will not be rendered correctly. Always use the markdown format shown above.
-
-## Communication Guidelines
-
-Always communicate clearly and concisely in a casual, friendly yet professional tone.
-
-
-"Let me fetch the URL you provided to gather more information."
-"Ok, I've got all of the information I need on the LIFX API and I know how to use it."
-"Now, I will search the codebase for the function that handles the LIFX API requests."
-"I need to update several files here - stand by"
-"OK! Now let's run the tests to make sure everything is working correctly."
-"Whelp - I see we have some problems. Let's fix those up."
-
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 1a5007a..897e256 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -14,6 +14,8 @@ Priority: data protection integrity (immutability, retention correctness, tag se
Out of scope: application code backup, speculative destructive operations, compliance mode toggling without explicit approval.
+This module is consumed by the `code-ark` repository. See code-ark's `docs/blueprint-ownership.md` for the full ownership boundary between the two repositories.
+
## Architecture & Key Patterns
### Two-Vault Pattern
@@ -22,58 +24,15 @@ Source vault (`aws_backup_vault.main` in source module) + destination replicated
**Critical**: `examples/destination/aws-backups.tf` must be applied first to generate `destination_vault_arn` consumed by source configuration via `TF_VAR_destination_vault_arn`.
-### Tag-Based Resource Selection
-
-Resources included in backups via tags:
-
-- **General**: `NHSE-Enable-Backup = True` (case-sensitive) for S3/RDS/EBS
-- **Service-specific**: `BackupDynamoDB`, `BackupAurora`, `BackupParameterStore` (tag keys defined in `variables.tf` `backup_plan_config_*` blocks)
-
-Selection logic in `backup_plan.tf` uses `aws_backup_selection` resources with `selection_by_tags` blocks. Tag values default to `"True"` if `selection_tag_value` is null (see `locals.tf` null-checking pattern: `local.selection_tag_value_*_null_checked`).
-
-### Retention Lifecycle Design
-
-Example retention values in `examples/source/aws-backups.tf` are **intentionally short** (e.g., 35 days daily, 365 days destination copy). Production values require Information Asset Owner input. Each `backup_plan_config.rules` object requires:
-
-- `schedule`: EventBridge cron expression (e.g., `cron(0 1 ? * * *)`)
-- `lifecycle.delete_after`: days until recovery point deletion
-- `copy_action[].delete_after`: days until cross-account copy deletion (only if `backup_copy_vault_arn` set)
-
-**Important**: `copy_action` only created when both `backup_copy_vault_arn` and `backup_copy_vault_account_id` are non-empty (see `backup_plan.tf` dynamic block condition).
-
### Vault Lock Modes
-`modules/aws-backup-destination/backup_vault_lock.tf`:
-
-- **Governance** (default): vault deletable, policy editable, suitable for testing
-- **Compliance**: irreversible after `changeable_for_days` cooling period (3-36,500 days, default 14)
-
-**Never** auto-enable compliance mode. Require explicit user confirmation with cooling period acknowledgment. Variable guards: `enable_vault_protection` + `vault_lock_type = "compliance"`.
-
-### Lambda Function Patterns
-
-All Lambdas in `modules/aws-backup-source/resources/*/`:
-
-- **Runtime**: Python 3.12
-- **Testing**: `test_*.py` using `unittest.mock` for AWS client mocking
-- **Environment config**: Load from `os.environ` with `load_configuration()` pattern
-- **Error handling**: Log exceptions with `exc_info=True`, return `{"status": "FAILED", "error": ...}` dicts
-
-Key Lambdas:
-
-1. **parameter-store-backup** (`parameter_store_backup.py`): Discovers SSM parameters by tag, encrypts with KMS, stores JSON in S3. Handles pagination (`NextToken`).
-1. **copy-recovery-point** (`copy_recovery_point.py`): Cross-account recovery point copy with STS assume role, waits for `COMPLETED`/`FAILED` state with configurable `WAIT_DELAY_SECONDS`.
+**Never** auto-enable compliance mode. Require explicit user confirmation with cooling period acknowledgment. See the `vault-lock-safety` skill for full details.
-Test invocation: `python test_.py` (no pytest/coverage tools configured).
+## Agent Skills
-## AWS Profiles & Multi-Account Workflow
+Specialist knowledge is organised into on-demand skills under `.github/skills/`. See `.github/skills/README.md` for the full catalogue. Skills cover: backup plans & tag selections, Python Lambdas, and vault lock safety.
-**From `.github/instructions/memory.instructions.md`**:
-
-- Source account: `AWS_PROFILE=code-ark-dev-2` (723760173216)
-- Destination/vault account: `AWS_PROFILE=code-ark-vault-dev-2` (954869684612)
-
-Always use profile prefix for CLI commands in examples. Use `data.aws_caller_identity.current.account_id` and `data.aws_arn.destination_vault_arn.account` for dynamic account ID resolution (see `examples/source/aws-backups.tf` locals).
+Copilot loads skills automatically when relevant to a task. Always-on rules stay here; specialist detail lives in skills.
## Terraform Conventions
@@ -91,73 +50,13 @@ Always use profile prefix for CLI commands in examples. Use `data.aws_caller_ide
Destination must complete before source to avoid missing ARN references.
-## Python Lambda Guidelines
-
-Current Lambdas (under `modules/aws-backup-source/resources/`):
-
-- `parameter-store-backup/parameter_store_backup.py`
-- `restore-to-s3/restore_to_s3.py`
-
-Testing files exist. Maintain test runnable via `python test_.py`.
-
-Assumptions:
-
-- Python version: 3.12
-- Style: flake8
-
-Edge Cases to cover for new Lambda code:
-
-- Empty parameter sets / paginated SSM responses.
-- Large S3 object sets (paging, memory).
-- Retry transient AWS errors (throttling, network).
-
-### Lambda Testing Pattern
-
-From `test_parameter_store_backup.py`:
-
-```python
-@patch.dict(os.environ, {
-    'KMS_KEY_ARN': 'test-kms-key',
-    'PARAMETER_STORE_BUCKET_NAME': 'test-bucket',
-    'TAG_KEY': 'test-key',
-    'TAG_VALUE': 'test-value'
-})
-def test_load_configuration(self):
-    config = load_configuration()
-    self.assertEqual(config['kms_key_id'], 'test-kms-key')
-```
-
-- Mock AWS clients using `unittest.mock.MagicMock`
-- Patch environment variables with `@patch.dict(os.environ, {...})`
-- Test success path + at least one failure scenario
-- Run directly: `python test_.py` (no pytest dependency)
-
## Security & Compliance
-- Reference `SECURITY.md` for vulnerability reporting; never include PII or secrets in examples (reinforce PR template sensitive info checklist).
+- Reference `SECURITY.md` for vulnerability reporting; never include PII or secrets in examples.
- Treat retention choices as compliance-impacting; don't silently lengthen or shorten.
- Never downgrade vault lock or remove protection flags.
- Highlight irreversible actions (compliance mode enable) before performing.
-
-### IAM Policy Pattern
-
-From `modules/aws-backup-source/lambda_parameter_store_backup.tf`:
-
-```terraform
-statement {
-  effect = "Allow"
-  actions = [
-    "ssm:DescribeParameters",
-    "ssm:GetParametersByPath",
-    "ssm:GetParameter",
-    "ssm:GetParameters",
-    "ssm:ListTagsForResource"
-  ]
-  resources = ["arn:aws:ssm:*:*:*"]
-}
-```
-
-Keep IAM actions minimal; wildcard resources only when service requires it (SSM parameters, KMS encrypt operations). Document any `resources = ["*"]` with justification comment.
+- Keep IAM actions minimal; wildcard resources only when service requires it. Document any `resources = ["*"]` with justification comment.
## Documentation & Comment Policy
@@ -165,28 +64,6 @@ Keep IAM actions minimal; wildcard resources only when service requires it (SSM
- Update affected README sections when adding capabilities (source vs destination vs service-specific notes).
- Cross-link new docs under `docs/` if design changes (e.g., restoration orchestration). Provide concise rationales.
-### Example Comment Standards
-
-**Avoid** (restates obvious):
-
-```terraform
-# Create backup plan
-resource "aws_backup_plan" "default" {
-  name = "${local.resource_name_prefix}-plan"
-}
-```
-
-**Acceptable** (explains non-obvious logic):
-
-```terraform
-# copy_action only created when both vault ARN and account ID provided
-# to avoid incomplete cross-account configuration
-dynamic "copy_action" {
-  for_each = var.backup_copy_vault_arn != "" && var.backup_copy_vault_account_id != "" && rule.value.copy_action != null ? rule.value.copy_action : {}
-  content { ... }
-}
-```
-
## Testing Strategy
Existing automated tests: Lambda unit tests. Missing (future): integration tests for restore Step Function and backup lifecycle.
@@ -198,41 +75,6 @@ For changes:
1. Validate tag filtering logic via fixture sets.
1. For Terraform logic additions, prefer `terraform plan` reasoning rather than full apply (unless user asks for runtime validation).
-### Running Tests
-
-Lambda tests run directly with Python unittest:
-
-```bash
-cd modules/aws-backup-source/resources/parameter-store-backup/
-python test_parameter_store_backup.py
-```
-
-No pytest or coverage tools configured. Tests use `unittest.mock` extensively for AWS service mocking.
-
-## Interaction Patterns (Prompt Recipes)
-
-Use these templates when asking Copilot for help or when Copilot prepares actions:
-
-### a. Add New Service Backup Support
-
-"Add AWS Backup support for \. Provide new selection tag, variables, docs update, and test scenario. Maintain existing plan structure and security posture."
-
-### b. Adjust Retention Policy
-
-"Modify daily rule retention from 35 to 60 days while keeping copy retention 365. Show diff only for affected rule."
-
-### c. Parameter Store Enhancement
-
-"Extend parameter store backup Lambda to skip parameters with key prefix 'SECRET_'. Update tests accordingly."
-
-### d. Restoration Orchestration
-
-"Draft Step Function definition (JSON/YAML) for cross-account copy then restore for DynamoDB including optional validation Lambda invocation."
-
-### e. Vault Lock Decision Aid
-
-"Summarise risks & pre-checklist before enabling compliance mode vault lock."
-
## Required Assistant Behaviours
- Always gather file context via read operations before edits.
@@ -260,10 +102,6 @@ Pull Request title format: `ENG- ` (may be prefixed w
These conventions should be enforced in examples and automation; assistants generating branches or PR titles must adhere unless user explicitly overrides.
-## Research References
-
-Limited external docs fetched (GitHub Copilot documentation landing pages). For deeper feature additions: consult AWS Backup docs (vault locks, logically air-gapped vaults, restore testing) and Terraform Provider AWS registry.
-
## Quick Checklist Before Acting
1. Read relevant module file(s).
@@ -274,14 +112,3 @@ Limited external docs fetched (GitHub Copilot documentation landing pages). For
1. Minimise commentary.
1. Present diff + validation plan.
-## Example Confirmation Prompt Before Compliance Mode
-
-"User confirmation required: Enabling compliance mode (changeable_for_days=X) makes vault lock irreversible after cooling period. Proceed? (Yes/No)"
-
-## Tone & Communication
-
-- Friendly, concise, purposeful.
-- Avoid filler acknowledgements.
-- Surface assumptions explicitly.
-- Highlight risk boundaries clearly.
-
diff --git a/.github/skills/README.md b/.github/skills/README.md
new file mode 100644
index 0000000..ab2ce87
--- /dev/null
+++ b/.github/skills/README.md
@@ -0,0 +1,19 @@
+# Agent Skills
+
+This directory contains [VS Code Agent Skills](https://code.visualstudio.com/docs/copilot/customization/agent-skills) — folders of instructions, scripts, and resources that GitHub Copilot loads on-demand when relevant to a task.
+
+Skills complement the always-on rules in `.github/copilot-instructions.md`. That file holds general guidance relevant to every task; skills hold specialist knowledge that is loaded only when needed, which keeps the agent's context focused.
+
+## Skills in This Repository
+
+| Skill | Directory | When Copilot Loads It |
+| ------- | ----------- | ---------------------- |
+| **Backup Plans & Selections** | `backup-plans-selections/` | Adding or modifying backup plans, tag-based selections, retention lifecycle, or compliance frameworks |
+| **Python Lambdas** | `python-lambdas/` | Working on Python Lambda code under `modules/aws-backup-source/resources/` |
+| **Vault Lock Safety** | `vault-lock-safety/` | Any mention of vault lock, compliance mode, or immutability settings |
+
+## Related Documentation
+
+- [Copilot Instructions](../copilot-instructions.md) — always-on agent rules
+- [Covered Services](../../COVERED_SERVICES.md) — which AWS services are supported
+- [Changelog](../../CHANGELOG.md) — module version history
diff --git a/.github/skills/backup-plans-selections/SKILL.md b/.github/skills/backup-plans-selections/SKILL.md
new file mode 100644
index 0000000..d246975
--- /dev/null
+++ b/.github/skills/backup-plans-selections/SKILL.md
@@ -0,0 +1,91 @@
+# Backup Plans & Selections
+
+Use this skill when the user adds or modifies backup plans, tag-based resource selections, retention lifecycle, or compliance framework checks.
+
+## Tag-Based Resource Selection
+
+Resources are included in backups via tags. Tag keys and values are **case-sensitive**: with the default values, only the exact string `True` matches (not `true` or `TRUE`).
+
+| Service | Default Selection Tag | Default Value | Config Variable |
+| --------- | ---------------------- | --------------- | ----------------- |
+| S3/RDS (default plan) | `BackupLocal` | `True` | `backup_plan_config` |
+| DynamoDB | `BackupDynamoDB` | `True` | `backup_plan_config_dynamodb` |
+| EBS | `BackupEBSVol` | `True` | `backup_plan_config_ebsvol` |
+| Aurora | `BackupAurora` | `True` | `backup_plan_config_aurora` |
+| Parameter Store | `BackupParameterStore` | `True` | `backup_plan_config_parameter_store` |
+
+Consumers override `selection_tag` (e.g. to `NHSE-Enable-Backup`) via the config variable.
+
+### Null-Checking Pattern
+
+`locals.tf` provides null-checked tag values:
+
+```hcl
+local.selection_tag_value_*_null_checked
+```
+
+If `selection_tag_value` is null, it defaults to `"True"`.
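+
+As a minimal sketch of that fallback, assuming a simplified variable shape (the real `variables.tf` and `locals.tf` may differ, and the `dynamodb` suffix in the local name is only an illustrative instance of the `*` pattern):
+
+```hcl
+# Hypothetical, simplified variable for illustration only.
+# optional() in object types needs Terraform 1.3 or later.
+variable "backup_plan_config_dynamodb" {
+  type = object({
+    selection_tag       = string
+    selection_tag_value = optional(string)
+  })
+  default = {
+    selection_tag = "BackupDynamoDB"
+  }
+}
+
+locals {
+  # Fall back to "True" when the consumer leaves selection_tag_value unset (null).
+  selection_tag_value_dynamodb_null_checked = (
+    var.backup_plan_config_dynamodb.selection_tag_value == null
+    ? "True"
+    : var.backup_plan_config_dynamodb.selection_tag_value
+  )
+}
+```
+
+Terraform's `coalesce()` would give a similar result, but it also skips empty strings, so the explicit null comparison keeps the behaviour closer to the description above.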
+
+### Additional Selection Tags
+
+`selection_tags` (plural) provides fine-grained filtering (e.g. by environment). Each tag in the list creates an additional `selection_by_tags` block.
+
+## Backup Plan Structure
+
+Plans are in `backup_plan.tf`. Each plan has:
+
+- `rules` — list of objects with:
+  - `schedule`: EventBridge cron expression (e.g. `cron(0 1 ? * * *)`)
+  - `lifecycle.delete_after`: days until recovery point deletion
+  - `copy_action[].delete_after`: days until cross-account copy deletion
+  - Optional: `enable_continuous_backup` for PITR
+
+### Cross-Account Copy Action
+
+`copy_action` is only created when **both** conditions are met:
+
+```hcl
+var.backup_copy_vault_arn != "" && var.backup_copy_vault_account_id != ""
+```
+
+This prevents incomplete cross-account configuration.
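+
+The condition above could be wired into a plan roughly as follows. This is a sketch under assumptions, not the contents of `backup_plan.tf`: the resource name, rule name, vault name, and retention values are placeholders.
+
+```hcl
+variable "backup_copy_vault_arn" {
+  type    = string
+  default = ""
+}
+
+variable "backup_copy_vault_account_id" {
+  type    = string
+  default = ""
+}
+
+locals {
+  # True only when both halves of the cross-account configuration are present.
+  copy_enabled = var.backup_copy_vault_arn != "" && var.backup_copy_vault_account_id != ""
+}
+
+resource "aws_backup_plan" "example" {
+  name = "example-plan" # hypothetical name
+
+  rule {
+    rule_name         = "daily"
+    target_vault_name = "example-vault" # hypothetical vault name
+    schedule          = "cron(0 1 ? * * *)"
+
+    lifecycle {
+      delete_after = 35 # placeholder retention in days
+    }
+
+    # Emits one copy_action block when copy_enabled is true, none otherwise.
+    dynamic "copy_action" {
+      for_each = local.copy_enabled ? [1] : []
+      content {
+        destination_vault_arn = var.backup_copy_vault_arn
+        lifecycle {
+          delete_after = 365 # placeholder retention in days
+        }
+      }
+    }
+  }
+}
+```
+
+With either variable left empty, `terraform plan` should show the rule without any `copy_action` block.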
+
+## Compliance Framework
+
+Each service has a compliance framework in `backup_framework.tf` checking:
+
+- Backup resources encrypted
+- Manual deletion disabled
+- Minimum retention ≥ 35 days
+- Backup frequency ≥ daily
+- Resources are protected
+- Last recovery point age within bounds
+
+**Note:** Example retention values are intentionally short (e.g. 2 days) and will **fail** compliance checks by design. Production values require Information Asset Owner input.
+
+## Retention Lifecycle
+
+- `lifecycle.delete_after` — local vault retention
+- `copy_action[].delete_after` — destination vault retention
+- Never silently lengthen or shorten retention — treat as compliance-impacting
+- Validate cron syntax against AWS EventBridge documentation
+
+## Default Enable States
+
+All service plans are **enabled by default**:
+
+```hcl
+backup_plan_config_dynamodb = { enable = true, ... }
+backup_plan_config_ebsvol = { enable = true, ... }
+backup_plan_config_aurora = { enable = true, ... }
+backup_plan_config_parameter_store = { enable = true, ... }
+```
+
+Consumers must explicitly disable services they do not use.
+
+## Gotchas
+
+- DynamoDB and Aurora do NOT support continuous backup in copy rules
+- Aurora requires `restore_testing_overrides` (e.g. `dbsubnetgroupname`)
+- S3 requires versioning enabled on the source bucket
+- Parameter Store is not a native AWS Backup service — the blueprint uses a Lambda to export parameters to S3, which is then backed up via the normal S3 plan
diff --git a/.github/skills/python-lambdas/SKILL.md b/.github/skills/python-lambdas/SKILL.md
new file mode 100644
index 0000000..0b15a8d
--- /dev/null
+++ b/.github/skills/python-lambdas/SKILL.md
@@ -0,0 +1,107 @@
+# Python Lambdas
+
+Use this skill when the user works on Python Lambda code under `modules/aws-backup-source/resources/`.
+
+## Lambda Catalogue
+
+| Lambda | File | Purpose | Trigger | Has Tests |
+| -------- | ------ | --------- | --------- | ----------- |
+| **parameter-store-backup** | `parameter_store_backup.py` | Discovers SSM parameters by tag, encrypts with KMS, writes to S3 as `.encrypted` files | EventBridge cron (default: `0 6 * * ? *`) | Yes (4 cases) |
+| **copy-recovery-point** | `copy_recovery_point.py` + `lambda_function.py` | Copies recovery points from destination vault back to source vault cross-account via STS | EventBridge / manual | Yes (4 cases) |
+| **restore-to-s3** | `restore_to_s3.py` | Starts/monitors AWS Backup S3 recovery point restore jobs | Step Function / manual | Yes (6 cases, uses `botocore.stub.Stubber`) |
+| **post_build_version** | `post_build_version.py` | POSTs module version + account ID to API endpoint on every backup job | EventBridge (backup job complete) | No |
+
+### Stub Directories (Planned, Not Implemented)
+
+- `restore-to-aurora/` — empty (`__pycache__/` only)
+- `restore-to-dynamodb/` — empty
+- `restore-to-rds/` — empty
+
+## Code Patterns
+
+### Configuration Loading
+
+```python
+def load_configuration():
+    return {
+        'kms_key_id': os.environ['KMS_KEY_ARN'],
+        'bucket_name': os.environ['PARAMETER_STORE_BUCKET_NAME'],
+        # ... remaining settings elided
+    }
+```
+
+All configuration is read from `os.environ`. Handle `KeyError` for missing variables.
+
+### Error Handling
+
+```python
+try:
+    result = do_work()
+    return {"status": "SUCCESS", ...}
+except Exception as e:
+    logger.error("Failed", exc_info=True)
+    return {"status": "FAILED", "error": str(e)}
+```
+
+Log with `exc_info=True`, return structured dicts.
+
+### Cross-Account STS
+
+`copy_recovery_point.py` uses `sts:AssumeRole` when `ASSUME_ROLE_ARN` is set:
+
+```python
+sts_client = boto3.client('sts')
+credentials = sts_client.assume_role(
+    RoleArn=assume_role_arn,
+    RoleSessionName='cross-account-copy'
+)
+```
+
+## The `_build_copy_job_params` Gotcha
+
+In `copy_recovery_point.py`, the naming appears swapped but is **intentional**:
+
+- `DestinationBackupVaultArn` = `source_vault_arn` (copying **into** source)
+- `SourceBackupVaultName` = parsed from `destination_vault_arn` (copying **from** destination)
+
+This was a historical bug root cause (ENG-930) and must not be "fixed".
+
+## Testing
+
+```bash
+cd modules/aws-backup-source/resources/parameter-store-backup/
+python test_parameter_store_backup.py
+```
+
+- Framework: `unittest` (no pytest dependency)
+- Mocking: `unittest.mock.MagicMock` for AWS clients
+- Env vars: `@patch.dict(os.environ, {...})`
+- Pattern: test success path + at least one failure scenario
+- Run directly: `python test_.py`
+
+## Packaging
+
+- No separate build step — Terraform `data.archive_file` creates zip at plan/apply time
+- Each Lambda's `.tf` file references its source directory
+- Python runtime: 3.12
+
+## IAM Policy Pattern
+
+Keep actions minimal. Wildcard resources only when the service requires it:
+
+```terraform
+statement {
+  effect    = "Allow"
+  actions   = ["ssm:DescribeParameters", "ssm:GetParametersByPath", ...]
+  resources = ["arn:aws:ssm:*:*:*"]
+}
+```
+
+Document any `resources = ["*"]` with a justification comment.
+
+## Gotchas
+
+- `parameter_store_lambda_encryption_role` is a **fixed role name** referenced by the destination KMS key policy — do not rename
+- `post_build_version` has no tests — add tests if modifying
+- `copy-recovery-point` has two entry points: `lambda_function.py` re-exports from `copy_recovery_point.py`
+- Version is read from `modules/aws-backup-source/version` file at plan time via `file()`
diff --git a/.github/skills/vault-lock-safety/SKILL.md b/.github/skills/vault-lock-safety/SKILL.md
new file mode 100644
index 0000000..a12e514
--- /dev/null
+++ b/.github/skills/vault-lock-safety/SKILL.md
@@ -0,0 +1,70 @@
+# Vault Lock Safety
+
+Use this skill when the user mentions vault lock, compliance mode, immutability settings, or vault protection.
+
+## Vault Lock Modes
+
+Configured in `modules/aws-backup-destination/backup_vault_lock.tf`:
+
+### Governance Mode (Default)
+
+- Vault is deletable
+- Policy is editable
+- Suitable for testing and development
+- No irreversible consequences
+
+### Compliance Mode
+
+- **IRREVERSIBLE** after the cooling period expires
+- Cooling period: `changeable_for_days` (3–36,500 days, default 14)
+- During cooling period: lock can still be removed
+- After cooling period: **permanent** — cannot be removed, retention cannot be shortened, vault cannot be deleted
+
+## Variable Guards
+
+Both must be set to enable compliance mode:
+
+```hcl
+enable_vault_protection = true
+vault_lock_type = "compliance"
+```
+
+Optional:
+
+```hcl
+changeable_for_days = 14 # cooling period before lock becomes permanent
+```
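+
+For orientation, the guards could map onto the AWS provider's lock resource roughly like this. It is a sketch, assuming the three variables gate a single `aws_backup_vault_lock_configuration`; the resource label and vault name are placeholders, and the actual `backup_vault_lock.tf` may be structured differently.
+
+```hcl
+variable "enable_vault_protection" {
+  type    = bool
+  default = false
+}
+
+variable "vault_lock_type" {
+  type    = string
+  default = "governance"
+
+  validation {
+    condition     = contains(["governance", "compliance"], var.vault_lock_type)
+    error_message = "vault_lock_type must be \"governance\" or \"compliance\"."
+  }
+}
+
+variable "changeable_for_days" {
+  type    = number
+  default = 14
+}
+
+resource "aws_backup_vault_lock_configuration" "example" {
+  # No lock at all unless protection is explicitly enabled.
+  count = var.enable_vault_protection ? 1 : 0
+
+  backup_vault_name = "example-destination-vault" # hypothetical vault name
+
+  # Leaving changeable_for_days unset (null) creates a governance-mode lock;
+  # setting it creates a compliance-mode lock with that cooling period.
+  changeable_for_days = var.vault_lock_type == "compliance" ? var.changeable_for_days : null
+}
+```
+
+The defaults in this sketch keep the lock disabled and in governance mode, so compliance mode only results from two deliberate changes.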
+
+## Mandatory Confirmation
+
+**Never auto-enable compliance mode.**
+
+Before enabling, present this confirmation:
+
+> "User confirmation required: Enabling compliance mode (changeable_for_days=X) makes vault lock irreversible after the cooling period. Proceed? (Yes/No)"
+
+## Vault Policy Protection
+
+`enable_iam_protection` in the destination module denies destructive actions on the vault:
+
+```hcl
+enable_iam_protection = true
+```
+
+This adds deny statements for `DeleteBackupVault`, `PutBackupVaultAccessPolicy`, `DeleteRecoveryPoint`, `UpdateRecoveryPointLifecycle` except for specified principals.
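+
+A policy document with that shape might look like the sketch below. The variable `exempt_principal_arns`, the statement ID, and the use of `aws:PrincipalArn` are assumptions for illustration; the destination module may express the exemption differently.
+
+```hcl
+# Hypothetical input: principals allowed to bypass the deny.
+variable "exempt_principal_arns" {
+  type = list(string)
+}
+
+data "aws_iam_policy_document" "vault_protection" {
+  statement {
+    sid    = "DenyDestructiveVaultActions"
+    effect = "Deny"
+
+    principals {
+      type        = "AWS"
+      identifiers = ["*"]
+    }
+
+    actions = [
+      "backup:DeleteBackupVault",
+      "backup:PutBackupVaultAccessPolicy",
+      "backup:DeleteRecoveryPoint",
+      "backup:UpdateRecoveryPointLifecycle",
+    ]
+
+    # Wildcard is illustrative only; scope it to the vault and its
+    # recovery points where the policy type allows.
+    resources = ["*"]
+
+    # The deny applies to every principal except the listed ARNs.
+    condition {
+      test     = "ArnNotEquals"
+      variable = "aws:PrincipalArn"
+      values   = var.exempt_principal_arns
+    }
+  }
+}
+```
+
+The rendered JSON (`data.aws_iam_policy_document.vault_protection.json`) would then be attached as the vault access policy.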
+
+## Pre-Checklist Before Compliance Mode
+
+1. Confirm retention periods are final (cannot be shortened after lock)
+2. Confirm `changeable_for_days` value gives adequate cooling period
+3. Confirm this is a production environment (never lock dev/test vaults)
+4. Confirm the Information Asset Owner has approved retention values
+5. Confirm backup plans and selections are stable (adding new ones is fine, removing is not)
+6. Document the decision and date of enablement
+
+## Gotchas
+
+- There is no "undo" for compliance mode after the cooling period
+- Governance mode is safe for all environments
+- Vault lock applies to the **destination** vault only (source vault has no lock)
+- `min_retention_days` and `max_retention_days` on the lock constrain all recovery point lifecycles in the vault