diff --git a/docs/manual-restore-validation.md b/docs/manual-restore-validation.md new file mode 100644 index 0000000..e785bbc --- /dev/null +++ b/docs/manual-restore-validation.md @@ -0,0 +1,137 @@ +# Manual Restore Validation Design + +## 1. Purpose + +Provide a light‑weight, on-demand restore + validation workflow where the **customer supplies their own validation Lambda**. This complements automated restore testing plans by enabling ad-hoc integrity checks (e.g. regression assessment after schema change, pre-cutover rehearsal) without standing orchestration state machines. + +## 2. Overview + +Flow (single resource type per invocation): + +1. Operator (or CI job) invokes Orchestrator Lambda with optional `recoveryPointArn`. +2. Orchestrator chooses recovery point (latest if unspecified) from a backup vault. +3. Starts restore job using AWS Backup `StartRestoreJob`. +4. Polls status (`DescribeRestoreJob`) until terminal state. +5. Invokes customer validator Lambda with contextual payload. +6. Normalises validator response -> calls `PutRestoreValidationResult`. +7. Returns composite result to caller (for CLI / API inspection). + +No Step Functions required for typical short restore + validation cycles; for long running (>15 min) scenarios Step Functions could replace polling. + +## 3. Roles & Responsibilities + +| Component | Responsibility | +|-----------|----------------| +| Orchestrator Lambda | Restore initiation, polling, validator invocation, publishing result | +| Customer Validator Lambda | Domain/resource-specific integrity checks (S3 object presence, record counts, hashes, etc.) | +| AWS Backup | Recovery point catalog & restore execution | +| IAM | Enforces least privilege for restore & validation actions | + +## 4. 
Invocation Payload (Optional Fields) + +```jsonc +{ + "recoveryPointArn": "arn:aws:backup:...:recovery-point:...", // optional override + "expectedKeys": ["path/example1.txt", "path/example2.txt"], // validator-specific + "expectedMinObjects": 10 // optional fallback +} +``` + +## 5. Validator Contract + +Input delivered to customer Lambda (superset of invocation + restore context): + +```json +{ + "restoreJobId": "...", + "recoveryPointArn": "...", + "resourceType": "S3", + "createdResourceArn": "arn:aws:s3:::restored-bucket", + "targetBucket": "restored-bucket", + "s3": { "bucket": "restored-bucket" }, + "expectedKeys": ["..."], + "expectedMinObjects": 10 +} +``` + +Return: + +```json +{ "status": "SUCCESSFUL|FAILED|SKIPPED", "message": "summary", "details": { } } +``` +Status mapping is case-insensitive; unknown maps to FAILED. + +## 6. Security Considerations + +- Orchestrator policy limited to listing recovery points, starting & describing restore jobs, publishing validation, invoking a single validator ARN. +- Validator policy scoped to specific target bucket ARNs (S3 example). +- Sensitive data avoidance: orchestrator does not log object contents, only metadata. +- Optionally use a dedicated IAM restore role if restore requires cross-service access. + +## 7. Error Handling + +| Scenario | Behaviour | +|----------|-----------| +| No recovery points | Orchestrator throws error (non-validation) | +| Restore timeout | Polling loop gives up after 55m, but the orchestrator Lambda's 900 s timeout ends the invocation first; no validation result is published | +| Validator throws | Orchestrator records FAILED with parse/message fallback | +| Validator returns malformed JSON | Treated as FAILED with parse error message | + +## 8. Extensibility + +- Add multi-resource batch mode via Step Functions if needed. +- Support additional resource types by adjusting Metadata mapping (e.g. RDS cluster restore specifics). +- Emit custom metrics (future) for restore duration & validator latency. + +## 9. 
Example S3 Validator Patterns + +| Pattern | Description | +|---------|-------------| +| Key existence | Ensure enumerated critical objects are present (manifest-sourced) | +| Non-empty bucket | Basic continuity signal after restore | +| Minimum count | Validate approximate dataset size threshold | +| Sample integrity | (Future) HEAD + ETag comparison against manifest | + +## 10. Terraform Surfaces + +Module `aws-backup-manual-validation` variables: + +- `backup_vault_name` (string, required) +- `validation_lambda_arn` (string, required) +- `resource_type` (string, e.g. S3) +- `target_bucket_name` (string, S3 convenience) +- `name_prefix` (string) + +Outputs: + +- `orchestrator_lambda_arn` + +## 11. Invocation Examples + +AWS CLI (invoke latest recovery point): + +```bash +aws lambda invoke \ + --function-name myproj-dev-manual-restore-orchestrator \ + --payload '{}' out.json && cat out.json | jq +``` + +Explicit recovery point + expected keys: + +```bash +aws lambda invoke \ + --function-name myproj-dev-manual-restore-orchestrator \ + --payload '{"recoveryPointArn":"arn:aws:backup:..","expectedKeys":["manifest.json","data/file1"]}' out.json +``` + +## 12. Limitations + +- Long-running restores may exceed Lambda timeout (convert to Step Functions for scale/time). +- Only single resource restore per invocation. +- No built-in notification channel (user can layer SNS or EventBridge rule on Lambda logs/exits). + +## 13. Future Enhancements + +- Step Functions wrapper for large parallel restores. +- Parameter / Secrets retrieval for RDS validation credentials. +- Config-driven validator selection registry. 
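Section 7 of the design doc describes a FAILED fallback when the validator throws or returns malformed JSON. The following is a minimal sketch of how that normalisation could look, not the orchestrator's actual code. `normaliseValidatorResponse`, `NormalisedResult` and `MAX_MESSAGE_LENGTH` are hypothetical names introduced for illustration; the status aliases and the 1000-character truncation follow the orchestrator's `mapStatus` and `publishValidation`, and `errorMessage` is the field Lambda places in the response payload when a Node.js function throws.

```typescript
// Sketch only: normaliseValidatorResponse is a hypothetical helper, not part of this change.
type ValidationStatus = "SUCCESSFUL" | "FAILED" | "SKIPPED";

interface NormalisedResult {
  status: ValidationStatus;
  message: string;
}

// Same limit as the slice(0, 1000) applied in publishValidation.
const MAX_MESSAGE_LENGTH = 1000;

// Case-insensitive mapping; anything unrecognised becomes FAILED.
function mapStatus(s: unknown): ValidationStatus {
  if (typeof s !== "string") return "FAILED";
  const upper = s.toUpperCase();
  if (["SUCCESS", "SUCCESSFUL", "OK"].includes(upper)) return "SUCCESSFUL";
  if (["SKIPPED", "IGNORE", "IGNORED"].includes(upper)) return "SKIPPED";
  return "FAILED";
}

function normaliseValidatorResponse(rawText: string): NormalisedResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    // Malformed JSON: report FAILED instead of throwing, so a result can still be published.
    const message = `Validator response was not valid JSON: ${rawText}`;
    return { status: "FAILED", message: message.slice(0, MAX_MESSAGE_LENGTH) };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { status: "FAILED", message: "Validator response was not a JSON object" };
  }
  const obj = parsed as Record<string, unknown>;
  // When the validator throws, the payload carries errorMessage rather than message.
  const rawMessage =
    typeof obj.message === "string" ? obj.message
    : typeof obj.errorMessage === "string" ? obj.errorMessage
    : "";
  return { status: mapStatus(obj.status), message: rawMessage.slice(0, MAX_MESSAGE_LENGTH) };
}
```

The orchestrator could call a helper like this on the decoded `resp.Payload` text in place of the bare `JSON.parse`, then hand the result to `publishValidation`.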
diff --git a/examples/customer-s3-validator/index.ts b/examples/customer-s3-validator/index.ts new file mode 100644 index 0000000..9192687 --- /dev/null +++ b/examples/customer-s3-validator/index.ts @@ -0,0 +1,58 @@ +import { S3Client, HeadObjectCommand, ListObjectsV2Command } from "@aws-sdk/client-s3"; + +const s3 = new S3Client({}); + +/* Example validator strategy: + 1. If event.expectedKeys provided -> verify each exists. + 2. Else if event.s3.bucket provided -> ensure bucket contains at least one object (or expectedMinObjects). + Return status + message summarising findings. +*/ + +interface EventShape { + restoreJobId: string; + recoveryPointArn: string; + resourceType: string; + createdResourceArn?: string; + targetBucket?: string; + s3?: { bucket?: string }; + expectedKeys?: string[]; + expectedMinObjects?: number; +} + +export const handler = async (event: EventShape) => { + const bucket = event.targetBucket || event.s3?.bucket; + if (!bucket) { + return { status: "SKIPPED", message: "No bucket specified" }; + } + + if (event.expectedKeys && event.expectedKeys.length > 0) { + const missing: string[] = []; + for (const key of event.expectedKeys) { + try { + await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key })); + } catch (e) { + missing.push(key); + } + } + if (missing.length > 0) { + return { status: "FAILED", message: `Missing ${missing.length} objects`, missing }; + } + return { status: "SUCCESSFUL", message: `All ${event.expectedKeys.length} expected objects present` }; + } + + // Fallback: simple non-empty check or min object threshold + const min = event.expectedMinObjects ?? 
1; + let found = 0; + let ContinuationToken: string | undefined = undefined; + while (found < min) { + const resp = await s3.send(new ListObjectsV2Command({ Bucket: bucket, MaxKeys: 1000, ContinuationToken })); + const count = resp.Contents?.length || 0; + found += count; + if (!resp.IsTruncated) break; + ContinuationToken = resp.NextContinuationToken; + } + if (found < min) { + return { status: "FAILED", message: `Only ${found} objects found (< ${min})` }; + } + return { status: "SUCCESSFUL", message: `Found ${found} objects (>= ${min})` }; +}; diff --git a/examples/customer-s3-validator/package.json b/examples/customer-s3-validator/package.json new file mode 100644 index 0000000..0761b10 --- /dev/null +++ b/examples/customer-s3-validator/package.json @@ -0,0 +1,16 @@ +{ + "name": "customer-s3-validator-example", + "version": "0.1.0", + "private": true, + "type": "module", + "scripts": { + "build": "tsc -p tsconfig.json" + }, + "dependencies": { + "@aws-sdk/client-s3": "^3.637.0" + }, + "devDependencies": { + "typescript": "^5.4.0", + "@types/node": "^20.11.0" + } +} diff --git a/examples/customer-s3-validator/tsconfig.json b/examples/customer-s3-validator/tsconfig.json new file mode 100644 index 0000000..8cb4dd2 --- /dev/null +++ b/examples/customer-s3-validator/tsconfig.json @@ -0,0 +1,14 @@ +{ + "compilerOptions": { + "target": "ES2020", + "module": "ES2020", + "moduleResolution": "Node", + "outDir": "dist", + "rootDir": ".", + "esModuleInterop": true, + "strict": true, + "skipLibCheck": true + }, + "include": ["index.ts"], + "exclude": ["node_modules"] +} diff --git a/examples/manual-validation/main.tf b/examples/manual-validation/main.tf new file mode 100644 index 0000000..5210885 --- /dev/null +++ b/examples/manual-validation/main.tf @@ -0,0 +1,83 @@ +terraform { + required_version = ">= 1.5.0" + required_providers { + aws = { + source = "hashicorp/aws" + version = ">= 5.0" + } + } +} + +provider "aws" { + region = var.region +} + +variable "region" { type = 
string } +variable "name_prefix" { type = string } +variable "backup_vault_name" { type = string } +variable "restore_bucket" { type = string } + +# Example customer validator lambda (upload dist bundle manually or integrate build pipeline). +resource "aws_lambda_function" "customer_validator" { + function_name = "${var.name_prefix}-customer-s3-validator" + role = aws_iam_role.customer_validator.arn + handler = "index.handler" + runtime = "nodejs20.x" + filename = "./lambda_customer_validator.zip" # user supplied artifact + source_code_hash = filebase64sha256("./lambda_customer_validator.zip") + timeout = 60 + environment { + variables = {} + } +} + +resource "aws_iam_role" "customer_validator" { + name = "${var.name_prefix}-customer-s3-validator-role" + assume_role_policy = data.aws_iam_policy_document.lambda_assume.json +} + +data "aws_iam_policy_document" "lambda_assume" { + statement { + actions = ["sts:AssumeRole"] + principals { + type = "Service" + identifiers = ["lambda.amazonaws.com"] + } + } +} + +resource "aws_iam_role_policy_attachment" "logs_attach_customer" { + role = aws_iam_role.customer_validator.name + policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole" +} + +resource "aws_iam_policy" "customer_s3_policy" { + name = "${var.name_prefix}-customer-s3-validator-policy" + policy = jsonencode({ + Version = "2012-10-17" + Statement = [ + { + Effect = "Allow" + Action = ["s3:ListBucket", "s3:GetObject"] + Resource = [ + "arn:aws:s3:::${var.restore_bucket}", + "arn:aws:s3:::${var.restore_bucket}/*" + ] + } + ] + }) +} + +resource "aws_iam_role_policy_attachment" "customer_validator_attach" { + role = aws_iam_role.customer_validator.name + policy_arn = aws_iam_policy.customer_s3_policy.arn +} + +module "manual_validation" { + source = "../../modules/aws-backup-manual-validation" + enable = true + name_prefix = var.name_prefix + backup_vault_name = var.backup_vault_name + resource_type = "S3" + validation_lambda_arn = 
aws_lambda_function.customer_validator.arn + target_bucket_name = var.restore_bucket +} + +output "orchestrator_lambda" { value = module.manual_validation.orchestrator_lambda_arn } diff --git a/modules/aws-backup-manual-validation/README.md b/modules/aws-backup-manual-validation/README.md new file mode 100644 index 0000000..4ec158c --- /dev/null +++ b/modules/aws-backup-manual-validation/README.md @@ -0,0 +1,84 @@ +# AWS Backup Manual Restore Validation Module + +Provides an on-demand Lambda **orchestrator** that: + +1. Selects a recovery point (latest by default) from a specified backup vault. +2. Starts a restore job for the chosen recovery point (supports S3 in example). +3. Waits for restore job completion (polling AWS Backup). +4. Invokes a **customer-provided validation Lambda** (you own resource-specific logic). +5. Publishes validation status back to AWS Backup using `PutRestoreValidationResult`. + +This pattern differs from automated restore testing plans: it is **manually triggered** (e.g. via `aws lambda invoke` or an API Gateway front-end) and delegates validation logic entirely to a customer-maintained Lambda. + +## Key Design Principles + +- **Separation of concerns**: Orchestrator handles restore lifecycle & result publishing; customer Lambda handles semantic integrity checks. +- **Pluggable**: Any runtime or language for validator (only contract is JSON in/out). +- **Minimal surface**: No Step Functions required for single-resource manual validation. + +## Orchestrator Environment Variables + +| Variable | Purpose | +|----------|---------| +| `BACKUP_VAULT_NAME` | Source vault to enumerate recovery points | +| `RESOURCE_TYPE` | Backup resource type (e.g. 
`S3`) | +| `VALIDATOR_LAMBDA` | ARN of customer validator Lambda | +| `TARGET_BUCKET` | (S3 only) Destination bucket name to validate | +| `RESTORE_ROLE_ARN` | (Optional) IAM role used for restore job | + +## Customer Validator Contract + +**Invocation Payload** (example): + +```json +{ + "restoreJobId": "1234abcd", + "recoveryPointArn": "arn:aws:backup:...:recovery-point:...", + "resourceType": "S3", + "createdResourceArn": "arn:aws:s3:::restored-bucket", + "targetBucket": "restored-bucket", + "s3": { "bucket": "restored-bucket" } +} +``` + +**Return Object**: + +```json +{ "status": "SUCCESSFUL|FAILED|SKIPPED", "message": "Human readable summary" } +``` +Statuses are normalised by the orchestrator before calling `PutRestoreValidationResult`. + +## Terraform Inputs + +See `variables.tf` for full list. Essential: + +```hcl +module "manual_validation" { + source = "../modules/aws-backup-manual-validation" + enable = true + name_prefix = var.name_prefix + backup_vault_name = var.backup_vault_name + resource_type = "S3" + validation_lambda_arn = aws_lambda_function.customer_validator.arn + target_bucket_name = var.target_restore_bucket +} +``` + +## Example Validator (S3 Presence / Count) + +See `../../examples/customer-s3-validator` for a full TypeScript implementation scanning a set of expected keys or listing a prefix to ensure non-empty restore. + +## Operational Notes + +- Timeouts: Orchestrator Lambda default timeout is 15 minutes; long restores will exceed this—use small test datasets or adapt to Step Functions if needed. +- Costs: Avoid listing millions of S3 keys in the validator; prefer sampling. +- IAM Hardening: Current policy uses broad `backup:*` subset and `s3:Get*`; tighten to specific ARNs in production. + +## Future Enhancements + +- Option to specify explicit recovery point instead of auto-pick (supported already via event.recoveryPointArn field). +- Emit custom CloudWatch metrics for validation duration & success rate. 
+- Optional SNS notification on failure. + +--- +MIT style licensing per repository policy. diff --git a/modules/aws-backup-manual-validation/dist/orchestrator.js b/modules/aws-backup-manual-validation/dist/orchestrator.js new file mode 100644 index 0000000..9c19246 --- /dev/null +++ b/modules/aws-backup-manual-validation/dist/orchestrator.js @@ -0,0 +1,106 @@ +import { BackupClient, ListRecoveryPointsByBackupVaultCommand, StartRestoreJobCommand, DescribeRestoreJobCommand, PutRestoreValidationResultCommand } from "@aws-sdk/client-backup"; +import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda"; +import { S3Client } from "@aws-sdk/client-s3"; +const backup = new BackupClient({}); +const lambda = new LambdaClient({}); +const s3 = new S3Client({}); +const BACKUP_VAULT_NAME = process.env.BACKUP_VAULT_NAME; +const RESOURCE_TYPE = process.env.RESOURCE_TYPE; // e.g. S3 +const VALIDATOR_LAMBDA = process.env.VALIDATOR_LAMBDA; +const TARGET_BUCKET = process.env.TARGET_BUCKET; // optional S3 bucket +export const handler = async (event = {}) => { + console.log(JSON.stringify({ msg: "Manual restore orchestration start", event })); + const recoveryPointArn = event.recoveryPointArn || await pickLatestRecoveryPoint(); + console.log({ recoveryPointArn }); + const restoreJobId = await startRestore(recoveryPointArn); + console.log({ restoreJobId }); + const restoreDesc = await waitForCompletion(restoreJobId); + console.log({ restoreDesc }); + const validatorPayload = { + restoreJobId, + recoveryPointArn, + resourceType: RESOURCE_TYPE, + createdResourceArn: restoreDesc.CreatedResourceArn, + targetBucket: TARGET_BUCKET, + s3: { bucket: TARGET_BUCKET }, + // Pass validator-specific fields from the invocation event through, as the validator contract documents + expectedKeys: event.expectedKeys, + expectedMinObjects: event.expectedMinObjects + }; + const validationResult = await invokeValidator(validatorPayload); + console.log({ validationResult }); + await publishValidation(restoreJobId, validationResult); + return { + restoreJobId, + recoveryPointArn, + validation: validationResult + }; +}; +async function pickLatestRecoveryPoint() { + const cmd = new 
ListRecoveryPointsByBackupVaultCommand({ BackupVaultName: BACKUP_VAULT_NAME, MaxResults: 20 }); + const resp = await backup.send(cmd); + if (!resp.RecoveryPoints || resp.RecoveryPoints.length === 0) { + throw new Error("No recovery points found in vault"); + } + const sorted = [...resp.RecoveryPoints].sort((a, b) => (b.CreationDate?.getTime() || 0) - (a.CreationDate?.getTime() || 0)); + return sorted[0].RecoveryPointArn; +} +async function startRestore(recoveryPointArn) { + const cmd = new StartRestoreJobCommand({ + RecoveryPointArn: recoveryPointArn, + IamRoleArn: process.env.RESTORE_ROLE_ARN, + ResourceType: RESOURCE_TYPE, + Metadata: TARGET_BUCKET ? { destinationBucketName: TARGET_BUCKET } : {} + }); + const resp = await backup.send(cmd); + if (!resp.RestoreJobId) + throw new Error("StartRestoreJob returned no RestoreJobId"); + return resp.RestoreJobId; +} +async function waitForCompletion(restoreJobId) { + const timeoutMs = 1000 * 60 * 55; + const start = Date.now(); + while (Date.now() - start < timeoutMs) { + const desc = await backup.send(new DescribeRestoreJobCommand({ RestoreJobId: restoreJobId })); + if (desc.Status === "COMPLETED" || desc.Status === "ABORTED" || desc.Status === "FAILED") { + return desc; + } + await new Promise(r => setTimeout(r, 15000)); + } + throw new Error("Restore job did not finish within timeout"); +} +async function invokeValidator(payload) { + const cmd = new InvokeCommand({ + FunctionName: VALIDATOR_LAMBDA, + InvocationType: "RequestResponse", + Payload: Buffer.from(JSON.stringify(payload)) + }); + const resp = await lambda.send(cmd); + if (!resp.Payload) + throw new Error("Validator returned no payload"); + const txt = Buffer.from(resp.Payload).toString("utf-8"); + try { + return JSON.parse(txt); + } catch (e) { + throw new Error("Validator payload JSON parse error: " + txt); + } +} +async function publishValidation(restoreJobId, result) { + const status = mapStatus(result.status); + const message = (result.message || 
"").slice(0, 1000); + const cmd = new PutRestoreValidationResultCommand({ + RestoreJobId: restoreJobId, + ValidationStatus: status, + ValidationStatusMessage: message + }); + await backup.send(cmd); +} +function mapStatus(s) { + if (!s) + return "FAILED"; + const upper = s.toUpperCase(); + if (["SUCCESS", "SUCCESSFUL", "OK"].includes(upper)) + return "SUCCESSFUL"; + if (["FAILED", "FAIL", "ERROR"].includes(upper)) + return "FAILED"; + if (["SKIPPED", "IGNORE", "IGNORED"].includes(upper)) + return "SKIPPED"; + return "FAILED"; +} diff --git a/modules/aws-backup-manual-validation/iam.tf b/modules/aws-backup-manual-validation/iam.tf new file mode 100644 index 0000000..1601dc3 --- /dev/null +++ b/modules/aws-backup-manual-validation/iam.tf @@ -0,0 +1,76 @@ +locals { + manual_validation_name = "${var.name_prefix}-manual-restore-validation" +} + +resource "aws_iam_role" "orchestrator" { + count = var.enable ? 1 : 0 + name = "${local.manual_validation_name}-orchestrator" + assume_role_policy = data.aws_iam_policy_document.orchestrator_assume.json +} + +data "aws_iam_policy_document" "orchestrator_assume" { + statement { + actions = ["sts:AssumeRole"] + principals { + type = "Service" + identifiers = ["lambda.amazonaws.com"] + } + } +} + +# NOTE: Permissions are intentionally broad placeholders; should be tightened. +# Includes: listing recovery points, starting restore job, describing restore job, +# invoking customer validation Lambda, writing logs, optional S3 read. 
+ +data "aws_iam_policy_document" "orchestrator" { + statement { + sid = "Logs" + actions = [ + "logs:CreateLogGroup", + "logs:CreateLogStream", + "logs:PutLogEvents" + ] + resources = ["*"] + } + + statement { + sid = "BackupCore" + actions = [ + "backup:ListRecoveryPointsByBackupVault", + "backup:StartRestoreJob", + "backup:DescribeRestoreJob", + "backup:PutRestoreValidationResult" + ] + resources = ["*"] + } + + statement { + sid = "InvokeValidator" + actions = [ + "lambda:InvokeFunction" + ] + resources = [var.validation_lambda_arn] + } + + statement { + sid = "S3ReadOptional" + actions = [ + "s3:ListBucket", + "s3:GetObject" + ] + resources = ["*"] + } +} + +resource "aws_iam_policy" "orchestrator" { + count = var.enable ? 1 : 0 + name = "${local.manual_validation_name}-policy" + policy = data.aws_iam_policy_document.orchestrator.json +} + +resource "aws_iam_role_policy_attachment" "orchestrator" { + count = var.enable ? 1 : 0 + role = aws_iam_role.orchestrator[0].name + policy_arn = aws_iam_policy.orchestrator[0].arn +} diff --git a/modules/aws-backup-manual-validation/lambda.tf b/modules/aws-backup-manual-validation/lambda.tf new file mode 100644 index 0000000..58d6de6 --- /dev/null +++ b/modules/aws-backup-manual-validation/lambda.tf @@ -0,0 +1,45 @@ +locals { + orchestrator_src_dir = "${path.module}/src" +} + +resource "aws_cloudwatch_log_group" "orchestrator" { + count = var.enable ? 1 : 0 + name = "/aws/lambda/${aws_lambda_function.orchestrator[0].function_name}" + retention_in_days = 30 +} + +# We keep a pre-built JS file for simplicity; user can rebuild if modifying. +# (If a build step is desired, integrate external build pipeline.) + +data "archive_file" "orchestrator" { + type = "zip" + source_file = "${path.module}/dist/orchestrator.js" + output_path = "${path.module}/dist/orchestrator.zip" +} + +resource "aws_lambda_function" "orchestrator" { + count = var.enable ? 
1 : 0 + function_name = "${var.name_prefix}-manual-restore-orchestrator" + role = aws_iam_role.orchestrator[0].arn + handler = "orchestrator.handler" + runtime = "nodejs20.x" + filename = data.archive_file.orchestrator.output_path + source_code_hash = data.archive_file.orchestrator.output_base64sha256 + timeout = 900 + memory_size = 256 + + environment { + variables = { + BACKUP_VAULT_NAME = var.backup_vault_name + RESOURCE_TYPE = var.resource_type + VALIDATOR_LAMBDA = var.validation_lambda_arn + TARGET_BUCKET = var.target_bucket_name + } + } + tags = var.tags +} + +output "manual_restore_orchestrator_lambda_arn" { + value = try(aws_lambda_function.orchestrator[0].arn, null) + description = "ARN of the manual restore orchestrator lambda" +} diff --git a/modules/aws-backup-manual-validation/outputs.tf b/modules/aws-backup-manual-validation/outputs.tf new file mode 100644 index 0000000..6cc5e34 --- /dev/null +++ b/modules/aws-backup-manual-validation/outputs.tf @@ -0,0 +1,4 @@ +output "orchestrator_lambda_arn" { + value = try(aws_lambda_function.orchestrator[0].arn, null) + description = "Manual restore validation orchestrator Lambda ARN" +} diff --git a/modules/aws-backup-manual-validation/package.json b/modules/aws-backup-manual-validation/package.json new file mode 100644 index 0000000..fc5555b --- /dev/null +++ b/modules/aws-backup-manual-validation/package.json @@ -0,0 +1,20 @@ +{ + "name": "aws-backup-manual-validation-orchestrator", + "version": "0.1.0", + "private": true, + "type": "module", + "scripts": { + "build": "tsc --project tsconfig.json", + "clean": "rimraf dist" + }, + "dependencies": { + "@aws-sdk/client-backup": "^3.637.0", + "@aws-sdk/client-lambda": "^3.637.0", + "@aws-sdk/client-s3": "^3.637.0" + }, + "devDependencies": { + "typescript": "^5.4.0", + "@types/node": "^20.11.0", + "rimraf": "^5.0.5" + } +} diff --git a/modules/aws-backup-manual-validation/src/orchestrator.ts b/modules/aws-backup-manual-validation/src/orchestrator.ts new file mode 
100644 index 0000000..eda153e --- /dev/null +++ b/modules/aws-backup-manual-validation/src/orchestrator.ts @@ -0,0 +1,125 @@ +/* Orchestrator Lambda (TypeScript) + Triggers a manual restore job for a chosen recovery point and invokes a customer-provided validation Lambda. + The customer Lambda should return JSON: { status: "SUCCESSFUL|FAILED|SKIPPED", message: string } +*/ +import { BackupClient, ListRecoveryPointsByBackupVaultCommand, StartRestoreJobCommand, DescribeRestoreJobCommand, PutRestoreValidationResultCommand } from "@aws-sdk/client-backup"; +import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda"; +import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3"; + +const backup = new BackupClient({}); +const lambda = new LambdaClient({}); +const s3 = new S3Client({}); + +const BACKUP_VAULT_NAME = process.env.BACKUP_VAULT_NAME!; +const RESOURCE_TYPE = process.env.RESOURCE_TYPE!; // e.g. S3 +const VALIDATOR_LAMBDA = process.env.VALIDATOR_LAMBDA!; +const TARGET_BUCKET = process.env.TARGET_BUCKET; // optional S3 bucket + +interface ValidatorResult { status: string; message?: string; [k: string]: any } + +export const handler = async (event: any = {}): Promise<any> => { + console.log(JSON.stringify({ msg: "Manual restore orchestration start", event })); + + const recoveryPointArn = event.recoveryPointArn || await pickLatestRecoveryPoint(); + console.log({ recoveryPointArn }); + + const restoreJobId = await startRestore(recoveryPointArn); + console.log({ restoreJobId }); + + const restoreDesc = await waitForCompletion(restoreJobId); + console.log({ restoreDesc }); + + const validatorPayload = { + restoreJobId, + recoveryPointArn, + resourceType: RESOURCE_TYPE, + createdResourceArn: restoreDesc.CreatedResourceArn, + targetBucket: TARGET_BUCKET, + // Additional S3 example context the customer validator might use: + s3: { bucket: TARGET_BUCKET }, + // Pass validator-specific fields from the invocation event through, as the validator contract documents + expectedKeys: event.expectedKeys, + expectedMinObjects: event.expectedMinObjects + }; + + const validationResult = await invokeValidator(validatorPayload); + console.log({ validationResult 
}); + + await publishValidation(restoreJobId, validationResult); + + return { + restoreJobId, + recoveryPointArn, + validation: validationResult + }; +}; + +async function pickLatestRecoveryPoint(): Promise<string> { + const cmd = new ListRecoveryPointsByBackupVaultCommand({ BackupVaultName: BACKUP_VAULT_NAME, MaxResults: 20 }); + const resp = await backup.send(cmd); + if (!resp.RecoveryPoints || resp.RecoveryPoints.length === 0) { + throw new Error("No recovery points found in vault"); + } + // Sort by CreationDate descending + const sorted = [...resp.RecoveryPoints].sort((a, b) => (b.CreationDate?.getTime() || 0) - (a.CreationDate?.getTime() || 0)); + return sorted[0].RecoveryPointArn!; +} + +async function startRestore(recoveryPointArn: string): Promise<string> { + // For S3 we can do a metadata-only restore or specify a placeholder + const cmd = new StartRestoreJobCommand({ + RecoveryPointArn: recoveryPointArn, + IamRoleArn: process.env.RESTORE_ROLE_ARN, + ResourceType: RESOURCE_TYPE, + Metadata: TARGET_BUCKET ? 
{ destinationBucketName: TARGET_BUCKET } : {} + }); + const resp = await backup.send(cmd); + if (!resp.RestoreJobId) throw new Error("StartRestoreJob returned no RestoreJobId"); + return resp.RestoreJobId; +} + +async function waitForCompletion(restoreJobId: string) { + const timeoutMs = 1000 * 60 * 55; // 55 minutes; NOTE: longer than the 900 s Lambda timeout set in lambda.tf, so the Lambda timeout fires first + const start = Date.now(); + while (Date.now() - start < timeoutMs) { + const desc = await backup.send(new DescribeRestoreJobCommand({ RestoreJobId: restoreJobId })); + if (desc.Status === "COMPLETED" || desc.Status === "ABORTED" || desc.Status === "FAILED") { + return desc; + } + await new Promise(r => setTimeout(r, 15000)); + } + throw new Error("Restore job did not finish within timeout"); +} + +async function invokeValidator(payload: any): Promise<ValidatorResult> { + const cmd = new InvokeCommand({ + FunctionName: VALIDATOR_LAMBDA, + InvocationType: "RequestResponse", + Payload: Buffer.from(JSON.stringify(payload)) + }); + const resp = await lambda.send(cmd); + if (!resp.Payload) throw new Error("Validator returned no payload"); + const txt = Buffer.from(resp.Payload).toString("utf-8"); + try { + return JSON.parse(txt); + } catch (e) { + throw new Error("Validator payload JSON parse error: " + txt); + } +} + +async function publishValidation(restoreJobId: string, result: ValidatorResult) { + const status = mapStatus(result.status); + const message = (result.message || "").slice(0, 1000); + const cmd = new PutRestoreValidationResultCommand({ + RestoreJobId: restoreJobId, + ValidationStatus: status, + ValidationStatusMessage: message + }); + await backup.send(cmd); +} + +function mapStatus(s?: string): string { + if (!s) return "FAILED"; + const upper = s.toUpperCase(); + if (["SUCCESS", "SUCCESSFUL", "OK"].includes(upper)) return "SUCCESSFUL"; + if (["FAILED", "FAIL", "ERROR"].includes(upper)) return "FAILED"; + if (["SKIPPED", "IGNORE", "IGNORED"].includes(upper)) return "SKIPPED"; + return "FAILED"; +} diff --git 
a/modules/aws-backup-manual-validation/tsconfig.json b/modules/aws-backup-manual-validation/tsconfig.json new file mode 100644 index 0000000..01dcc7f --- /dev/null +++ b/modules/aws-backup-manual-validation/tsconfig.json @@ -0,0 +1,16 @@ +{ + "compilerOptions": { + "target": "ES2020", + "module": "ES2020", + "moduleResolution": "Node", + "outDir": "dist", + "rootDir": "src", + "esModuleInterop": true, + "forceConsistentCasingInFileNames": true, + "strict": true, + "skipLibCheck": true, + "resolveJsonModule": true + }, + "include": ["src/**/*.ts"], + "exclude": ["node_modules"] +} diff --git a/modules/aws-backup-manual-validation/variables.tf b/modules/aws-backup-manual-validation/variables.tf new file mode 100644 index 0000000..4a18a79 --- /dev/null +++ b/modules/aws-backup-manual-validation/variables.tf @@ -0,0 +1,43 @@ +variable "enable" { + type = bool + default = true + description = "Whether to create manual validation orchestration resources." +} + +variable "name_prefix" { + type = string + description = "Prefix used for naming resources (e.g. project-env)." +} + +variable "backup_vault_name" { + type = string + description = "Name of the backup vault containing recovery points to restore for manual tests." +} + +variable "restore_role_arn" { + type = string + description = "IAM role ARN used by the restore job if a specific role is required (optional)." + default = null +} + +variable "validation_lambda_arn" { + type = string + description = "Customer-provided Lambda ARN that performs validation after manual restore completes." +} + +variable "resource_type" { + type = string + description = "AWS Backup resource type for manual restore (e.g. S3, DynamoDB, RDS)." +} + +variable "target_bucket_name" { + type = string + description = "For S3 restores: name of the destination S3 bucket that the restore will produce or populate. Used only in the example orchestrator logic." 
+ default = null +} + +variable "tags" { + type = map(string) + default = {} + description = "Tags to apply to created resources." +} diff --git a/modules/aws-backup-manual-validation/versions.tf b/modules/aws-backup-manual-validation/versions.tf new file mode 100644 index 0000000..7f163ea --- /dev/null +++ b/modules/aws-backup-manual-validation/versions.tf @@ -0,0 +1,9 @@ +terraform { + required_version = ">= 1.5.0" + required_providers { + aws = { + source = "hashicorp/aws" + version = ">= 5.0" + } + } +}