Conversation
> 3. The Lambda function lists all repositories in the ECR registry.
> 4. For each repository, the function lists all image tags.
> 5. For each image tag (or a subset of tags), the function pulls the image using the Docker CLI.
> 6. The function then pushes the image to a designated S3 bucket in the source account, organizing images by repository and tag for easy retrieval.
Rather than hand-crank this, we'd be better off trying to leverage https://github.com/containers/skopeo?tab=readme-ov-file#syncing-registries or similar. If we can get an s3fs-fuse mount into a container lambda, it should Just Work. We might want to get a spike in to prove it out, but I'd much rather not have to build this bit ourselves.
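For reference, a minimal sketch of what that could look like (the account ID, region, repository name, and mount path are all assumptions). With a `docker` source and no tag specified, `skopeo sync` copies every tag of the repository:

```shell
# Assumed registry details; adjust for the real account/region.
REGISTRY="123456789012.dkr.ecr.eu-west-2.amazonaws.com"

# Authenticate skopeo against ECR.
aws ecr get-login-password --region eu-west-2 \
  | skopeo login --username AWS --password-stdin "$REGISTRY"

# Sync all tags of one repository into a directory, which could be
# an s3fs-fuse mount inside the container Lambda.
skopeo sync --src docker --dest dir "$REGISTRY/my-repo" /mnt/s3-backup
```

A spike would still need to prove the s3fs-fuse mount behaves acceptably under Lambda's execution model.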
> 2. The schedule event triggers an AWS Lambda function.
> 3. The Lambda function lists all repositories in the ECR registry.
> 4. For each repository, the function lists all image tags.
> 5. For each image tag (or a subset of tags), the function pulls the image using the Docker CLI.
How would an efficient "incremental" backup cohort be identified? Would the backup complete within 15 minutes? I don't know how large some people's images are, but there's a 10 GB ephemeral storage limit in Lambda, so for some that might be exceeded, or at least some housekeeping would be needed.
If we use skopeo this shouldn't be a problem. It'll do the copy in layers (which is the unit of increment we have available)
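If a tag-level cohort were still wanted, one sketch (repository name and cutoff handling are assumptions) is to filter on `imagePushedAt`; AWS CLI v2 renders timestamps as ISO 8601 by default, so a string comparison in the JMESPath query works:

```shell
# Cutoff from the previous run; in practice this might be read from a
# marker object in the backup bucket.
SINCE="2024-06-01T00:00:00"

# List tags pushed since the cutoff in one repository.
aws ecr describe-images \
  --repository-name my-repo \
  --query "imageDetails[?imagePushedAt>='${SINCE}'].imageTags[]" \
  --output text
```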
How do you do this in MESH?
> ## Step-by-Step Implementation
>
> ### Stage 1: ECR to S3 Backup
Just for reference, on MESH, when we push an image to ECR, we also push the tarball to S3 - the build will fail unless both complete.
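As a sketch of that style of build step (the image and bucket names are assumptions, not the actual MESH pipeline):

```shell
set -euo pipefail  # any failed step fails the build

IMAGE="123456789012.dkr.ecr.eu-west-2.amazonaws.com/my-repo:v1.2.3"

docker push "$IMAGE"                # 1. push the image to ECR
docker save "$IMAGE" -o image.tar   # 2. export the same image as a tarball
aws s3 cp image.tar \
  "s3://my-backup-bucket/my-repo/v1.2.3.tar"  # 3. archive the tarball to S3
```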
> ### Considerations
>
> * ECR replication does not provide immutability; images can be deleted or overwritten.
There's https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-tag-mutability.html - but yes, a bad actor with admin access could delete images.
> Why an Immutable ECR Backup?
>
> While Amazon ECR provides image replication, it lacks an immutable, long-term backup solution in a separate security boundary. In a disaster recovery (DR) scenario where a primary AWS account is compromised, standard replication is not sufficient. This solution addresses that by creating an "air-gapped" backup protected by an AWS Backup Vault Lock, which provides a Write-Once-Read-Many (WORM) model.
>
> ## Solution Architecture
What about restoration? For reference, on MESH, the tarballs of the images are in the S3 remote immutable backup. For restoration, we fetch the tarball, docker load it, then push to ECR.
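That restore path might look like the following sketch (bucket, repository, and registry names are assumptions):

```shell
REGISTRY="123456789012.dkr.ecr.eu-west-2.amazonaws.com"

# 1. Fetch the tarball from the immutable backup bucket.
aws s3 cp "s3://my-backup-bucket/my-repo/v1.2.3.tar" .

# 2. Load it into the local Docker daemon.
docker load -i v1.2.3.tar

# 3. Re-tag for the target registry (the loaded name depends on how the
#    image was saved), authenticate, then push back to ECR.
docker tag my-repo:v1.2.3 "$REGISTRY/my-repo:v1.2.3"
aws ecr get-login-password --region eu-west-2 \
  | docker login --username AWS --password-stdin "$REGISTRY"
docker push "$REGISTRY/my-repo:v1.2.3"
```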
There is an additional proposal for a restoration feature to be built into the blueprint as well.
The proposal doesn't cover ECR at the moment, but it gives the framework for how restoration will happen through the blueprint.
https://github.com/NHSDigital/terraform-aws-backup/pull/79/files
Description
The PR is a record of the proposal to support ECR through the blueprint.
Context
ECR backup solutions have been requested, and the proposal details how this could be completed.
Sensitive Information Declaration
To ensure the utmost confidentiality and to protect your privacy and that of others, we kindly ask you NOT to include PII (Personally Identifiable Information) / PID (Personally Identifiable Data) or any other sensitive data in this PR (Pull Request) or the codebase changes. We will remove any PR that contains sensitive information. We really appreciate your cooperation in this matter.