Update: [AEA-6053] - document preprocessing #221

Merged
anthony-nhs merged 45 commits into main from AEA-6053-document-preprocessing on Jan 26, 2026

@bencegadanyi1-nhs bencegadanyi1-nhs commented Dec 10, 2025

Summary

  • 🤖 Operational or Infrastructure Change
  • ✨ New Feature

Details

  • add preprocessing lambda to convert documents to markdown before KB ingestion
  • two-stage S3 pipeline: raw/ -> preprocessing -> processed/ -> sync
  • apply Sonar fix suggestion: replace insecure /tmp usage with secure temp directory handling
  • cfn-guard suppressions for Lambda permission
  • cdk-nag suppressions for preprocessing policies
  • add Excel-specific filtering for NHS SCAL documents
  • excluding cli.py and magika_shim.py from SonarCloud coverage
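
The two-stage layout and the temp-directory change listed above might look roughly like the following inside the preprocessing handler. This is a minimal sketch, not the code from the PR: the helper names `keys_from_event`, `processed_key_for`, and `convert_in_temp_dir`, the `.md` output suffix, and the `convert` callback are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Callable
from urllib.parse import unquote_plus

RAW_PREFIX = "raw/"              # stage 1: uploaded source documents
PROCESSED_PREFIX = "processed/"  # stage 2: markdown waiting for the KB sync


def keys_from_event(event: dict) -> list[str]:
    """Return the decoded object keys under raw/ from an S3 notification event."""
    keys = []
    for record in event.get("Records", []):
        # S3 notifications URL-encode object keys (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(RAW_PREFIX):
            keys.append(key)
    return keys


def processed_key_for(raw_key: str) -> str:
    """Map raw/<path>/<name>.<ext> to processed/<path>/<name>.md (assumed naming)."""
    if not raw_key.startswith(RAW_PREFIX):
        raise ValueError(f"not under {RAW_PREFIX}: {raw_key}")
    relative = PurePosixPath(raw_key[len(RAW_PREFIX):])
    return PROCESSED_PREFIX + str(relative.with_suffix(".md"))


def convert_in_temp_dir(name: str, data: bytes, convert: Callable[[Path], str]) -> str:
    """Write data into a private, auto-deleted directory and run convert on it."""
    # TemporaryDirectory creates a uniquely named directory that only the
    # current user can access, instead of a fixed, predictable path in /tmp.
    with tempfile.TemporaryDirectory(prefix="preprocess-") as workdir:
        source = Path(workdir) / Path(name).name  # drop any directory parts
        source.write_bytes(data)
        return convert(source)
```

Filtering on the `raw/` prefix matters in a layout like this: if the handler also reacted to objects it wrote under `processed/`, each output could trigger another invocation.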

@github-actions

This PR is linked to a ticket in an NHS Digital JIRA Project. Here's a handy link to the ticket:

AEA-6053

@bencegadanyi1-nhs bencegadanyi1-nhs marked this pull request as ready for review January 5, 2026 10:10
…mbda

Two fixes for preprocessing Lambda:

1. Grant KMS permissions: Added kmsKey.grantEncryptDecrypt() to allow
   Lambda to decrypt files from raw/ and encrypt files to processed/

2. Auto-create folders: Use BucketDeployment to create raw/ and
   processed/ folders on stack deployment. Much simpler than custom
   resource approach.

Files:
- packages/cdk/assets/s3-folders/raw/.gitkeep
- packages/cdk/assets/s3-folders/processed/.gitkeep
- packages/cdk/stacks/EpsAssistMeStack.ts

Suppress cdk-nag warnings for:
- BucketDeployment IAM managed policy usage
- BucketDeployment wildcard S3/KMS permissions
- BucketDeployment Lambda runtime
- Preprocessing Lambda KMS wildcard permissions (GenerateDataKey*, ReEncrypt*)

Critical bug: S3Bucket constructor was using this.kmsKey before it was assigned.
Changed line 34 from 'encryptionKey: this.kmsKey' to 'encryptionKey: kmsKey'
to use the local variable that was already created.

This prevented the bucket from being properly encrypted with our KMS key,
which is why the preprocessing Lambda couldn't decrypt objects.
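
The ordering bug described above is easy to reproduce in miniature. The sketch below is a Python analogy, not the CDK TypeScript from the PR: the `Bucket`, `BuggyStorage`, and `FixedStorage` classes are hypothetical, and the class-level `kms_key = None` stands in for a TypeScript field that reads as `undefined` until the constructor assigns it.

```python
from typing import Optional


class Bucket:
    """Stand-in for a bucket construct that takes an optional encryption key."""

    def __init__(self, encryption_key: Optional[str] = None) -> None:
        self.encryption_key = encryption_key


class BuggyStorage:
    kms_key: Optional[str] = None  # models a field that is unset until assigned

    def __init__(self) -> None:
        kms_key = "customer-managed-key"
        # Bug: reads the field before it has been assigned, so the bucket
        # silently receives no key at all.
        self.bucket = Bucket(encryption_key=self.kms_key)
        self.kms_key = kms_key


class FixedStorage:
    kms_key: Optional[str] = None

    def __init__(self) -> None:
        kms_key = "customer-managed-key"
        # Fix: pass the local variable that already holds the key.
        self.bucket = Bucket(encryption_key=kms_key)
        self.kms_key = kms_key
```

Neither constructor raises an error, which is what makes this kind of mistake easy to miss: the buggy version only shows up later as a bucket without the expected key.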
Comment thread .github/workflows/pull_request.yml
Comment thread packages/cdk/constructs/S3LambdaNotification.ts
Comment thread packages/preprocessingFunction/app/handler.py
Comment thread packages/preprocessingFunction/app/handler.py
Comment thread packages/cdk/constructs/S3LambdaNotification.ts
@bencegadanyi1-nhs bencegadanyi1-nhs enabled auto-merge (squash) January 22, 2026 09:05
@bencegadanyi1-nhs bencegadanyi1-nhs enabled auto-merge (squash) January 26, 2026 16:45
@anthony-nhs anthony-nhs disabled auto-merge January 26, 2026 16:51
@anthony-nhs anthony-nhs merged commit 756594a into main Jan 26, 2026
14 checks passed
@anthony-nhs anthony-nhs deleted the AEA-6053-document-preprocessing branch January 26, 2026 16:51
6 participants