fix(prisma): add retry for Aurora Serverless v2 connection errors#121

Open
konokenj wants to merge 4 commits into main from feature/prisma-retry

@konokenj
Contributor

Issue

close #104
close #105

Problem

The starter kit has three issues with Prisma + Aurora Serverless v2 (auto-pause enabled with minCapacity: 0):

  1. Credential leak: console.log(process.env.DATABASE_URL) in prisma.ts outputs the full connection string including password to CloudWatch Logs.

  2. No runtime retry: Aurora drops idle connections after idle_session_timeout (60s) and takes ~15s to resume from auto-pause (docs). Without retry, queries fail with transient errors (P1017, ECONNRESET) and do not recover.

  3. No migration retry: migration-runner.ts runs prisma db push without retry. During cdk deploy, Aurora may still be resuming, causing P1001 ("Can't reach database server") and failing the entire deployment.

Solution

  • Remove console.log(DATABASE_URL) to fix the credential leak.
  • Add a Prisma client extension (Prisma.defineExtension with $allModels.$allOperations) that retries transient connection errors with exponential backoff. Retryable errors: P2024, P1001, P1017, idle-session timeout, ECONNRESET. Non-retryable errors (auth failures, schema errors) are thrown immediately.
  • Add retry to migration-runner.ts for prisma db push with exponential backoff (base 3s, max 5 attempts, ~100s worst case within Lambda 5min timeout). Only P1001 / connection refused are retried.
  • Optimize connection parameters: connection_limit=1 (Lambda handles one request per instance), connect_timeout=30 (accommodates auto-pause resume time).
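A minimal sketch of the retry classification and backoff described above. It is not the code from this PR: `isRetryable`, `withRetry`, and `RetryOptions` are hypothetical names, the message fragments are assumptions about how the errors are worded, and the attempt count and delay are placeholders.

```typescript
// Hypothetical helper names; only the error codes come from the PR description.
const RETRYABLE_CODES = new Set(["P2024", "P1001", "P1017"]);

// Assumed message fragments for errors that carry no Prisma error code.
const RETRYABLE_MESSAGES = ["idle-session timeout", "ECONNRESET"];

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; errorCode?: unknown; message?: unknown };
  // Known request errors expose `code`; initialization errors expose `errorCode`.
  const code =
    typeof e.code === "string" ? e.code :
    typeof e.errorCode === "string" ? e.errorCode :
    undefined;
  if (code !== undefined && RETRYABLE_CODES.has(code)) return true;
  // Fall back to message matching when no code is set.
  const message = typeof e.message === "string" ? e.message : "";
  return RETRYABLE_MESSAGES.some((fragment) => message.includes(fragment));
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors and the final attempt are rethrown immediately.
      if (attempt >= opts.maxAttempts || !isRetryable(err)) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Possible wiring into a Prisma client extension (not executed here,
// option values are placeholders):
//   Prisma.defineExtension({
//     query: { $allModels: { $allOperations: ({ args, query }) =>
//       withRetry(() => query(args), { maxAttempts: 3, baseDelayMs: 200 }) } },
//   });
```

Keeping the classifier separate from the loop lets the same predicate be unit-tested without a database, and lets the migration runner use a narrower rule (P1001 / connection refused only) with the same loop.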

Changes

  • webapp/src/lib/prisma.ts — Remove console.log, remove verbose log option, add retry extension via $extends
  • webapp/src/jobs/migration-runner.ts — Extract runPrismaDbPush with retry loop, structured logging
  • cdk/lib/constructs/database.ts — Change connection options to ?connection_limit=1&connect_timeout=30
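For reference, a small sketch of what the new connection options amount to on a connection string. `withConnectionOptions` and the example URL are hypothetical; the actual database.ts change may simply append the query string, since host and credentials can be unresolved tokens at CDK synth time.

```typescript
// Hypothetical helper: applies the two parameters named in this PR to a
// PostgreSQL connection URL, replacing any existing values for them.
function withConnectionOptions(baseUrl: string): string {
  const url = new URL(baseUrl);
  // One connection per Lambda instance.
  url.searchParams.set("connection_limit", "1");
  // Seconds to wait for a connection, to cover auto-pause resume.
  url.searchParams.set("connect_timeout", "30");
  return url.toString();
}

// Placeholder host and credentials, for illustration only.
const exampleUrl = withConnectionOptions(
  "postgresql://app_user:placeholder@db.example.internal:5432/app",
);
```

Going through `URL` rather than string concatenation avoids a doubled `?` when the base URL already has query parameters.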

Verification

  • console.log(process.env.DATABASE_URL) is removed
  • After Aurora auto-pause resume, the first request recovers via retry
  • Non-retryable errors (e.g. auth failure) are thrown immediately without retry
  • cdk deploy succeeds even when Aurora is resuming from 0 ACU
  • tsc --noEmit passes
  • prettier --check passes

…, #105)

Why: Aurora Serverless v2 with auto-pause (0 ACU) drops connections on
idle_session_timeout and takes ~15s to resume. Without retry, both
runtime queries and CDK deployment migrations fail on transient errors.
Also, DATABASE_URL (including password) was logged to CloudWatch.

What:
- Remove console.log(DATABASE_URL) that leaked credentials to CloudWatch
- Add Prisma client extension with retry on transient connection errors
  (P2024, P1001, P1017, idle-session timeout, ECONNRESET)
- Add exponential backoff retry to migration-runner for prisma db push
- Optimize connection params: connection_limit=1, connect_timeout=30

The default pool_timeout (10s) is insufficient for Aurora Serverless v2
auto-pause resume (~15s). Also, PrismaClientInitializationError for pool
timeout has errorCode=undefined, so message-based detection is needed.
@konokenj force-pushed the feature/prisma-retry branch from d94e77e to 908ab82 on March 20, 2026 04:17
@konokenj requested a review from tmokmss on March 20, 2026 04:19