Skip to content

parity: full-surface audit fixes (D resource-leaks → then B/A/C stacked)#2226

Merged
agbishop merged 7 commits into
mainfrom
parity/resource-leak-fixes
Jun 10, 2026
Merged

parity: full-surface audit fixes (D resource-leaks → then B/A/C stacked)#2226
agbishop merged 7 commits into
mainfrom
parity/resource-leak-fixes

Conversation

@agbishop

Copy link
Copy Markdown
Collaborator

Implements the fixes catalogued in the #2225 full-surface parity audit. ONE PR, fixes stacked by bucket to avoid cross-service merge conflicts (audit buckets share services like DynamoDB/STS/SSM/CloudWatch).

Bucket D — Resource leaks (DONE): 9 genuine leaks fixed (others in the audit were already fixed on current main): CloudWatch alarmHistory eviction on DeleteAlarms; Route53 healthcheck tag release; Pinpoint per-app state purge + event cap; MWAA metrics slice GC; AppRunner per-service ops cap; STS expired-session eviction; DynamoDB ShardIteratorStore write-path sweep; SSM command expiry on write; ECR layer-upload TTL. -race clean, table-driven release tests, no //nolint.

Coming on this same branch (stacked): B (incorrect emulation), A (stub ops), C (perf). Will update this description as each lands.

Do not merge yet — more buckets stacking.

Address the confirmed "Resource leaks" findings from the full-surface
parity audit. Each fix releases state on the resource's delete path or
bounds growth with a cap/lazy-eviction so the optional janitor is not the
only thing preventing unbounded growth.

- cloudwatch: DeleteAlarms now drops the per-alarm history (alarmHistory
  was unbounded across deleted alarms).
- route53: DeleteHealthCheck releases its handler-level tags (only
  hosted-zone deletes were cleaning up before).
- pinpoint: DeleteApp purges all per-app state (events, endpoints,
  channels, campaign/segment versions, activities, journey runs,
  campaigns/segments/journeys); PutEvents caps retained events per app.
- mwaa: PublishMetrics trims into a right-sized slice so the dropped
  prefix is GC'd instead of pinned by an oversized backing array.
- apprunner: per-service operation history is capped (was append-only).
- sts: AssumeRole/etc. opportunistically evict expired sessions above a
  threshold so b.sessions stays bounded without the janitor.
- dynamodb: ShardIteratorStore.Put sweeps expired iterators once it grows
  past a threshold.
- ssm: SendCommand prunes aged-out commands/invocations on the write path.
- ecr: layer uploads carry a CreatedAt and abandoned ones are pruned on
  InitiateLayerUpload; UploadLayerPart refreshes activity.

Adds table-driven tests per service proving the resource is released
(map size returns to baseline after delete; caps hold; expired entries
evicted) with -race where goroutines/timers are involved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@agbishop

agbishop commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

📊 Code Coverage Report

Metric Value Status
Total Coverage 0.0%
83.2%
New Code Coverage N/A (0/0 stmts)

Tip

This project maintains a minimum coverage threshold of 85%. Maintain or improve coverage on new code to ensure long-term stability.


Last updated: Wed, 10 Jun 2026 22:36:24 GMT

mayor and others added 6 commits June 10, 2026 15:36
Address still-broken "Incorrect / missing AWS emulation" findings from the
full-surface parity audit (bucket B), stacked on the bucket-D leak fixes:

- STS: GetCallerIdentity token mismatch now returns 400 InvalidClientTokenId
  (ErrUnknownAccessKeyID) instead of 403 AccessDenied.
- DynamoDB: reject ConsistentRead=true on a GSI (ValidationException); reject
  duplicate keys within a table's BatchGetItem Keys list.
- S3: CompleteMultipartUpload rejects an empty parts list (InvalidRequest).
- Bedrock: CreateProvisionedModelThroughput enforces an upper modelUnits bound.
- MediaConvert: deepClone no longer silently truncates deeply-nested settings.
- Elasticsearch: AssociatePackage returns ConflictException on duplicate.
- Account: ListRegions honours maxResults/nextToken (opaque cursor, sorted).
- Glacier: retrieval jobs start InProgress and complete after a simulated
  async window instead of being Succeeded at creation.
- API Gateway v2: GetModelTemplate returns the model schema, not {}.
- RDS: DescribeDBParameterGroups/DescribeDBParameters/DescribeOptionGroups/
  DescribeDBClusterParameterGroups now paginate with Marker/MaxRecords.
- DirectoryService: unknown operations return 400 InvalidRequestException
  instead of 501.

Already-correct on the current branch (verified, no change): EventBridge DLQ +
malformed-pattern validation, Kinesis GetRecords/ListStreamConsumers/CreateStream,
DynamoDB UpdateTable 20-GSI ceiling, Neptune ServerlessV2 config, SNS
RedrivePolicy propagation, SecurityHub Id/ProductArn validation.

Adds table-driven tests asserting the corrected behaviour for each fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implement genuinely-stubbed operations identified in the bucket-A audit,
backed by real InMemoryBackend state with AWS-accurate shapes/errors:

- EC2 RevokeSecurityGroupEgress: validate group, remove matching egress
  rules, idempotent on absent rules (was a no-op).
- CloudTrail LookupEvents: record events; honor time range, lookup
  attributes (ANDed), ordering, MaxResults and NextToken pagination.
- CodePipeline ListActionExecutions: track action executions per pipeline
  run with pipelineExecutionId filter; ListRuleTypes returns the AWS
  managed rule catalog.
- ApplicationAutoScaling DescribeScalingActivities: record and return
  scaling activities on RegisterScalableTarget.

Most other bucket-A items were verified already-implemented on current
code (older audit snapshot). Remaining genuine gaps (AppSync EvaluateCode,
Kafka Update* setting payloads, CloudFront long-tail stubs, Pipes silent
skip) are documented in parity.md for follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…arch/forecast/ecs)

Internal hot-path optimizations from the full-surface parity audit (bucket C),
no observable behavior change:

- eventbridge: deliverEvents no longer deep-copies all bus rules/index/targets
  per PutEvents; build a scoped delivery plan under the read lock that snapshots
  only matched rules' targets for the referenced buses.
- kms: add grantsByToken index so findGrantByToken is O(1) on the
  encrypt/decrypt grant-validation path instead of scanning all grants; index
  kept consistent on create/revoke/retire/reset.
- opensearch: add domainPackages reverse index so ListPackagesForDomain and the
  DeleteDomain cascade no longer scan every package's association set.
- forecast: lookupLocked resolves by name (map key) or by reversing the
  deterministic ARN to the name, replacing the per-call O(n) scan.
- ecs: preallocate the reconciler service snapshot to the exact service count.

Adds focused index-consistency tests for the kms grant-token index and the
opensearch domain->packages reverse index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pes)

Resolve the four DEFERRED bucket-A items documented in parity.md:

- AppSync EvaluateCode: replace hardcoded {"evaluationResult":"{}"} with a
  faithful APPSYNC_JS evaluator (no JS engine vendored). Selects request/
  response handler, evaluates object/JSON literals, ctx.* member paths, and
  pure util.* helpers; unsupported constructs return ErrUnsupportedJSCode
  rather than a fabricated result. New services/appsync/jseval.go + tests.
- Kafka Update{Connectivity,Monitoring,Security,Storage}: parse request
  bodies, persist setting payloads onto the cluster, and record a
  ClusterOperation with SourceClusterInfo/TargetClusterInfo (new
  MutableClusterInfo) surfaced by DescribeClusterOperation. UpdateRebalancing
  is action-only (no AWS-exposed setting). Tests added.
- EventBridge Pipes: stop silently dropping events. Unwired enrichment/target
  invokers now return real errors; failed batches route to the configured
  DLQ (SQS/SNS) and delete source messages, else retain for redelivery.
- CloudFront long-tail: re-audited — already backed by real InMemoryBackend
  state on this branch; stale deferral note corrected in parity.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The bucket-B parity fix made Glacier retrieval jobs AWS-accurate: jobs
now start InProgress and promote to Succeeded only after the simulated
retrievalDelay window. The integration test still assumed synchronous
completion. Poll DescribeJob until StatusCode=Succeeded/Completed before
asserting and calling GetJobOutput.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@agbishop agbishop enabled auto-merge (squash) June 10, 2026 22:19
@agbishop agbishop merged commit 8169754 into main Jun 10, 2026
26 checks passed
@agbishop agbishop deleted the parity/resource-leak-fixes branch June 10, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant