
feat(server): New TTL system, enforce max queue length limits, lazy waitpoint creation #2980

Draft

ericallam wants to merge 6 commits into main from ea-branch-117

Conversation


@ericallam (Member) commented Jan 30, 2026

Summary

This PR implements environment-level queue size limits to prevent runaway queues from overwhelming Redis, along with several related improvements.

Environment Queue Limits

Prevents any single environment from overwhelming the shared Redis cluster by enforcing queue size limits at trigger time.

  • Added a MAXIMUM_ENV_QUEUE_SIZE environment variable with sensible defaults
  • Queue size checks happen at trigger time: runs are rejected if the queue would exceed the limit
  • Dashboard UI shows queue limits on both the Queues page and a new Limits page
  • In-memory caching for queue size checks to reduce Redis load (see the sketch below)
[Screenshots: Queues page showing the limit; the new Limits page]
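
A minimal sketch of how a trigger-time guard with an in-memory cache can work (the function and variable names are illustrative assumptions, not this PR's actual implementation; CACHE_TTL_MS mirrors the QUEUE_SIZE_CACHE_TTL_MS default discussed in the review below):

// Hypothetical sketch of a trigger-time queue size guard with an
// in-memory cache in front of Redis. Names are illustrative only.
type QueueSizeGuardResult =
  | { ok: true }
  | { ok: false; maximumSize: number; currentSize: number };

const queueSizeCache = new Map<string, { size: number; fetchedAt: number }>();
const CACHE_TTL_MS = 30_000; // mirrors the QUEUE_SIZE_CACHE_TTL_MS default

async function guardQueueSize(
  envId: string,
  maximumSize: number | undefined,
  fetchSizeFromRedis: (envId: string) => Promise<number>
): Promise<QueueSizeGuardResult> {
  // No configured limit means no cap is enforced.
  if (maximumSize === undefined) return { ok: true };

  const cached = queueSizeCache.get(envId);
  let size: number;
  if (cached && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
    size = cached.size; // fresh enough, skip the Redis round trip
  } else {
    size = await fetchSizeFromRedis(envId);
    queueSizeCache.set(envId, { size, fetchedAt: Date.now() });
  }

  return size >= maximumSize
    ? { ok: false, maximumSize, currentSize: size }
    : { ok: true };
}

Returning a structured result rather than throwing lets the trigger path surface the limit in its own error message.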

Fix: batchTriggerAndWait stuck when items skipped

Fixed a bug where batchTriggerAndWait would get stuck forever when some items were skipped due to queue limits.

Root cause: When items were skipped during batch streaming:

  1. Phase 1 (createBatch) returned the original runCount to the SDK
  2. Phase 2 (streamBatchItems) skipped items due to limits and updated the DB with the new count
  3. The SDK still waited for the original count, but fewer runs existed
  4. The parent run was stuck forever

Fix: the server now returns the actual runCount in the Phase 2 response, and the SDK uses it for waitForBatch (a sketch follows).
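
Conceptually, the SDK-side change looks something like this sketch (the response shape is inferred from the summary and the API schema changes reviewed below; names are illustrative):

// Sketch: prefer the server's authoritative runCount over the count the
// SDK originally submitted. Names are illustrative.
type StreamBatchItemsResponse = {
  runCount?: number; // actual number of runs created in Phase 2
  itemsSkipped?: number; // items dropped due to queue limits
};

function resolveRunCountToAwait(
  submittedCount: number,
  response: StreamBatchItemsResponse
): number {
  return response.runCount ?? submittedCount;
}

// waitForBatch then blocks on resolveRunCountToAwait(items.length, response)
// runs instead of the original item count, so skipped items no longer
// leave the parent waiting forever.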

New TTL System

Improved TTL (time-to-live) handling for run expiration with better test coverage.
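
The walkthrough below mentions batched TTL consumers, Redis TTL scripts, and a ttlSystem callback. A rough sketch of that consumer shape, with all names and signatures assumed (the ExpiredRun shape matches the TtlSystemCallback quoted in the review below):

// Sketch of a batched TTL consumer loop. The real implementation uses
// Redis scripts and sharded TTL queues; this only shows the shape.
type ExpiredRun = { queueKey: string; runId: string; orgId: string };

async function runTtlConsumer(
  popDueRuns: (now: number, batchSize: number) => Promise<ExpiredRun[]>,
  onExpired: (runs: ExpiredRun[]) => Promise<void>,
  intervalMs = 1_000,
  batchSize = 100
): Promise<never> {
  // Runs until the process stops.
  while (true) {
    const due = await popDueRuns(Date.now(), batchSize);
    if (due.length > 0) {
      await onExpired(due); // e.g. mark runs EXPIRED, complete waitpoints
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}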

Lazy Waitpoint Creation

Better idempotency handling through lazy waitpoint creation: waitpoints are now created on demand rather than eagerly (a sketch follows).
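
A hedged sketch of the get-or-create pattern (not the actual waitpoint API; the two data-access callbacks are stand-ins):

// Sketch: lazily create a waitpoint keyed by idempotency key, creating
// it only on first use. All names here are illustrative.
type WaitpointRecord = { id: string; status: string };

async function getOrCreateWaitpoint(
  findWaitpoint: (key: string) => Promise<WaitpointRecord | null>,
  createWaitpoint: (key: string) => Promise<WaitpointRecord>,
  idempotencyKey: string
): Promise<WaitpointRecord> {
  // First try to reuse an existing waitpoint for this idempotency key.
  const existing = await findWaitpoint(idempotencyKey);
  if (existing) return existing;

  // Otherwise create it on demand. A real implementation would make this
  // race-safe, e.g. via a unique constraint plus retry-on-conflict.
  return createWaitpoint(idempotencyKey);
}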

Run Queue Improvements

Various improvements to the run queue system with expanded test coverage.

Test plan

  • Test triggering runs when queue is at/near limit
  • Test batchTriggerAndWait with queue limits causing skipped items
  • Verify dashboard displays limits correctly
  • Run new TTL and waitpoint tests

refs TRI-7167

🤖 Generated with Claude Code

@changeset-bot

changeset-bot bot commented Jan 30, 2026

⚠️ No Changeset found

Latest commit: 814a9a4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Centralizes queue-size logic (new v3/queueLimits utility and environment queueSizeLimit exposure) and adds an LRU cache for environment queue lengths. Refactors queue validation to per-queue semantics (resolveQueueNamesForBatchItems, validateMultipleQueueLimits) and surfaces itemsSkipped/runCount through batch streaming APIs. Introduces per-item retry for batch queue processing, batch-run-count updates, and a TriggerFailedTaskService for creating pre-failed runs. Adds a TTL expiration subsystem (batched TTL consumers, Redis TTL scripts, ttlSystem callback) and lazy get-or-create waitpoints with related waitpoint APIs. Numerous RunEngine/RunQueue/BatchQueue public API additions and tests updated; UI presenters and routes updated to use the single queueSize quota.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~180 minutes

🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 warnings

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning. The pull request description is comprehensive, covering the objectives, changes, and test plan, but it is missing the required checklist items and structured sections from the template. Resolution: add the required checklist section with checkboxes for the contributing guide, PR title convention, and code testing, and include the Testing and Changelog sections specified in the template.
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 28.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)
  • Title check — ✅ Passed. The pull request title accurately summarizes the main changes: new TTL system, queue size limit enforcement, and lazy waitpoint creation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta):
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch ea-branch-117

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@vibe-kanban-cloud

Review Complete

Your review story is ready!

View Story

Comment !reviewfast on this PR to re-generate the story.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts`:
- Around line 170-194: The code silently treats a missing runtime environment as
a zero queue by checking runtimeEnv and continuing; change this to explicitly
handle the case where runtimeEnv is null for the required environmentId by
returning/throwing an error (or returning a clear "not found" result) instead of
falling through to the deployed default logic—locate the block where runtimeEnv
is fetched (the runtimeEnvironment.findFirst call and the subsequent use of
runtimeEnv, currentQueueSize and engine.lengthOfEnvQueue) and add an explicit
branch that raises a descriptive error or returns an appropriate 404-style
response when runtimeEnv is falsy so callers don’t get misleading limits
computed for a non-existent environment.
🧹 Nitpick comments (1)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)

311-329: Queue size limit logic is duplicated with EnvironmentQueuePresenter.

The limit computation logic (lines 314-317) and source determination (lines 321-328) duplicate the same pattern found in EnvironmentQueuePresenter.server.ts (lines 44-47). Consider extracting this into a shared utility function to ensure consistency and reduce maintenance burden.

♻️ Example shared utility
// In a shared utils file, e.g., ~/utils/queueSizeLimit.ts
export function getQueueSizeLimitForEnvironment(
  environmentType: string,
  organization: { maximumDevQueueSize: number | null; maximumDeployedQueueSize: number | null },
  defaultDevLimit: number | null,
  defaultDeployedLimit: number | null
): { limit: number | null; source: "plan" | "default" } {
  if (environmentType === "DEVELOPMENT") {
    return {
      limit: organization.maximumDevQueueSize ?? defaultDevLimit ?? null,
      source: organization.maximumDevQueueSize ? "plan" : "default",
    };
  }
  return {
    limit: organization.maximumDeployedQueueSize ?? defaultDeployedLimit ?? null,
    source: organization.maximumDeployedQueueSize ? "plan" : "default",
  };
}
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 279102c and a68ef30.

📒 Files selected for processing (4)
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Access environment variables via env export from apps/webapp/app/env.server.ts, never use process.env directly

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧠 Learnings (10)
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/v3/presenters/**/*.server.{ts,tsx} : Organize presenters in the webapp following the pattern `app/v3/presenters/*/*.server.ts` to move complex loader code into classes

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : Access environment variables via `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to apps/webapp/**/*.test.{ts,tsx} : For testable code in the webapp, never import env.server.ts in test files - pass configuration as options instead

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/**/*.test.{ts,tsx} : Test files should only import classes and functions from `app/**/*.ts` files and should not import `env.server.ts` directly or indirectly; pass configuration through options instead

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.

Applied to files:

  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx (2)
apps/webapp/app/utils/numberFormatter.ts (1)
  • formatNumberCompact (3-5)
apps/webapp/app/components/primitives/Tooltip.tsx (1)
  • InfoIconTooltip (110-139)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)
apps/webapp/app/v3/runEngine.server.ts (1)
  • engine (11-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: typecheck / typecheck
🔇 Additional comments (5)
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)

1-56: LGTM!

The implementation correctly extends the Environment type with queueSizeLimit and computes it based on environment type with appropriate fallback chain (org override → env var → null). Environment variable access follows coding guidelines via ~/env.server.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)

510-511: LGTM!

Clean consolidation from separate dev/deployed queue size quotas to a unified queueSize quota, conditionally rendered only when a limit is set. This aligns well with the LimitsPresenter changes.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx (2)

349-391: LGTM!

The queue size limit display with color-coded usage indicators is well-implemented. The conditional rendering shows the limit suffix when queueSizeLimit exists, with appropriate fallback to the paused indicator.

Minor note: The component has env from useEnvironment() (line 313) shadowing the env import from ~/env.server (line 57). This works correctly since the loader uses the imported env and the component uses the hook result, but renaming one variable (e.g., runtimeEnv for the hook result) could improve clarity.


1145-1152: LGTM!

Clean helper function with consistent threshold logic (90% warning, 100% error). Returning undefined when no limit is set allows proper fallback behavior in the calling code.
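
For reference, a helper with that threshold logic could be as small as this sketch (illustrative, not the route's actual code):

// Sketch: map current queue size to a severity with the thresholds
// described above (90% warning, 100% error). Names are illustrative.
function queueSizeSeverity(
  current: number,
  limit?: number
): "warning" | "error" | undefined {
  if (limit === undefined) return undefined; // no limit: callers fall back
  if (current >= limit) return "error"; // at or above 100%
  if (current >= limit * 0.9) return "warning"; // at or above 90%
  return undefined;
}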

apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)

15-15: LGTM on the structural changes.

The import of engine and the updated LimitsResult type with consolidated queueSize: QuotaInfo properly support the new unified queue size quota feature.

Also applies to: 70-70

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/webapp/app/env.server.ts`:
- Around line 536-539: The MAXIMUM_DEV_QUEUE_SIZE default of 500 introduces an
unintended hard cap for dev environments because guardQueueSizeLimitsForEnv
reads this value and enforces it; remove the .default(500) so
MAXIMUM_DEV_QUEUE_SIZE remains optional/undefined unless explicitly set in the
environment, leaving the z.coerce.number().int().optional() schema for
MAXIMUM_DEV_QUEUE_SIZE and ensure any code calling guardQueueSizeLimitsForEnv
continues to treat undefined as "no cap" (verify guardQueueSizeLimitsForEnv
behavior and update it only if it currently treats undefined incorrectly).

In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts`:
- Around line 313-319: The queue-size quota object (queueSize) in
LimitsPresenter.server.ts is missing the isUpgradable flag so the UI never shows
an Upgrade action; update the queueSize payload returned by the presenter to
include isUpgradable: true for plan-based limits (and add canExceed: true|false
as appropriate), e.g., set isUpgradable to true when
getQueueSizeLimitSource(environmentType, organization) indicates a plan-based
source and ensure the UI-facing fields (queueSize.currentUsage, queueSize.limit,
queueSize.source) remain unchanged.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0f94ff and a50a5f5.

📒 Files selected for processing (7)
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Access environment variables via env export from apps/webapp/app/env.server.ts, never use process.env directly

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.

Applied to files:

  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties

Applied to files:

  • apps/webapp/app/runEngine/concerns/queues.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property

Applied to files:

  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.

Applied to files:

  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)
apps/webapp/app/v3/utils/queueLimits.server.ts (1)
  • getQueueSizeLimit (24-33)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (2)
apps/webapp/app/v3/runEngine.server.ts (1)
  • engine (11-11)
apps/webapp/app/v3/utils/queueLimits.server.ts (2)
  • getQueueSizeLimit (24-33)
  • getQueueSizeLimitSource (42-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: typecheck / typecheck
🔇 Additional comments (7)
apps/webapp/app/runEngine/concerns/queues.server.ts (3)

18-33: LRU cache setup looks solid.

The singleton + bounded LRU store should help reduce Redis churn without unbounded memory growth.
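
For context, a bounded LRU over a Map is enough to get this behavior; a sketch, not the PR's actual cache implementation:

// Sketch: a tiny bounded LRU using Map insertion order. Illustrative only.
class BoundedLru<V> {
  private store = new Map<string, V>();
  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.store.get(key);
    if (value !== undefined) {
      this.store.delete(key);
      this.store.set(key, value); // refresh recency
    }
    return value;
  }

  set(key: string, value: V): void {
    this.store.delete(key);
    this.store.set(key, value);
    if (this.store.size > this.maxSize) {
      // evict the least recently used entry (first in insertion order)
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
  }
}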


298-317: Confirm stale cache is acceptable for enforcement.

guardQueueSizeLimitsForEnv now relies on an SWR cache; with a ~30s TTL, queue size can be undercounted and allow bursts above the limit. If this is a hard/billing limit, consider bypassing the cache for enforcement (or forcing a fresh read when nearing the limit).
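
One way to tighten enforcement, sketched here under assumed names, is to force a fresh read whenever the cached value is near the limit:

// Sketch: bypass the SWR cache when the cached size is within a safety
// margin of the limit, so enforcement reads are fresh near the cap.
async function getQueueSizeForEnforcement(
  cachedSize: number | undefined,
  limit: number,
  fetchFresh: () => Promise<number>,
  marginFraction = 0.1
): Promise<number> {
  const threshold = limit * (1 - marginFraction);
  if (cachedSize === undefined || cachedSize >= threshold) {
    return fetchFresh(); // near the limit: don't trust a stale count
  }
  return cachedSize;
}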


68-68: No issues to flag in these edits.

Also applies to: 94-96, 116-118, 130-131

apps/webapp/app/v3/utils/queueLimits.server.ts (1)

7-51: Queue-limit helpers look good.

Clear resolution order and source labeling; the API surface is minimal and consistent.

apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)

4-52: queueSizeLimit wiring looks consistent.

The new field is derived in one place and keeps the presenter output cohesive.

apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)

2-18: API/type expansion for queue limits looks consistent.

The new environmentType parameter and consolidated quota surface align with the updated limit logic.

Also applies to: 58-96

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)

81-86: Loader and quota table updates look good.

Passing environmentType and gating the queue-size quota on a defined limit is consistent with the new presenter shape.

Also applies to: 511-513

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 536 to 539
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),

⚠️ Potential issue | 🟠 Major

Defaulting MAXIMUM_DEV_QUEUE_SIZE changes enforcement behavior.

This turns previously-unlimited dev environments into a hard 500-queue cap (via guardQueueSizeLimitsForEnv). If that’s not intentional, remove the default and require an explicit env var to enable the limit.

💡 Suggested change (avoid unintended hard limit)
-    MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
+    MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional(),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional(),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),
🤖 Prompt for AI Agents
In `@apps/webapp/app/env.server.ts` around lines 536 - 539, The
MAXIMUM_DEV_QUEUE_SIZE default of 500 introduces an unintended hard cap for dev
environments because guardQueueSizeLimitsForEnv reads this value and enforces
it; remove the .default(500) so MAXIMUM_DEV_QUEUE_SIZE remains
optional/undefined unless explicitly set in the environment, leaving the
z.coerce.number().int().optional() schema for MAXIMUM_DEV_QUEUE_SIZE and ensure
any code calling guardQueueSizeLimitsForEnv continues to treat undefined as "no
cap" (verify guardQueueSizeLimitsForEnv behavior and update it only if it
currently treats undefined incorrectly).

Comment on lines 313 to 319
queueSize: {
name: "Max queued runs",
description: "Maximum pending runs across all queues in this environment",
limit: getQueueSizeLimit(environmentType, organization),
currentUsage: currentQueueSize,
source: getQueueSizeLimitSource(environmentType, organization),
},

⚠️ Potential issue | 🟡 Minor

Queue-size quota never shows an upgrade action.

isUpgradable is omitted, so the Upgrade column is empty even for plan-based limits. If upgrades should be offered, set it to true (and optionally canExceed).

💡 Suggested change
         queueSize: {
           name: "Max queued runs",
           description: "Maximum pending runs across all queues in this environment",
           limit: getQueueSizeLimit(environmentType, organization),
           currentUsage: currentQueueSize,
           source: getQueueSizeLimitSource(environmentType, organization),
+          isUpgradable: true,
         },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
queueSize: {
name: "Max queued runs",
description: "Maximum pending runs across all queues in this environment",
limit: getQueueSizeLimit(environmentType, organization),
currentUsage: currentQueueSize,
source: getQueueSizeLimitSource(environmentType, organization),
},
queueSize: {
name: "Max queued runs",
description: "Maximum pending runs across all queues in this environment",
limit: getQueueSizeLimit(environmentType, organization),
currentUsage: currentQueueSize,
source: getQueueSizeLimitSource(environmentType, organization),
isUpgradable: true,
},
🤖 Prompt for AI Agents
In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts` around lines 313 -
319, The queue-size quota object (queueSize) in LimitsPresenter.server.ts is
missing the isUpgradable flag so the UI never shows an Upgrade action; update
the queueSize payload returned by the presenter to include isUpgradable: true
for plan-based limits (and add canExceed: true|false as appropriate), e.g., set
isUpgradable to true when getQueueSizeLimitSource(environmentType, organization)
indicates a plan-based source and ensure the UI-facing fields
(queueSize.currentUsage, queueSize.limit, queueSize.source) remain unchanged.

@coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal-packages/run-engine/src/engine/tests/ttl.test.ts (1)

1-9: ⚠️ Potential issue | 🟡 Minor

Missing vi import from vitest.

Similar to lazyWaitpoint.test.ts, the vi global is used but not imported.

🔧 Proposed fix
 import { containerTest, assertNonNullable } from "@internal/testcontainers";
 import { trace } from "@internal/tracing";
-import { expect } from "vitest";
+import { expect, vi } from "vitest";
 import { RunEngine } from "../index.js";
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)

5-5: ⚠️ Potential issue | 🟡 Minor

Use subpath export for tryCatch from @trigger.dev/core/utils.

Per coding guidelines, imports from @trigger.dev/core in the webapp must use subpath exports from the package.json. Change import { tryCatch } from "@trigger.dev/core"; to import { tryCatch } from "@trigger.dev/core/utils";.

🤖 Fix all issues with AI agents
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts`:
- Around line 112-130: The updateRunCount method currently trusts newRunCount;
add input validation in updateRunCount (in completionTracker.ts) to reject
negative, NaN, non-finite or non-integer values before calling
getMeta/storeMeta: use Number.isInteger(newRunCount) && newRunCount >= 0 (and
optionally Number.isFinite) and if the check fails log an error via
this.logger.error with batchId and the invalid value and then throw an Error or
return a rejected Promise so the invalid value is never persisted; keep existing
behavior for meta retrieval (getMeta) and storage (storeMeta) otherwise.

In `@internal-packages/run-engine/src/batch-queue/index.ts`:
- Around line 380-386: The updateRunCount method can lower runCount below the
already-processed count and leave a batch stuck; before calling
this.completionTracker.updateRunCount(batchId, newRunCount) clamp newRunCount to
at least the current processed count (e.g. newRunCount = Math.max(newRunCount,
this.completionTracker.getProcessedCount(batchId))) or, alternatively, after
updating call into the completion tracker to run the completion/finalization
check (e.g. this.completionTracker.checkCompletion/finalizeBatch for batchId) so
that if processedCount >= runCount the batch is finalized immediately; update
updateRunCount to perform one of these fixes using the existing
completionTracker APIs.

In `@internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts`:
- Around line 1-8: The test uses the vitest runner's mock/time API via
vi.setConfig but never imports vi; update the imports to explicitly import vi
from 'vitest' (for example change the existing import of expect to import {
expect, vi } from "vitest") so the vi symbol used by vi.setConfig is defined and
the test runner APIs are available.

In `@internal-packages/run-engine/src/run-queue/index.ts`:
- Around line 1238-1304: The code currently calls `#expireTtlRuns`(shard, now,
batchSize) which deletes/marks runs as expired before invoking
options.ttlSystem!.callback(expiredRuns), so callback failures only get logged
and those runs can be lost; update `#startTtlConsumer` (or `#expireTtlRuns`) so that
expiration is durable: either (A) defer permanent deletion/marking of runs until
after the callback completes successfully (move the delete/mark logic to after
options.ttlSystem!.callback returns) or (B) if deletion must happen first, add a
durable retry/enqueue path when the callback throws — e.g., push failed expired
run IDs to a retry store (Redis list/sorted set) with metadata and backoff, and
spawn a retry worker to reprocess them; ensure the callback and retry path are
idempotent and update logging to include run IDs and shard when queuing retries.
- Around line 2561-2637: expireTtlRuns currently removes run entries from
per-queue structures but never updates the master queue (masterQueueKey),
leaving stale entries; update the logic so that either (A) the Lua script also
computes the corresponding master/shard key from queueKey and ZREM the runId
from masterQueueKey (or adjust scores) for each expired member, or (B) return
the affected queueKey values (not just member) from expireTtlRuns so the
TypeScript caller can call the existing rebalance/cleanup routine for
masterQueueKey per returned queue; locate expireTtlRuns, the queueKey local
variable and the final results/return section to implement one of these fixes.
🧹 Nitpick comments (7)
packages/core/src/v3/schemas/api.ts (1)

412-424: Tighten validation for new count fields.
These are counts, so restricting them to non‑negative integers prevents accidental floats/negatives in API responses.

♻️ Suggested update
-  itemsSkipped: z.number().optional(),
+  itemsSkipped: z.number().int().nonnegative().optional(),
...
-  runCount: z.number().optional(),
+  runCount: z.number().int().nonnegative().optional(),
apps/webapp/app/runEngine/concerns/queues.server.ts (1)

421-432: Consider potential cache stampede scenario.

The SWR pattern handles stale reads well, but when the cache is completely cold (e.g., after deployment), multiple concurrent requests for the same queue could all trigger Redis lookups simultaneously before the first one populates the cache.

For most use cases this is acceptable, but if queue size lookups become a bottleneck, consider adding a coalescing mechanism.
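
If it ever does become a bottleneck, a single-flight map is a common coalescing mechanism; a sketch, hypothetical and not part of this PR:

// Sketch: single-flight coalescing so concurrent cold-cache lookups for
// the same key share one Redis round trip.
const inFlight = new Map<string, Promise<number>>();

function coalescedLookup(
  key: string,
  fetch: () => Promise<number>
): Promise<number> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  const promise = fetch().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}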

apps/webapp/app/runEngine/services/triggerTask.server.ts (1)

258-269: Queue validation correctly uses resolved queue name.

The reordering to validate after resolveQueueProperties is correct since the queue name is needed for per-queue validation.

Consider enhancing the error message to include the queue name for better debuggability:

💡 Optional: Include queue name in error message
         if (!queueSizeGuard.ok) {
           throw new ServiceValidationError(
-            `Cannot trigger ${taskId} as the queue size limit for this environment has been reached. The maximum size is ${queueSizeGuard.maximumSize}`
+            `Cannot trigger ${taskId} as the queue size limit for queue '${queueName}' has been reached. The maximum size is ${queueSizeGuard.maximumSize}`
           );
         }
internal-packages/run-engine/src/run-queue/types.ts (1)

56-128: Convert RunQueueKeyProducer interface to a type alias to align with TypeScript guidelines.

♻️ Suggested refactor
-export interface RunQueueKeyProducer {
+export type RunQueueKeyProducer = {
   //queue
   queueKey(
     orgId: string,
     projId: string,
     envId: string,
     queue: string,
     concurrencyKey?: string
   ): string;
   queueKey(env: RunQueueKeyProducerEnvironment, queue: string, concurrencyKey?: string): string;

   legacyMasterQueueKey(masterQueueName: string): string;

   masterQueueKeyForEnvironment(envId: string, shardCount: number): string;
   masterQueueKeyForShard(shard: number): string;
   masterQueueShardForEnvironment(envId: string, shardCount: number): number;
   workerQueueKey(workerQueue: string): string;

   envQueueKey(env: RunQueueKeyProducerEnvironment): string;
   envQueueKeyFromQueue(queue: string): string;
   queueConcurrencyLimitKey(env: RunQueueKeyProducerEnvironment, queue: string): string;
   queueConcurrencyLimitKeyFromQueue(queue: string): string;
   queueCurrentConcurrencyKeyFromQueue(queue: string): string;
   queueCurrentConcurrencyKey(
     env: RunQueueKeyProducerEnvironment,
     queue: string,
     concurrencyKey?: string
   ): string;
   queueCurrentDequeuedKeyFromQueue(queue: string): string;
   queueCurrentDequeuedKey(
     env: RunQueueKeyProducerEnvironment,
     queue: string,
     concurrencyKey?: string
   ): string;

   //env concurrency
   envCurrentConcurrencyKey(env: EnvDescriptor): string;
   envCurrentConcurrencyKey(env: RunQueueKeyProducerEnvironment): string;

   envConcurrencyLimitKey(env: EnvDescriptor): string;
   envConcurrencyLimitKey(env: RunQueueKeyProducerEnvironment): string;

   envCurrentDequeuedKey(env: EnvDescriptor): string;
   envCurrentDequeuedKey(env: RunQueueKeyProducerEnvironment): string;

   envConcurrencyLimitBurstFactorKey(env: EnvDescriptor): string;
   envConcurrencyLimitBurstFactorKey(env: RunQueueKeyProducerEnvironment): string;
   envConcurrencyLimitBurstFactorKeyFromQueue(queue: string): string;

   envConcurrencyLimitKeyFromQueue(queue: string): string;
   envCurrentConcurrencyKeyFromQueue(queue: string): string;
   envCurrentDequeuedKeyFromQueue(queue: string): string;

   //message payload
   messageKeyPrefixFromQueue(queue: string): string;
   messageKey(orgId: string, messageId: string): string;
   //utils
   orgIdFromQueue(queue: string): string;
   envIdFromQueue(queue: string): string;
   projectIdFromQueue(queue: string): string;
   descriptorFromQueue(queue: string): QueueDescriptor;

   deadLetterQueueKey(env: RunQueueKeyProducerEnvironment): string;
   deadLetterQueueKey(env: EnvDescriptor): string;
   deadLetterQueueKeyFromQueue(queue: string): string;

   // Concurrency sweeper methods
   markedForAckKey(): string;
   currentConcurrencySetKeyScanPattern(): string;

   // TTL system methods
   ttlQueueKeyForShard(shard: number): string;
-}
+};
internal-packages/run-engine/src/engine/systems/ttlSystem.ts (2)

252-260: Unnecessary type cast for TaskRunStatus.

The string literal "EXPIRED" is a valid TaskRunStatus value, so the as TaskRunStatus cast is redundant.

🧹 Minor cleanup
         await this.$.prisma.taskRun.updateMany({
           where: { id: { in: runIdsToExpire } },
           data: {
-            status: "EXPIRED" as TaskRunStatus,
+            status: "EXPIRED",
             completedAt: now,
             expiredAt: now,
           },
         });

263-321: Consider error handling for individual run processing within the batch.

If any single run fails during snapshot creation or waitpoint completion, Promise.all will reject and subsequent runs won't be processed. Since the batch updateMany on line 252 has already succeeded, this could leave some runs in a partially processed state (status=EXPIRED but no snapshot).

For improved resilience, consider using Promise.allSettled and collecting failures:

💡 Suggested approach
-        await Promise.all(
-          runsToExpire.map(async (run) => {
+        const results = await Promise.allSettled(
+          runsToExpire.map(async (run) => {
             // ... existing processing ...
             expired.push(run.id);
+            return run.id;
           })
         );
+
+        // Log any failures but don't throw - the status update already succeeded
+        for (const result of results) {
+          if (result.status === "rejected") {
+            this.$.logger.error("Failed to fully process expired run", {
+              error: result.reason,
+            });
+          }
+        }
internal-packages/run-engine/src/run-queue/index.ts (1)

108-110: Use a type alias for the callback.

This project prefers type over interface in TS; a function-shaped callback is a clean fit for a type alias.

♻️ Suggested change
-export interface TtlSystemCallback {
-  (runs: Array<{ queueKey: string; runId: string; orgId: string }>): Promise<void>;
-}
+export type TtlSystemCallback = (
+  runs: Array<{ queueKey: string; runId: string; orgId: string }>
+) => Promise<void>;

As per coding guidelines: Use types over interfaces for TypeScript.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a50a5f5 and aaea8d6.

⛔ Files ignored due to path filters (1)
  • references/hello-world/src/trigger/batches.ts is excluded by !references/**
📒 Files selected for processing (32)
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • packages/core/src/v3/schemas/api.ts
  • packages/trigger-sdk/src/v3/shared.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
  • apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • packages/core/src/v3/schemas/api.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/types.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • apps/webapp/app/runEngine/types.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • internal-packages/run-engine/src/engine/index.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/schemas/api.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • packages/core/src/v3/schemas/api.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/types.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • apps/webapp/app/runEngine/types.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • internal-packages/run-engine/src/engine/index.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/core/src/v3/schemas/api.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/types.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • apps/webapp/app/runEngine/types.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • internal-packages/run-engine/src/engine/index.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/core/src/v3/schemas/api.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/types.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • apps/webapp/app/env.server.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • apps/webapp/app/runEngine/types.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
  • internal-packages/run-engine/src/engine/index.ts
{packages,integrations}/**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Add a changeset when modifying any public package in packages/* or integrations/* using pnpm run changeset:add

Files:

  • packages/core/src/v3/schemas/api.ts
  • packages/trigger-sdk/src/v3/shared.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Access environment variables via env export from apps/webapp/app/env.server.ts, never use process.env directly

Files:

  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Test files should live beside the files under test and use descriptive describe and it blocks
Tests should avoid mocks or stubs and use the helpers from @internal/testcontainers when Redis or Postgres are needed
Use vitest for running unit tests

**/*.test.{ts,tsx,js,jsx}: Use vitest exclusively for testing and never mock anything - use testcontainers instead
Place test files next to source files with naming pattern: source file (e.g., MyService.ts) → MyService.test.ts

Files:

  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use testcontainers helpers (redisTest, postgresTest, containerTest) from @internal/testcontainers for Redis/PostgreSQL testing instead of mocks

Files:

  • internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
packages/trigger-sdk/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Files:

  • packages/trigger-sdk/src/v3/shared.ts
🧠 Learnings (40)
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeyTTL` option to define a time window during which duplicate triggers return the original run

Applied to files:

  • internal-packages/run-engine/src/run-queue/types.ts
  • internal-packages/run-engine/src/engine/types.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/run-queue/keyProducer.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • internal-packages/run-engine/src/run-queue/index.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.

Applied to files:

  • internal-packages/run-engine/src/run-queue/types.ts
  • apps/webapp/app/v3/runEngine.server.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use metadata methods (set, del, replace, append, remove, increment, decrement, stream, flush) to update metadata during task execution

Applied to files:

  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch

Applied to files:

  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions

Applied to files:

  • internal-packages/run-engine/src/batch-queue/completionTracker.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `.withStreams()` to subscribe to realtime streams from task metadata in addition to run changes

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Subscribe to run updates using `runs.subscribeToRun()` for realtime monitoring of task execution

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.

Applied to files:

  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option

Applied to files:

  • apps/webapp/app/v3/utils/queueLimits.server.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
  • internal-packages/run-engine/src/run-queue/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties

Applied to files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
  • apps/webapp/app/runEngine/concerns/queues.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.triggerAndWait()` to trigger a task and wait for its result from a parent task

Applied to files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:44.496Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/executing-commands.mdc:0-0
Timestamp: 2025-11-27T16:26:44.496Z
Learning: For running tests, navigate into the package directory and run `pnpm run test --run` to enable single-file test execution (e.g., `pnpm run test ./src/engine/tests/ttl.test.ts --run`)

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to **/*.{test,spec}.{ts,tsx} : Use vitest for all tests in the Trigger.dev repository

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to **/*.{ts,tsx} : Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/tests/ttl.test.ts
  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task

Applied to files:

  • internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.trigger()` with type-only imports to trigger tasks from backend code without importing the task implementation

Applied to files:

  • internal-packages/run-engine/src/engine/tests/trigger.test.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-08-14T18:35:44.370Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2025-08-19T09:49:07.011Z
Learnt from: julienvanbeveren
Repo: triggerdotdev/trigger.dev PR: 2417
File: apps/webapp/app/routes/api.v1.projects.$projectRef.envvars.$slug.import.ts:56-61
Timestamp: 2025-08-19T09:49:07.011Z
Learning: In the Trigger.dev codebase, environment variables should default to `isSecret: false` when not explicitly marked as secrets in the syncEnvVars functionality. This is the intended behavior for both regular variables and parent variables.

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Scope idempotency keys globally or to current run using the scope parameter

Applied to files:

  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeys.create()` to create idempotency keys for preventing duplicate task executions

Applied to files:

  • apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Export tasks with unique IDs within the project to enable proper task discovery and execution

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled/cron tasks instead of regular `task()`

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk/v3` with Zod schema for payload validation

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property

Applied to files:

  • apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-12-18T14:09:10.154Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2794
File: internal-packages/run-engine/src/engine/systems/debounceSystem.ts:390-397
Timestamp: 2025-12-18T14:09:10.154Z
Learning: In the debounce system (internal-packages/run-engine/src/engine/systems/debounceSystem.ts), millisecond delays are not supported. The minimum debounce delay is 1 second (1s). The parseNaturalLanguageDuration function supports w/d/hr/h/m/s units only.

Applied to files:

  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Limit task duration using the `maxDuration` property (in seconds)

Applied to files:

  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
  • internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use logger methods (debug, log, info, warn, error) from `trigger.dev/sdk/v3` for structured logging in tasks

Applied to files:

  • internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets

Applied to files:

  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/runEngine/services/batchTrigger.server.ts
📚 Learning: 2026-02-03T18:27:40.429Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2994
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx:553-555
Timestamp: 2026-02-03T18:27:40.429Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx, the menu buttons (e.g., Edit with PencilSquareIcon) in the TableCellMenu are intentionally icon-only with no text labels as a compact UI pattern. This is a deliberate design choice for this route; preserve the icon-only behavior for consistency in this file.

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.

Applied to files:

  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
🧬 Code graph analysis (15)
apps/webapp/app/v3/utils/queueLimits.server.ts (1)
apps/webapp/app/env.server.ts (1)
  • env (1332-1332)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
internal-packages/run-engine/src/engine/statuses.ts (1)
  • isFinalRunStatus (55-57)
apps/webapp/app/v3/runEngine.server.ts (2)
apps/webapp/app/env.server.ts (1)
  • env (1332-1332)
apps/supervisor/src/env.ts (1)
  • env (144-144)
internal-packages/run-engine/src/run-queue/keyProducer.ts (1)
internal-packages/run-engine/src/run-queue/index.ts (4)
  • shard (1170-1236)
  • shard (1258-1318)
  • shard (1324-1350)
  • shard (1420-1545)
apps/webapp/app/env.server.ts (2)
apps/webapp/app/utils/boolEnv.ts (1)
  • BoolEnv (12-14)
apps/supervisor/src/envUtil.ts (1)
  • BoolEnv (15-17)
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (1)
packages/core/src/v3/isomorphic/friendlyId.ts (1)
  • RunId (93-93)
apps/webapp/app/runEngine/services/triggerTask.server.ts (3)
internal-packages/run-engine/src/engine/errors.ts (1)
  • ServiceValidationError (69-78)
apps/webapp/app/v3/services/common.server.ts (1)
  • ServiceValidationError (1-6)
apps/webapp/app/utils/packets.ts (1)
  • handleMetadataPacket (11-37)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (2)
apps/webapp/app/v3/runEngine.server.ts (1)
  • engine (11-11)
apps/webapp/app/v3/utils/queueLimits.server.ts (2)
  • getQueueSizeLimit (24-33)
  • getQueueSizeLimitSource (42-51)
internal-packages/run-engine/src/batch-queue/index.ts (3)
apps/webapp/app/runEngine/services/batchTrigger.server.ts (1)
  • batchId (164-350)
packages/core/src/v3/apiClient/index.ts (1)
  • batchId (419-537)
internal-packages/run-engine/src/engine/systems/batchSystem.ts (1)
  • batchId (39-137)
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (1)
packages/core/src/v3/isomorphic/duration.ts (1)
  • parseNaturalLanguageDuration (76-79)
apps/webapp/app/runEngine/types.ts (2)
internal-packages/run-engine/src/run-queue/fairQueueSelectionStrategy.ts (4)
  • env (501-521)
  • env (560-578)
  • env (580-594)
  • env (596-616)
internal-packages/run-engine/src/shared/index.ts (1)
  • AuthenticatedEnvironment (4-6)
apps/webapp/app/runEngine/concerns/queues.server.ts (2)
apps/webapp/app/v3/models/workerDeployment.server.ts (1)
  • findCurrentWorkerFromEnvironment (198-224)
apps/webapp/app/runEngine/types.ts (1)
  • QueueValidationResult (38-46)
internal-packages/run-engine/src/run-queue/index.ts (1)
internal-packages/redis/src/index.ts (2)
  • Callback (4-4)
  • Result (4-4)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)
apps/webapp/app/utils/numberFormatter.ts (1)
  • formatNumber (15-22)
internal-packages/run-engine/src/engine/index.ts (2)
internal-packages/run-engine/src/run-queue/index.ts (2)
  • options (1152-1168)
  • options (1240-1256)
internal-packages/run-engine/src/shared/index.ts (1)
  • MinimalAuthenticatedEnvironment (8-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +112 to +130
  /**
   * Update the runCount in batch metadata.
   * Used when items are skipped due to queue limits.
   */
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
    const meta = await this.getMeta(batchId);
    if (!meta) {
      this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
      return;
    }

    const updatedMeta: BatchMeta = {
      ...meta,
      runCount: newRunCount,
    };

    await this.storeMeta(batchId, updatedMeta);
    this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
  }

⚠️ Potential issue | 🟡 Minor

Guard against invalid runCount values.
updateRunCount should reject negative/NaN/non‑integer values to avoid corrupting batch metadata.

🛡️ Suggested guard
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
+    if (!Number.isInteger(newRunCount) || newRunCount < 0) {
+      this.logger.error("Invalid runCount update", { batchId, newRunCount });
+      return;
+    }
     const meta = await this.getMeta(batchId);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  /**
   * Update the runCount in batch metadata.
   * Used when items are skipped due to queue limits.
   */
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
    if (!Number.isInteger(newRunCount) || newRunCount < 0) {
      this.logger.error("Invalid runCount update", { batchId, newRunCount });
      return;
    }

    const meta = await this.getMeta(batchId);
    if (!meta) {
      this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
      return;
    }

    const updatedMeta: BatchMeta = {
      ...meta,
      runCount: newRunCount,
    };

    await this.storeMeta(batchId, updatedMeta);
    this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
  }
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts` around
lines 112 - 130, The updateRunCount method currently trusts newRunCount; add
input validation in updateRunCount (in completionTracker.ts) to reject negative,
NaN, non-finite or non-integer values before calling getMeta/storeMeta: use
Number.isInteger(newRunCount) && newRunCount >= 0 (and optionally
Number.isFinite) and if the check fails log an error via this.logger.error with
batchId and the invalid value and then throw an Error or return a rejected
Promise so the invalid value is never persisted; keep existing behavior for meta
retrieval (getMeta) and storage (storeMeta) otherwise.

Comment on lines +380 to +386
  /**
   * Update the runCount for a batch.
   * Used when items are skipped due to queue limits.
   */
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
    return this.completionTracker.updateRunCount(batchId, newRunCount);
  }

⚠️ Potential issue | 🟠 Major

Batch can get stuck if runCount drops after processing completes.
If newRunCount is lowered to a value already reached by processedCount, no further items may run to trigger finalization, leaving the batch incomplete.

🛠️ Suggested fix
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
-    return this.completionTracker.updateRunCount(batchId, newRunCount);
+    await this.completionTracker.updateRunCount(batchId, newRunCount);
+    const meta = await this.completionTracker.getMeta(batchId);
+    if (!meta) return;
+    const processedCount = await this.completionTracker.getProcessedCount(batchId);
+    if (processedCount >= meta.runCount) {
+      await this.#finalizeBatch(batchId, meta);
+    }
  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  /**
   * Update the runCount for a batch.
   * Used when items are skipped due to queue limits.
   */
  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
    await this.completionTracker.updateRunCount(batchId, newRunCount);
    const meta = await this.completionTracker.getMeta(batchId);
    if (!meta) return;
    const processedCount = await this.completionTracker.getProcessedCount(batchId);
    if (processedCount >= meta.runCount) {
      await this.#finalizeBatch(batchId, meta);
    }
  }
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/batch-queue/index.ts` around lines 380 -
386, The updateRunCount method can lower runCount below the already-processed
count and leave a batch stuck; before calling
this.completionTracker.updateRunCount(batchId, newRunCount) clamp newRunCount to
at least the current processed count (e.g. newRunCount = Math.max(newRunCount,
this.completionTracker.getProcessedCount(batchId))) or, alternatively, after
updating call into the completion tracker to run the completion/finalization
check (e.g. this.completionTracker.checkCompletion/finalizeBatch for batchId) so
that if processedCount >= runCount the batch is finalized immediately; update
updateRunCount to perform one of these fixes using the existing
completionTracker APIs.

Comment on lines +1 to +8
import { containerTest, assertNonNullable } from "@internal/testcontainers";
import { trace } from "@internal/tracing";
import { expect } from "vitest";
import { RunEngine } from "../index.js";
import { setTimeout } from "node:timers/promises";
import { setupAuthenticatedEnvironment, setupBackgroundWorker } from "./setup.js";

vi.setConfig({ testTimeout: 60_000 });

⚠️ Potential issue | 🟡 Minor

Missing vi import from vitest.

The vi global is used on line 8 but not imported. While vitest may auto-inject globals, explicit imports are more reliable.

🔧 Proposed fix
 import { containerTest, assertNonNullable } from "@internal/testcontainers";
 import { trace } from "@internal/tracing";
-import { expect } from "vitest";
+import { expect, vi } from "vitest";
 import { RunEngine } from "../index.js";
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts` around
lines 1 - 8, The test configures the vitest runner via vi.setConfig but never
imports vi; update the imports to explicitly import vi from 'vitest'
(for example change the existing import of expect to import { expect, vi } from
"vitest") so the vi symbol used by vi.setConfig is defined and the test runner
APIs are available.

Comment on lines +1238 to +1304
  // TTL System Methods

  #startTtlConsumers() {
    if (!this.options.ttlSystem) {
      this.logger.debug("TTL system disabled (no ttlSystem config)");
      return;
    }

    const shardCount = this.options.ttlSystem.shardCount ?? this.shardCount;

    for (let i = 0; i < shardCount; i++) {
      this.logger.debug(`Starting TTL consumer ${i}`);
      this.#startTtlConsumer(i).catch((err) => {
        this.logger.error(`Failed to start TTL consumer ${i}`, { error: err });
      });
    }

    this.logger.debug(`Started ${shardCount} TTL consumers`);
  }

  async #startTtlConsumer(shard: number) {
    if (!this.options.ttlSystem) {
      return;
    }

    const pollIntervalMs = this.options.ttlSystem.pollIntervalMs ?? 1000;
    const batchSize = this.options.ttlSystem.batchSize ?? 100;
    let processedCount = 0;

    try {
      for await (const _ of setInterval(pollIntervalMs, null, {
        signal: this.abortController.signal,
      })) {
        const now = Date.now();

        const [error, expiredRuns] = await tryCatch(
          this.#expireTtlRuns(shard, now, batchSize)
        );

        if (error) {
          this.logger.error(`Failed to expire TTL runs for shard ${shard}`, {
            error,
            service: this.name,
            shard,
          });
          continue;
        }

        if (expiredRuns.length > 0) {
          this.logger.debug(`Expired ${expiredRuns.length} TTL runs in shard ${shard}`, {
            service: this.name,
            shard,
            count: expiredRuns.length,
          });

          // Call the callback with expired runs
          try {
            await this.options.ttlSystem!.callback(expiredRuns);
            processedCount += expiredRuns.length;
          } catch (callbackError) {
            this.logger.error(`TTL callback failed for shard ${shard}`, {
              error: callbackError,
              service: this.name,
              shard,
              runCount: expiredRuns.length,
            });
          }

⚠️ Potential issue | 🟠 Major

Callback failures can silently drop expired runs.

#expireTtlRuns() removes runs before invoking the callback, and failures are only logged. If the callback is responsible for persisting expiration state, those runs may be lost without retry. Consider a durable retry path (e.g., a Redis list/sorted set of failed expirations, or a dedicated worker with backoff) or make the callback itself handle retry + idempotency explicitly.
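
One possible shape for that durable path, sketched with ioredis (the ttl:failed-expirations key, the client wiring, and the backoff numbers are illustrative assumptions, not part of this PR):

import { Redis } from "ioredis";

type ExpiredRun = { queueKey: string; runId: string; orgId: string };

// Hypothetical key for parking expirations whose callback failed.
const FAILED_KEY = "ttl:failed-expirations";

// On callback failure: park the runs in a sorted set scored by next-retry time.
async function parkFailedExpirations(redis: Redis, runs: ExpiredRun[], attempt: number) {
  const retryAt = Date.now() + Math.min(2 ** attempt * 1_000, 60_000); // capped backoff
  const args = runs.flatMap((run) => [String(retryAt), JSON.stringify({ ...run, attempt })]);
  await redis.zadd(FAILED_KEY, ...args);
}

// In a retry worker: pull due entries and re-invoke the (idempotent) callback.
async function retryFailedExpirations(
  redis: Redis,
  callback: (runs: ExpiredRun[]) => Promise<void>
) {
  const due = await redis.zrangebyscore(FAILED_KEY, 0, Date.now(), "LIMIT", 0, 100);
  if (due.length === 0) return;

  const entries = due.map((raw) => JSON.parse(raw) as ExpiredRun & { attempt: number });
  try {
    await callback(entries);
    await redis.zrem(FAILED_KEY, ...due); // drop entries only after the callback succeeds
  } catch {
    // Reschedule with increased backoff; the runs stay durable in Redis.
    await redis.zrem(FAILED_KEY, ...due);
    await Promise.all(entries.map((e) => parkFailedExpirations(redis, [e], e.attempt + 1)));
  }
}

Either way the callback must be idempotent, since a retry can overlap with a partially applied first attempt.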

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/run-queue/index.ts` around lines 1238 -
1304, The code currently calls `#expireTtlRuns`(shard, now, batchSize) which
deletes/marks runs as expired before invoking
options.ttlSystem!.callback(expiredRuns), so callback failures only get logged
and those runs can be lost; update `#startTtlConsumer` (or `#expireTtlRuns`) so that
expiration is durable: either (A) defer permanent deletion/marking of runs until
after the callback completes successfully (move the delete/mark logic to after
options.ttlSystem!.callback returns) or (B) if deletion must happen first, add a
durable retry/enqueue path when the callback throws — e.g., push failed expired
run IDs to a retry store (Redis list/sorted set) with metadata and backoff, and
spawn a retry worker to reprocess them; ensure the callback and retry path are
idempotent and update logging to include run IDs and shard when queuing retries.

@ericallam ericallam changed the title feat(dashboard): Display environment queue length limits on queues and limits page feat(server): New TTL system, enforce max queue length limits, lazy waitpoint creation Feb 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal-packages/run-engine/src/batch-queue/index.ts (1)

958-976: ⚠️ Potential issue | 🟠 Major

Completion check at Line 969 uses meta.runCount which may be stale.

meta is fetched once per message at Lines 762-764. If updateRunCount was called between the meta fetch and the completion check, the comparison processedCount === meta.runCount could use an outdated value, potentially missing or double-triggering finalization. For correctness, consider re-fetching runCount atomically alongside the processedCount increment, or having updateRunCount trigger a finalization check.
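
A sketch of the atomic re-fetch idea, using an ioredis Lua command (the batch:* key layout and the runCount hash field are hypothetical, not the tracker's actual schema):

import { Redis } from "ioredis";

const redis = new Redis();

// Increment processedCount and read runCount in one atomic step, so the
// completion comparison cannot race with a concurrent updateRunCount.
redis.defineCommand("incrementAndCheck", {
  numberOfKeys: 2,
  lua: `
    local processed = redis.call("INCR", KEYS[1])
    local runCount = tonumber(redis.call("HGET", KEYS[2], "runCount") or "-1")
    return { processed, runCount }
  `,
});

// Returns true when the batch is complete at the moment of the increment.
async function recordItemProcessed(batchId: string): Promise<boolean> {
  // defineCommand attaches the method at runtime, hence the cast.
  const [processed, runCount] = (await (redis as any).incrementAndCheck(
    `batch:${batchId}:processed`,
    `batch:${batchId}:meta`
  )) as [number, number];
  return runCount >= 0 && processed >= runCount;
}

This removes the stale read within a single check; a runCount lowered after the last increment would still need updateRunCount to run its own finalization pass, as the earlier comment on updateRunCount suggests.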

🤖 Fix all issues with AI agents
In `@apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx`:
- Around line 367-370: The call to getQueueUsageColorClass(environment.queued,
environment.queueSizeLimit) is unresolved — add a definition or import for
getQueueUsageColorClass that mirrors the logic of the existing
getUsageColorClass used in the limits route: implement a function
getQueueUsageColorClass(queued: number | undefined, queueSizeLimit: number |
undefined) that returns a CSS class string (or null) based on usage percentage
thresholds and handles undefined values; place it in the same file (or import
it) so valueClassName can call it, and ensure it returns null when no class
should be applied so the existing env.paused fallback ("text-warning
tabular-nums" or "tabular-nums") continues to work.
- Around line 515-518: The per-queue limit check (isAtQueueLimit) is using
queue.queued against environment.queueSizeLimit which is an environment-wide
cap; change the comparison to use environment.queued (total queued across all
queues) versus environment.queueSizeLimit so the badge reflects environment
capacity (or implement a true per-queue limit if intended). Update the
isAtQueueLimit logic to reference environment.queued and
environment.queueSizeLimit (consistent with the BigNumber usage) and remove or
revise any reliance on queue.queued for this environment-wide badge.

In `@apps/webapp/app/v3/runEngineHandlers.server.ts`:
- Around line 722-754: The duplicated pre-failed run creation via
triggerFailedTaskService.call when isFinalAttempt can itself throw and cause a
second attempt in the outer catch — wrap each inner invocation of
triggerFailedTaskService.call (the calls inside the isFinalAttempt branch and
the analogous call in the outer catch) in a small try/catch so any errors from
that call are caught and handled locally (log the error to the existing
span/process logger and avoid rethrowing), ensure span.end() still runs, and
return the same successful pre-failed-run response when a failedRunId is
returned; this prevents a partial success from causing a duplicate pre-failed
run while preserving the current control flow for both the isFinalAttempt path
and the outer catch path.

In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts`:
- Around line 116-130: The updateRunCount function performs a non-atomic
read-modify-write (getMeta → modify → storeMeta) which can lose updates under
concurrent callers; change it to perform the increment atomically in the backing
store (e.g., use a Redis Lua script or the same atomic pattern used by
recordSuccessIdempotent) so runCount is updated with a single atomic operation,
or enforce/document single-writer semantics; specifically replace the current
getMeta/storeMeta sequence in updateRunCount with an atomic increment/update
that targets the BatchMeta.runCount field (or wrap the logic in a Lua script
that reads, increments/sets and returns the new value) and update logging to
reflect the atomic result.

In `@internal-packages/run-engine/src/engine/index.ts`:
- Around line 2275-2300: The method `#resolveMaxTtl` currently returns the
original unparseable ttl string when parseNaturalLanguageDurationInMs(ttl)
yields undefined, which allows invalid TTLs to bypass the max clamp; change the
logic so that if ttl fails to parse but the configured options.defaultMaxTtl
parses successfully, you return the parsed max (i.e., maxTtl) to enforce the
cap; keep the existing fallback that if defaultMaxTtl itself is unparseable
(parseNaturalLanguageDurationInMs(maxTtl) === undefined) you return ttl as
before.
- Around line 2245-2273: The `#ttlExpiredCallback` currently calls
this.ttlSystem.expireRunsBatch after a Lua script has already removed runs from
Redis, so if expireRunsBatch throws and the DB update fails, the runs become
orphaned; to fix, add compensation in `#ttlExpiredCallback` to detect errors from
ttlSystem.expireRunsBatch and, on failure, re-add the affected runIds back into
the TTL structures (sorted set/queue/concurrency tracking) via a new ttlSystem
method (e.g., ttlSystem.requeueRuns or ttlSystem.restoreRuns) or invoke existing
requeue logic, and/or schedule the runIds for a background cleanup retry; ensure
the added code references `#ttlExpiredCallback`, ttlSystem.expireRunsBatch, and
the Lua removal behavior so the requeue/cleanup is called only when
expireRunsBatch fails.

In `@internal-packages/run-engine/src/engine/systems/ttlSystem.ts`:
- Around line 248-291: The batch update (taskRun.updateMany) followed by per-run
updates (taskRun.update and taskRunExecutionSnapshot.create) can leave runs
marked EXPIRED without snapshots/errors if any per-run op fails; change
ttlExpiredCallback to perform each run's status change, error write, and
snapshot creation inside a single per-run prisma.$transaction() (instead of the
global updateMany + Promise.all) so that setting status to EXPIRED,
completedAt/expiredAt, writing the error, and creating taskRunExecutionSnapshot
happen atomically for each run (operate on runsToExpire entries and replace uses
of taskRun.updateMany, taskRun.update, and taskRunExecutionSnapshot.create
accordingly), and ensure transaction failures are surfaced/logged so those runs
remain PENDING for the next TTL poll.

In `@internal-packages/run-engine/src/run-queue/index.ts`:
- Around line 2710-2748: The dequeue Lua branch that handles TTL expiration
(checks using ttlExpiresAt <= currentTime) currently removes the message from
queueKey, envQueueKey, messageKey and ttlQueueKey but never reports the expired
run to the application; to fix this, stop evicting TTL-expired messages in the
dequeue path: modify the ttlExpiresAt <= currentTime branch so it does not
remove the message or its TTL membership (i.e., do not call ZREM on
queueKey/envQueueKey or DEL messageKey or ZREM ttlQueueKey) and simply skip
adding the run to results (leaving the message for the TTL consumer to handle),
ensuring that results, dequeuedCount and SADD calls are untouched for the
non-expired branch; locate and change the code around the symbols ttlExpiresAt,
currentTime, queueKey, envQueueKey, ttlQueueKey, messageKey, results and
dequeuedCount.
🧹 Nitpick comments (10)
apps/webapp/app/env.server.ts (1)

536-540: QUEUE_SIZE_CACHE_ENABLED uses numeric type instead of BoolEnv.

Other boolean-like env vars in this file (e.g., RUN_ENGINE_TTL_SYSTEM_DISABLED, BATCH_TRIGGER_CACHED_RUNS_CHECK_ENABLED) use BoolEnv. Using z.coerce.number().int() for an enable/disable flag is inconsistent and forces consumers to compare against 1 instead of using a proper boolean.

Suggested change
-    QUEUE_SIZE_CACHE_ENABLED: z.coerce.number().int().optional().default(1),
+    QUEUE_SIZE_CACHE_ENABLED: BoolEnv.default(true),
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (1)

120-122: Non-null assertion on associatedWaitpoint! is safe but could be cleaner.

The ! is safe because of the early return on line 98-100, but TypeScript can't narrow let variables inside closures. Consider assigning to a const after the guard to eliminate the assertion.

Suggested approach
         if (!associatedWaitpoint) {
           return { isCached: true, run: existingRun };
         }
+        const waitpoint = associatedWaitpoint;

         await this.traceEventConcern.traceIdempotentRun(
           ...
             await this.engine.blockRunWithWaitpoint({
               runId: RunId.fromFriendlyId(parentRunId),
-              waitpoints: associatedWaitpoint!.id,
+              waitpoints: waitpoint.id,
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts (1)

214-218: Relaxed assertion removes verification of waitpoint completion in snapshots.

The original assertion that latest.completedWaitpoints.length > 0 was removed and replaced with comments. While timing sensitivity is real, this means the test no longer verifies the core "waitpoints only on latest" behavior — it only checks that earlier snapshots have empty waitpoints.

Consider adding a retry/poll loop (e.g., poll getSnapshotsSince until the latest snapshot has completedWaitpoints.length > 0 or a timeout), so the test still validates the positive case without being flaky.
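
For instance, a small polling helper along these lines (the helper is ours, not the repo's test utilities, and the usage assumes the test's engine/run identifiers):

// Poll an async condition until it yields a value or the timeout elapses.
async function waitUntil<T>(
  fn: () => Promise<T | undefined>,
  { timeoutMs = 5_000, intervalMs = 100 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const value = await fn();
    if (value !== undefined) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("waitUntil: timed out");
}

// Usage sketch:
// const latest = await waitUntil(async () => {
//   const snapshots = await engine.getSnapshotsSince({ runId, snapshotId });
//   const last = snapshots.at(-1);
//   return last && last.completedWaitpoints.length > 0 ? last : undefined;
// });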

internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)

17-17: Derive/validate env + project from the run to prevent mismatches.
Nice use of isFinalRunStatus to avoid creating waitpoints for terminal runs. Since you already load the run, consider using/validating its projectId and runtimeEnvironmentId instead of trusting caller inputs, to avoid accidental cross‑environment linkage.

🔧 Suggested adjustment
-    const run = await this.$.prisma.taskRun.findFirst({
-      where: { id: runId },
-      include: { associatedWaitpoint: true },
-    });
+    const run = await this.$.prisma.taskRun.findFirst({
+      where: { id: runId },
+      select: {
+        status: true,
+        projectId: true,
+        runtimeEnvironmentId: true,
+        associatedWaitpoint: true,
+      },
+    });

...

-      const runAfterLock = await this.$.prisma.taskRun.findFirst({
-        where: { id: runId },
-        include: { associatedWaitpoint: true },
-      });
+      const runAfterLock = await this.$.prisma.taskRun.findFirst({
+        where: { id: runId },
+        select: {
+          status: true,
+          projectId: true,
+          runtimeEnvironmentId: true,
+          associatedWaitpoint: true,
+        },
+      });

+      if (
+        runAfterLock.projectId !== projectId ||
+        runAfterLock.runtimeEnvironmentId !== environmentId
+      ) {
+        throw new Error(
+          "getOrCreateRunWaitpoint: run does not match provided project/environment"
+        );
+      }

-      const waitpointData = this.buildRunAssociatedWaitpoint({ projectId, environmentId });
+      const waitpointData = this.buildRunAssociatedWaitpoint({
+        projectId: runAfterLock.projectId,
+        environmentId: runAfterLock.runtimeEnvironmentId,
+      });

Also applies to: 819-875

apps/webapp/app/runEngine/types.ts (1)

59-77: Prefer a type alias for QueueManager (repo guideline).
Since this interface is being updated, consider converting it to a type alias to match the TS style guide.

♻️ Suggested refactor
-export interface QueueManager {
+export type QueueManager = {
   resolveQueueProperties(
     request: TriggerTaskRequest,
     lockedBackgroundWorker?: LockedBackgroundWorker
   ): Promise<QueueProperties>;
   getQueueName(request: TriggerTaskRequest): Promise<string>;
   validateQueueLimits(
     env: AuthenticatedEnvironment,
     queueName: string,
     itemsToAdd?: number
   ): Promise<QueueValidationResult>;
   resolveQueueNamesForBatchItems(
     env: AuthenticatedEnvironment,
     items: Array<{ task: string; options?: { queue?: { name?: string } } }>
   ): Promise<Map<string, number>>;
   validateMultipleQueueLimits(
     env: AuthenticatedEnvironment,
     queueCounts: Map<string, number>
   ): Promise<{ ok: true } | { ok: false; queueName: string; maximumSize: number; queueSize: number }>;
   getWorkerQueue(
     env: AuthenticatedEnvironment,
     regionOverride?: string
   ): Promise<string | undefined>;
-}
+};
As per coding guidelines "Use types over interfaces for TypeScript".
apps/webapp/app/v3/runEngineHandlers.server.ts (1)

651-670: TriggerFailedTaskService is instantiated on every batch item invocation.

The service is only used on final-attempt failures, but it's constructed eagerly at the top of every callback invocation. Consider moving the instantiation inside the isFinalAttempt guards to avoid unnecessary allocations on the happy path.
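
A lazy-init sketch, assuming TriggerFailedTaskService takes no constructor arguments (the real signature may differ):

let triggerFailedTaskService: TriggerFailedTaskService | undefined;

// Construct the service only on the failure path that actually uses it.
function getTriggerFailedTaskService(): TriggerFailedTaskService {
  triggerFailedTaskService ??= new TriggerFailedTaskService();
  return triggerFailedTaskService;
}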

apps/webapp/app/runEngine/concerns/queues.server.ts (1)

421-436: getCachedQueueSize fails open on cache errors — verify this is intentional.

Line 435 returns result.val ?? 0, meaning if the cache fetch fails (returning undefined), the queue size is treated as 0. This allows requests through even when queue size can't be determined, which is a permissive-failure strategy. This is likely the right trade-off (availability over strict enforcement), but worth documenting.
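
If the behaviour stays, a comment or a small wrapper would make the intent explicit; in this hedged sketch, fetchQueueSize stands in for the cached lookup:

// Fail open: an unknown queue size is treated as 0 so transient cache/Redis
// errors don't reject triggers (availability over strict enforcement).
async function getQueueSizeOrZero(
  fetchQueueSize: () => Promise<number | undefined>
): Promise<number> {
  const size = await fetchQueueSize().catch(() => undefined);
  return size ?? 0;
}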

internal-packages/run-engine/src/run-queue/index.ts (3)

108-110: Use type instead of interface for TtlSystemCallback.

Per coding guidelines, prefer types over interfaces in TypeScript files.

Proposed fix
-export interface TtlSystemCallback {
-  (runs: Array<{ queueKey: string; runId: string; orgId: string }>): Promise<void>;
-}
+export type TtlSystemCallback = (
+  runs: Array<{ queueKey: string; runId: string; orgId: string }>
+) => Promise<void>;

As per coding guidelines: **/*.{ts,tsx}: Use types over interfaces for TypeScript.


669-677: TTL member uses | as delimiter — consider a safer encoding.

The TTL member format ${queueKey}|${runId}|${orgId} relies on | never appearing in any of these values. While current ID formats make this safe today, a future change to key formats could silently break the parsing in the expireTtlRuns Lua script (Lines 2582–2588). A null byte (\0) or a fixed-width encoding would be more robust.
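
A hedged sketch of a null-byte encoding; the helper names are invented, and the Lua script would need the matching delimiter change:

const TTL_MEMBER_DELIMITER = "\0";

function encodeTtlMember(queueKey: string, runId: string, orgId: string): string {
  return [queueKey, runId, orgId].join(TTL_MEMBER_DELIMITER);
}

function decodeTtlMember(member: string): {
  queueKey: string;
  runId: string;
  orgId: string;
} {
  const [queueKey = "", runId = "", orgId = ""] = member.split(TTL_MEMBER_DELIMITER);
  return { queueKey, runId, orgId };
}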


1713-1721: Minor: TTL shard is computed even when TTL system is disabled.

The TTL shard derivation (Lines 1714–1718) runs unconditionally, though the result is only meaningful when this.options.ttlSystem is set. This is functionally harmless since the empty-string ttlQueueKey disables TTL logic in the Lua script, but wrapping it in the conditional would be slightly cleaner.

Comment on lines +367 to +370
valueClassName={
getQueueUsageColorClass(environment.queued, environment.queueSizeLimit) ??
(env.paused ? "text-warning tabular-nums" : "tabular-nums")
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for where getQueueUsageColorClass is defined or referenced
rg "getQueueUsageColorClass" --type=ts --type=tsx -n

Repository: triggerdotdev/trigger.dev

Length of output: 96


🏁 Script executed:

#!/bin/bash
# Search for getQueueUsageColorClass without file type filter
rg "getQueueUsageColorClass" -n

Repository: triggerdotdev/trigger.dev

Length of output: 268


🏁 Script executed:

#!/bin/bash
# Search for similar function names like getUsageColorClass
rg "getUsageColorClass" -n

Repository: triggerdotdev/trigger.dev

Length of output: 574


🏁 Script executed:

#!/bin/bash
# Check the imports in the queues route file
head -50 "apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx"

Repository: triggerdotdev/trigger.dev

Length of output: 193


🏁 Script executed:

#!/bin/bash
# Find the queues route file
find . -name "route.tsx" -path "*queues*" | head -5

Repository: triggerdotdev/trigger.dev

Length of output: 175


🏁 Script executed:

#!/bin/bash
# Read the first 100 lines of the queues route to check imports
head -100 "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Repository: triggerdotdev/trigger.dev

Length of output: 5079


🏁 Script executed:

#!/bin/bash
# Search around line 368 to see the context and also check the end of the file for function definitions
sed -n '360,380p' "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Repository: triggerdotdev/trigger.dev

Length of output: 1025


🏁 Script executed:

#!/bin/bash
# Check the end of the file to see if the function is defined locally
tail -100 "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Repository: triggerdotdev/trigger.dev

Length of output: 3402


🏁 Script executed:

#!/bin/bash
# Get the total line count of the file
wc -l "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Repository: triggerdotdev/trigger.dev

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for getQueueUsageColorClass definition in the entire file
grep -n "function getQueueUsageColorClass\|const getQueueUsageColorClass" "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Search for getQueueUsageColorClass in all TypeScript/JavaScript files
rg "getQueueUsageColorClass" -A 2 -B 2

Repository: triggerdotdev/trigger.dev

Length of output: 823


🏁 Script executed:

#!/bin/bash
# Check the limits route to see the getUsageColorClass function for reference
sed -n '799,850p' "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.limits/route.tsx" | cat -n

Repository: triggerdotdev/trigger.dev

Length of output: 1699


🏁 Script executed:

#!/bin/bash
# Look for any color class functions in the queues route file
grep -n "function.*Color\|const.*Color" "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Repository: triggerdotdev/trigger.dev

Length of output: 51


getQueueUsageColorClass function is missing and will cause a runtime error.

Line 368 calls getQueueUsageColorClass(environment.queued, environment.queueSizeLimit) but this function is neither defined in this file nor imported from anywhere. The codebase has no definition of this function.

Define this function locally in this file or import it from the appropriate module. Reference the similar getUsageColorClass function in the limits route for implementation guidance.
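
A hedged sketch of what that local definition could look like; the thresholds and class names are assumptions modelled on the limits route, not the repo's actual values:

function getQueueUsageColorClass(
  queued: number | undefined,
  queueSizeLimit: number | null | undefined
): string | null {
  if (queued === undefined || !queueSizeLimit) return null;
  const ratio = queued / queueSizeLimit;
  if (ratio >= 1) return "text-error tabular-nums"; // at or over the limit
  if (ratio >= 0.9) return "text-warning tabular-nums"; // approaching the limit
  return null; // defer to the env.paused fallback classes
}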

🤖 Prompt for AI Agents
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
around lines 367 - 370, The call to getQueueUsageColorClass(environment.queued,
environment.queueSizeLimit) is unresolved — add a definition or import for
getQueueUsageColorClass that mirrors the logic of the existing
getUsageColorClass used in the limits route: implement a function
getQueueUsageColorClass(queued: number | undefined, queueSizeLimit: number |
undefined) that returns a CSS class string (or null) based on usage percentage
thresholds and handles undefined values; place it in the same file (or import
it) so valueClassName can call it, and ensure it returns null when no class
should be applied so the existing env.paused fallback ("text-warning
tabular-nums" or "tabular-nums") continues to work.

Comment on lines +515 to +518
const isAtConcurrencyLimit = queue.running >= limit;
const isAtQueueLimit =
environment.queueSizeLimit !== null &&
queue.queued >= environment.queueSizeLimit;

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n "queueSizeLimit" --type=ts --type=tsx -C 3

Repository: triggerdotdev/trigger.dev

Length of output: 96


🏁 Script executed:

rg -n "queueSizeLimit" -C 3

Repository: triggerdotdev/trigger.dev

Length of output: 6310


🏁 Script executed:

rg -n "function getQueueSizeLimit|const getQueueSizeLimit|export.*getQueueSizeLimit" -C 5

Repository: triggerdotdev/trigger.dev

Length of output: 2121


🏁 Script executed:

cat -n apps/webapp/app/v3/utils/queueLimits.server.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2183


🏁 Script executed:

rg -n "guardQueueSizeLimitsForEnv" -C 10

Repository: triggerdotdev/trigger.dev

Length of output: 10225


🏁 Script executed:

cat -n apps/webapp/app/v3/queueSizeLimits.server.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1577


🏁 Script executed:

sed -n '340,375p' apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx

Repository: triggerdotdev/trigger.dev

Length of output: 177


🏁 Script executed:

find apps/webapp/app/routes -name "route.tsx" | grep queues

Repository: triggerdotdev/trigger.dev

Length of output: 173


🏁 Script executed:

sed -n '340,375p' 'apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx'

Repository: triggerdotdev/trigger.dev

Length of output: 1559


🏁 Script executed:

sed -n '510,525p' 'apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx'

Repository: triggerdotdev/trigger.dev

Length of output: 901


The per-queue limit badge uses incorrect semantics.

isAtQueueLimit at lines 517–518 compares queue.queued (a single queue's count) against environment.queueSizeLimit, which is an environment-wide limit. The queueSizeLimit is calculated once per environment and represents the total allowed items across all queues, not per-queue capacity.

This comparison will almost never trigger correctly. For example, with a 1000-item environment limit and 10 queues, a single queue would need 1000 items to trigger the badge, even though the environment might already be at capacity with items spread across queues.

Use environment.queued (total across all queues) instead of queue.queued for consistency with the BigNumber component at line 368, or implement a per-queue limit if that's the intended behavior.
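
For the environment-level interpretation, the fix is a small change (sketch, assuming the loader already exposes environment.queued):

const isAtQueueLimit =
  environment.queueSizeLimit !== null &&
  environment.queued >= environment.queueSizeLimit;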

🤖 Prompt for AI Agents
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
around lines 515 - 518, The per-queue limit check (isAtQueueLimit) is using
queue.queued against environment.queueSizeLimit which is an environment-wide
cap; change the comparison to use environment.queued (total queued across all
queues) versus environment.queueSizeLimit so the badge reflects environment
capacity (or implement a true per-queue limit if intended). Update the
isAtQueueLimit logic to reference environment.queued and
environment.queueSizeLimit (consistent with the BigNumber usage) and remove or
revise any reliance on queue.queued for this environment-wide badge.

Comment on lines 722 to 754
span.setAttribute("batch.result.error", "TriggerTaskService returned undefined");
span.end();

// Only create a pre-failed run on the final attempt; otherwise let the retry mechanism handle it
if (isFinalAttempt) {
const failedRunId = await triggerFailedTaskService.call({
taskId: item.task,
environment,
payload: item.payload,
payloadType: item.payloadType as string,
errorMessage: "TriggerTaskService returned undefined",
parentRunId: meta.parentRunId,
resumeParentOnCompletion: meta.resumeParentOnCompletion,
batch: { id: batchId, index: itemIndex },
options: item.options as Record<string, unknown>,
traceContext: meta.traceContext as Record<string, unknown> | undefined,
spanParentAsLink: meta.spanParentAsLink,
errorCode: TaskRunErrorCodes.BATCH_ITEM_COULD_NOT_TRIGGER,
});

span.end();

if (failedRunId) {
return { success: true as const, runId: failedRunId };
}
} else {
span.end();
}

return {
success: false as const,
error: "TriggerTaskService returned undefined",
errorCode: "TRIGGER_FAILED",
};

⚠️ Potential issue | 🟡 Minor

Pre-failed run creation on final attempt — duplicated error-handling blocks.

The two paths (undefined result at Line 725 and caught exception at Line 762) share nearly identical triggerFailedTaskService.call(...) invocations with the same parameters. This is acceptable given the different control-flow contexts but worth noting for future extraction if the parameter list grows.

One thing to verify: if triggerFailedTaskService.call itself throws, the exception in the first block (Line 726) would bubble into the outer catch (Line 756), which would then attempt to call triggerFailedTaskService.call again on the same item. This could create a duplicate pre-failed run if the first call partially succeeded (e.g., DB row created but an error thrown after).

Proposed fix: wrap the inner triggerFailedTaskService.call in try/catch
             if (isFinalAttempt) {
+              try {
               const failedRunId = await triggerFailedTaskService.call({
                 // ...params...
               });

               span.end();

               if (failedRunId) {
                 return { success: true as const, runId: failedRunId };
               }
+              } catch (failedTaskError) {
+                logger.error("Failed to create pre-failed run", {
+                  batchId,
+                  itemIndex,
+                  error: failedTaskError instanceof Error ? failedTaskError.message : String(failedTaskError),
+                });
+                span.end();
+              }
             } else {
               span.end();
             }

Also applies to: 756-791

🤖 Prompt for AI Agents
In `@apps/webapp/app/v3/runEngineHandlers.server.ts` around lines 722 - 754, The
duplicated pre-failed run creation via triggerFailedTaskService.call when
isFinalAttempt can itself throw and cause a second attempt in the outer catch —
wrap each inner invocation of triggerFailedTaskService.call (the calls inside
the isFinalAttempt branch and the analogous call in the outer catch) in a small
try/catch so any errors from that call are caught and handled locally (log the
error to the existing span/process logger and avoid rethrowing), ensure
span.end() still runs, and return the same successful pre-failed-run response
when a failedRunId is returned; this prevents a partial success from causing a
duplicate pre-failed run while preserving the current control flow for both the
isFinalAttempt path and the outer catch path.

Comment on lines +116 to +130
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
const meta = await this.getMeta(batchId);
if (!meta) {
this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
return;
}

const updatedMeta: BatchMeta = {
...meta,
runCount: newRunCount,
};

await this.storeMeta(batchId, updatedMeta);
this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
}

⚠️ Potential issue | 🟠 Major

Non-atomic read-modify-write on batch metadata.

updateRunCount performs getMeta → modify → storeMeta without any locking or atomic operation. If called concurrently for the same batch (e.g., two queue-limit rejections racing), one update could be lost, leaving runCount incorrect and potentially causing the batch to never complete.

Consider using a Lua script (similar to recordSuccessIdempotent) to atomically update the runCount field, or document that this method must only be called from a single-writer context.

🔒 Sketch of atomic Lua approach
+ // Register in `#registerCommands`():
+ this.redis.defineCommand("updateRunCountAtomic", {
+   numberOfKeys: 1,
+   lua: `
+     local metaKey = KEYS[1]
+     local newRunCount = tonumber(ARGV[1])
+     local metaJson = redis.call('GET', metaKey)
+     if not metaJson then return nil end
+     local meta = cjson.decode(metaJson)
+     meta.runCount = newRunCount
+     redis.call('SET', metaKey, cjson.encode(meta))
+     return 'OK'
+   `,
+ });

  async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
-   const meta = await this.getMeta(batchId);
-   if (!meta) {
-     this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
-     return;
-   }
-   const updatedMeta: BatchMeta = { ...meta, runCount: newRunCount };
-   await this.storeMeta(batchId, updatedMeta);
+   const result = await this.redis.updateRunCountAtomic(
+     this.metaKey(batchId),
+     newRunCount.toString()
+   );
+   if (!result) {
+     this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
+     return;
+   }
    this.logger.debug("Updated batch runCount", { batchId, newRunCount });
  }
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts` around
lines 116 - 130, The updateRunCount function performs a non-atomic
read-modify-write (getMeta → modify → storeMeta) which can lose updates under
concurrent callers; change it to perform the increment atomically in the backing
store (e.g., use a Redis Lua script or the same atomic pattern used by
recordSuccessIdempotent) so runCount is updated with a single atomic operation,
or enforce/document single-writer semantics; specifically replace the current
getMeta/storeMeta sequence in updateRunCount with an atomic increment/update
that targets the BatchMeta.runCount field (or wrap the logic in a Lua script
that reads, increments/sets and returns the new value) and update logging to
reflect the atomic result.

Comment on lines +2245 to +2273
async #ttlExpiredCallback(
runs: Array<{ queueKey: string; runId: string; orgId: string }>
): Promise<void> {
if (runs.length === 0) return;

try {
const runIds = runs.map((r) => r.runId);
const result = await this.ttlSystem.expireRunsBatch(runIds);

if (result.expired.length > 0) {
this.logger.debug("TTL system expired runs", {
expiredCount: result.expired.length,
expiredRunIds: result.expired,
});
}

if (result.skipped.length > 0) {
this.logger.debug("TTL system skipped runs", {
skippedCount: result.skipped.length,
skipped: result.skipped,
});
}
} catch (error) {
this.logger.error("Failed to expire runs via TTL system", {
runIds: runs.map((r) => r.runId),
error,
});
}
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the overall TTL system structure and where `#ttlExpiredCallback` is used
rg -n "ttlExpiredCallback|ttlSystem|expireRunsBatch" --type=ts -g '!node_modules' internal-packages/run-engine/src/ -B 2 -A 2

Repository: triggerdotdev/trigger.dev

Length of output: 19275


🏁 Script executed:

# Find the TtlSystem class implementation
fd -e ts -g '!node_modules' internal-packages/run-engine/src/ | xargs grep -l "class TtlSystem\|export.*TtlSystem"

Repository: triggerdotdev/trigger.dev

Length of output: 53


🏁 Script executed:

# Check the entire TtlSystem class and its polling mechanism
rg -n "class TtlSystem" --type=ts -g '!node_modules' -A 100 internal-packages/run-engine/src/

Repository: triggerdotdev/trigger.dev

Length of output: 9426


🏁 Script executed:

# Find the Lua script for TTL queue operations
fd -e lua --type f internal-packages/run-engine/src/ | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 506


🏁 Script executed:

# Search for the Lua script or script loading for TTL
rg -n "ttl.*lua\|ZRANGE.*ttl\|expired.*lua" --type=ts -g '!node_modules' internal-packages/run-engine/src/ -B 2 -A 2

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check the full error handling in the TTL consumer, particularly what happens after the catch block
sed -n '1290,1320p' internal-packages/run-engine/src/run-queue/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 865


🏁 Script executed:

# Search for Lua script files in the run-engine
find internal-packages/run-engine/src -type f -name "*.lua"

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Search for script content embedded in TypeScript files related to TTL
rg -n "ZRANGE|ZREVRANGE|ZADD|ZREM" --type=ts internal-packages/run-engine/src/run-queue/ -B 5 -A 5 | head -100

Repository: triggerdotdev/trigger.dev

Length of output: 8490


🏁 Script executed:

# Look at the `#getPendingTtlRuns` method more carefully
sed -n '1320,1365p' internal-packages/run-engine/src/run-queue/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1664


🏁 Script executed:

# Let me see the complete Lua script for expireTtlRuns to understand the full flow
sed -n '2560,2630p' internal-packages/run-engine/src/run-queue/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2932


🏁 Script executed:

# Check if there's any mechanism to restore runs if callback fails
rg -n "expireTtlRuns\|#expireTtlRuns" --type=ts -A 30 internal-packages/run-engine/src/run-queue/index.ts | head -80

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Get the rest of the Lua script after line 2630
sed -n '2620,2660p' internal-packages/run-engine/src/run-queue/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1563


🏁 Script executed:

# Let me also look at the test to see what happens when callback fails
rg -n "callback.*fail\|error.*ttl\|catch.*ttl" --type=ts internal-packages/run-engine/src/engine/tests/ttl.test.ts -B 2 -A 5

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Let me understand the callback flow better - what does expireRunsBatch do in case of failure?
sed -n '2245,2273p' internal-packages/run-engine/src/engine/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 911


🏁 Script executed:

# Check if there's any other mechanism that reschedules or retries failed TTL expirations
rg -n "ttlSystem.*retry\|ttlSystem.*reschedule\|ttlSystem.*recover" --type=ts -g '!node_modules' internal-packages/run-engine/src/

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Let me check if the runs that fail to expire in the database would somehow get picked up again
rg -n "PENDING.*status\|status.*PENDING" --type=ts internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 2 -A 2

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check if failed runs could be re-added to the TTL queue by any mechanism
rg -n "scheduleExpireRun\|ZADD.*ttl" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ -B 2 -A 5 | head -60

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Look at the full flow of how runs get added to TTL queue initially
sed -n '735,750p' internal-packages/run-engine/src/engine/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 509


🏁 Script executed:

# Check if there's any background job or cron that might re-discover stuck PENDING runs
rg -n "PENDING\|pendingRuns\|stuckRuns\|orphaned" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ | grep -i "pending.*run\|discover\|retry\|recovery" | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Verify the exact removal point in the Lua script - is it before callback or built into callback success?
sed -n '2560,2650p' internal-packages/run-engine/src/run-queue/index.ts | grep -A 50 "ZREM.*ttl"

Repository: triggerdotdev/trigger.dev

Length of output: 2062


🏁 Script executed:

# Critical finding: The Lua script removes runs BEFORE callback. Let me verify if there's any cleanup mechanism
rg -n "cleanup\|recover\|orphan\|stuck" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ | head -30

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check if there's a mechanism to re-add runs that fail database updates
rg -n "scheduleExpireRun" --type=ts internal-packages/run-engine/src/engine/systems/ttlSystem.ts -A 20

Repository: triggerdotdev/trigger.dev

Length of output: 893


🏁 Script executed:

# The critical insight: runs are ZREM'd from Redis in the Lua script BEFORE callback
# Let me verify if there's any recovery: can orphaned PENDING runs be re-discovered?
rg -n "PENDING.*timeout\|TTL.*expired\|orphan" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check what happens when expireRunsBatch partially succeeds/fails
sed -n '155,220p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2029


🏁 Script executed:

# Continue looking at expireRunsBatch to see error handling
sed -n '220,280p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2115


🏁 Script executed:

# Check if there's any compensation logic or rollback if expireRunsBatch fails
rg -n "catch\|throw\|error\|rollback" internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 1 -A 3

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Verify the critical point: does the Lua script execution and callback have transactional guarantees?
# Check if failed callback can somehow trigger re-addition to TTL queue
sed -n '1280,1320p' internal-packages/run-engine/src/run-queue/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1140


🏁 Script executed:

# Check if there's any watchdog or scheduled job that finds orphaned PENDING runs
fd . internal-packages/run-engine/src/engine/systems -name "*.ts" -type f | xargs grep -l "PENDING\|orphan\|stuck\|recovery"

Repository: triggerdotdev/trigger.dev

Length of output: 301


🏁 Script executed:

# Check if there's any background cleanup for orphaned PENDING runs
rg -n "expireRun\|PENDING" --type=ts internal-packages/run-engine/src/engine/index.ts | grep -i "cleanup\|background\|cron\|schedule\|task\|job"

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Look at the test to see if it covers failure scenarios
sed -n '1,50p' internal-packages/run-engine/src/engine/tests/ttl.test.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1517


🏁 Script executed:

# Let me verify this critical finding: check the exact atomicity guarantee
# The Lua script removes runs BEFORE callback - is there any compensation?
rg -n "expireRunsBatch\|#ttlExpiredCallback" --type=ts -A 5 internal-packages/run-engine/src/engine/index.ts | head -60

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check if there's any mechanism in place to recover from partial failures
# Look at the full test for TTL expiration and error scenarios
sed -n '1,100p' internal-packages/run-engine/src/engine/tests/ttl.test.ts | tail -50

Repository: triggerdotdev/trigger.dev

Length of output: 1612


Runs removed from Redis before database update completes—orphaned if callback fails.

The Lua script atomically removes runs from the TTL sorted set, queue, and concurrency tracking before the callback executes. If expireRunsBatch fails after the Lua removal completes, the runs remain PENDING in the database but are no longer in Redis, with no recovery mechanism. They cannot be rediscovered by the polling system, as there is no scheduled job to find orphaned PENDING runs past their TTL.

Consider either: (1) moving the database update into the Lua script to guarantee atomicity, (2) adding compensation logic to re-add failed runs to the TTL queue, or (3) implementing a background cleanup job to discover and expire orphaned PENDING runs.
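
A sketch of option (2): a compensation helper that re-adds the failed batch to the TTL sorted set with a near-now score so the next poll retries it. restoreTtlRuns is a hypothetical name; the member format copies the existing queueKey|runId|orgId encoding:

import Redis from "ioredis";

type ExpiredRun = { queueKey: string; runId: string; orgId: string };

async function restoreTtlRuns(
  redis: Redis,
  ttlQueueKey: string,
  runs: ExpiredRun[]
): Promise<void> {
  if (runs.length === 0) return;
  const score = Date.now();
  const args: (string | number)[] = [];
  for (const run of runs) {
    // Same member encoding the Lua script uses: queueKey|runId|orgId
    args.push(score, `${run.queueKey}|${run.runId}|${run.orgId}`);
  }
  await redis.zadd(ttlQueueKey, ...args);
}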

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/engine/index.ts` around lines 2245 - 2273,
The `#ttlExpiredCallback` currently calls this.ttlSystem.expireRunsBatch after a
Lua script has already removed runs from Redis, so if expireRunsBatch throws and
the DB update fails, the runs become orphaned; to fix, add compensation in
`#ttlExpiredCallback` to detect errors from ttlSystem.expireRunsBatch and, on
failure, re-add the affected runIds back into the TTL structures (sorted
set/queue/concurrency tracking) via a new ttlSystem method (e.g.,
ttlSystem.requeueRuns or ttlSystem.restoreRuns) or invoke existing requeue
logic, and/or schedule the runIds for a background cleanup retry; ensure the
added code references `#ttlExpiredCallback`, ttlSystem.expireRunsBatch, and the
Lua removal behavior so the requeue/cleanup is called only when expireRunsBatch
fails.

Comment on lines +2275 to +2300
/**
* Applies `defaultMaxTtl` to a run's TTL:
* - No max configured → pass through as-is.
* - No TTL on the run → use the max as the default.
* - Both exist → clamp to the smaller value.
*/
#resolveMaxTtl(ttl: string | undefined): string | undefined {
const maxTtl = this.options.defaultMaxTtl;

if (!maxTtl) {
return ttl;
}

if (!ttl) {
return maxTtl;
}

const ttlMs = parseNaturalLanguageDurationInMs(ttl);
const maxTtlMs = parseNaturalLanguageDurationInMs(maxTtl);

if (ttlMs === undefined || maxTtlMs === undefined) {
return ttl;
}

return ttlMs <= maxTtlMs ? ttl : maxTtl;
}

⚠️ Potential issue | 🟡 Minor

#resolveMaxTtl — edge case when TTL string is unparseable.

If the user-provided ttl doesn't parse (e.g., "invalid"), parseNaturalLanguageDurationInMs returns undefined, and Line 2296 returns the original unparseable ttl string. This means an invalid TTL silently bypasses the max clamp. This is presumably handled downstream (the run queue / expiry system would ignore or reject an invalid TTL), but it's worth noting.
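
A sketch of the tightened logic, shown as a free function; parseNaturalLanguageDurationInMs is the helper already used above:

function resolveMaxTtl(
  ttl: string | undefined,
  defaultMaxTtl: string | undefined
): string | undefined {
  if (!defaultMaxTtl) return ttl;
  if (!ttl) return defaultMaxTtl;

  const ttlMs = parseNaturalLanguageDurationInMs(ttl);
  const maxTtlMs = parseNaturalLanguageDurationInMs(defaultMaxTtl);

  if (maxTtlMs === undefined) return ttl; // max itself unparseable: keep old behaviour
  if (ttlMs === undefined) return defaultMaxTtl; // invalid ttl: enforce the cap
  return ttlMs <= maxTtlMs ? ttl : defaultMaxTtl;
}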

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/engine/index.ts` around lines 2275 - 2300,
The method `#resolveMaxTtl` currently returns the original unparseable ttl string
when parseNaturalLanguageDurationInMs(ttl) yields undefined, which allows
invalid TTLs to bypass the max clamp; change the logic so that if ttl fails to
parse but the configured options.defaultMaxTtl parses successfully, you return
the parsed max (i.e., maxTtl) to enforce the cap; keep the existing fallback
that if defaultMaxTtl itself is unparseable
(parseNaturalLanguageDurationInMs(maxTtl) === undefined) you return ttl as
before.

Comment on lines +248 to +291
// Update all runs in a single batch
const now = new Date();
const runIdsToExpire = runsToExpire.map((r) => r.id);

await this.$.prisma.taskRun.updateMany({
where: { id: { in: runIdsToExpire } },
data: {
status: "EXPIRED" as TaskRunStatus,
completedAt: now,
expiredAt: now,
// Note: updateMany doesn't support nested writes, so we handle error and snapshots separately
},
});

// Create snapshots and set errors for each run (these require individual updates)
await Promise.all(
runsToExpire.map(async (run) => {
const latestSnapshot = run.executionSnapshots[0]!;
const error: TaskRunError = {
type: "STRING_ERROR",
raw: `Run expired because the TTL (${run.ttl}) was reached`,
};

// Update the error field (updateMany can't do JSON fields properly)
await this.$.prisma.taskRun.update({
where: { id: run.id },
data: { error },
});

// Create the snapshot
await this.$.prisma.taskRunExecutionSnapshot.create({
data: {
runId: run.id,
engine: "V2",
executionStatus: "FINISHED",
description: "Run was expired because the TTL was reached",
runStatus: "EXPIRED",
environmentId: latestSnapshot.environmentId,
environmentType: latestSnapshot.environmentType,
projectId: latestSnapshot.projectId,
organizationId: latestSnapshot.organizationId,
},
});


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for TTL batch usage, expiration handling, and any repair/backfill flows
rg -n "expireRunsBatch|runExpired|expiredAt|EXPIRED|ttlSystem" internal-packages/run-engine/src -g '*.ts' -A 2

Repository: triggerdotdev/trigger.dev

Length of output: 20150


🏁 Script executed:

# Look for any error handling or retry logic around the updateMany operation
rg -n "\.updateMany|\.update|\.create" internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 2 -A 5

Repository: triggerdotdev/trigger.dev

Length of output: 1526


🏁 Script executed:

# Search for TaskRun model definition and any constraints
fd -name "*.prisma" | xargs rg -n "model TaskRun|model TaskRunExecutionSnapshot" -A 15

Repository: triggerdotdev/trigger.dev

Length of output: 12802


🏁 Script executed:

# Check if there are any monitor/repair jobs related to EXPIRED runs
rg -rn "EXPIRED.*repair|repair.*EXPIRED|orphan|partial.*expir|backfill.*ttl" internal-packages/run-engine/src -g '*.ts'

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Understand the full context of the ttlSystem.ts file to see how expireRunsBatch is called
head -n 350 internal-packages/run-engine/src/engine/systems/ttlSystem.ts | tail -n 150

Repository: triggerdotdev/trigger.dev

Length of output: 4969


🏁 Script executed:

# Search for orphan handling, repair jobs, or post-expiration validation
rg -i "orphan|repair|backfill|expired.*without|validate.*expired" internal-packages/run-engine/src -g '*.ts'

Repository: triggerdotdev/trigger.dev

Length of output: 3287


🏁 Script executed:

# Check if there's error handling or retry logic around expireRunsBatch calls
rg -B 5 -A 10 "expireRunsBatch" internal-packages/run-engine/src -g '*.ts'

Repository: triggerdotdev/trigger.dev

Length of output: 14334


🏁 Script executed:

# Check the error handling and logging around the Promise.all
rg -B 5 -A 15 "Promise.all.*runsToExpire.map" internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check if there are any queries that assume EXPIRED runs have snapshots
rg -i "EXPIRED.*snapshot|snapshot.*EXPIRED" internal-packages/run-engine/src -g '*.ts' -A 2

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Look for any error recovery or monitoring for failed TTL operations
rg -B 2 -A 10 "catch.*callbackError\|catch.*expireRunsBatch" internal-packages/run-engine/src -g '*.ts'

Repository: triggerdotdev/trigger.dev

Length of output: 51


🏁 Script executed:

# Check error handling in ttlSystem more carefully
sed -n '265,330p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Repository: triggerdotdev/trigger.dev

Length of output: 2361


🏁 Script executed:

# Check how the TTL callback is invoked and what happens on error
sed -n '2240,2280p' internal-packages/run-engine/src/engine/index.ts

Repository: triggerdotdev/trigger.dev

Length of output: 1315


🏁 Script executed:

# Look for any validation or assumptions about EXPIRED runs having snapshots
grep -r "executionSnapshots" internal-packages/run-engine/src -g --include="*.ts" | grep -i expired | head -20

Repository: triggerdotdev/trigger.dev

Length of output: 179


Risk of partial-expire state if per-run updates fail after batch status update.

The updateMany operation marks all runs as EXPIRED atomically, but the subsequent Promise.all that creates snapshots, sets errors, and emits events can fail partially. If any per-run operation fails (snapshot creation, waitpoint completion, event emission), the Promise.all rejects—but the runs are already marked EXPIRED in the database. Since these runs no longer match the PENDING filter, they won't be retried on the next TTL poll, leaving them stuck in an incomplete state without snapshots or errors.

While the batch size is bounded at 100 runs (default), this doesn't prevent the issue. The error handling in #ttlExpiredCallback only logs failures without recovery. Consider using per-run transactional updates via prisma.$transaction() to atomically update status, error, and snapshot together for each run, ensuring consistency.
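
A hedged sketch of the per-run transaction; runsToExpire, now, and the Prisma client come from the surrounding code, and the field names follow the snippet above:

for (const run of runsToExpire) {
  const latestSnapshot = run.executionSnapshots[0]!;
  const error = {
    type: "STRING_ERROR" as const, // matches the TaskRunError shape used above
    raw: `Run expired because the TTL (${run.ttl}) was reached`,
  };

  // Status, error, and snapshot land together or not at all; a failed
  // transaction leaves the run PENDING for the next TTL poll.
  await prisma.$transaction([
    prisma.taskRun.update({
      where: { id: run.id },
      data: { status: "EXPIRED", completedAt: now, expiredAt: now, error },
    }),
    prisma.taskRunExecutionSnapshot.create({
      data: {
        runId: run.id,
        engine: "V2",
        executionStatus: "FINISHED",
        description: "Run was expired because the TTL was reached",
        runStatus: "EXPIRED",
        environmentId: latestSnapshot.environmentId,
        environmentType: latestSnapshot.environmentType,
        projectId: latestSnapshot.projectId,
        organizationId: latestSnapshot.organizationId,
      },
    }),
  ]);
}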

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/engine/systems/ttlSystem.ts` around lines
248 - 291, The batch update (taskRun.updateMany) followed by per-run updates
(taskRun.update and taskRunExecutionSnapshot.create) can leave runs marked
EXPIRED without snapshots/errors if any per-run op fails; change
ttlExpiredCallback to perform each run's status change, error write, and
snapshot creation inside a single per-run prisma.$transaction() (instead of the
global updateMany + Promise.all) so that setting status to EXPIRED,
completedAt/expiredAt, writing the error, and creating taskRunExecutionSnapshot
happen atomically for each run (operate on runsToExpire entries and replace uses
of taskRun.updateMany, taskRun.update, and taskRunExecutionSnapshot.create
accordingly), and ensure transaction failures are surfaced/logged so those runs
remain PENDING for the next TTL poll.

Comment on lines 2710 to +2748
if messagePayload then
-- Update concurrency
redis.call('ZREM', queueKey, messageId)
redis.call('ZREM', envQueueKey, messageId)
redis.call('SADD', queueCurrentConcurrencyKey, messageId)
redis.call('SADD', envCurrentConcurrencyKey, messageId)

-- Add to results
table.insert(results, messageId)
table.insert(results, messageScore)
table.insert(results, messagePayload)

dequeuedCount = dequeuedCount + 1
-- Parse the message to check for TTL expiration
local messageData = cjson.decode(messagePayload)
local ttlExpiresAt = messageData and messageData.ttlExpiresAt

-- Check if TTL has expired
if ttlExpiresAt and ttlExpiresAt <= currentTime then
-- TTL expired - remove from queues but don't add to results
redis.call('ZREM', queueKey, messageId)
redis.call('ZREM', envQueueKey, messageId)
redis.call('DEL', messageKey)

-- Remove from TTL set if provided
if ttlQueueKey and ttlQueueKey ~= '' then
-- Construct TTL member: queueKey|runId|orgId
local ttlMember = queueName .. '|' .. messageId .. '|' .. (messageData.orgId or '')
redis.call('ZREM', ttlQueueKey, ttlMember)
end
-- Don't add to results - this run is expired
else
-- Not expired - process normally
redis.call('ZREM', queueKey, messageId)
redis.call('ZREM', envQueueKey, messageId)
redis.call('SADD', queueCurrentConcurrencyKey, messageId)
redis.call('SADD', envCurrentConcurrencyKey, messageId)

-- Remove from TTL set if provided (run is being executed, not expired)
if ttlQueueKey and ttlQueueKey ~= '' and ttlExpiresAt then
local ttlMember = queueName .. '|' .. messageId .. '|' .. (messageData.orgId or '')
redis.call('ZREM', ttlQueueKey, ttlMember)
end

-- Add to results
table.insert(results, messageId)
table.insert(results, messageScore)
table.insert(results, messagePayload)

dequeuedCount = dequeuedCount + 1
end

⚠️ Potential issue | 🔴 Critical

TTL-expired runs silently discarded during dequeue are never reported to the application layer.

When the dequeue Lua script encounters a message with an expired TTL (Lines 2716–2727), it removes the message from the queue, env queue, message key, and the TTL sorted set — but never returns it as an expired run. This means:

  1. The TTL consumer will never pick it up (already removed from the TTL sorted set).
  2. The ttlSystem.callback is never invoked for this run.
  3. The run remains in QUEUED status in the database permanently — orphaned.

The dequeue path needs to either:

  • Return expired run info separately so the caller can invoke the expiration callback, or
  • Leave expired messages in the TTL sorted set (only remove from the normal queue/env queue) so the TTL consumer can handle them, or
  • Not evict TTL-expired messages during dequeue at all, and let the TTL consumer be the sole owner of expiration.

Option 3 is simplest and avoids the race between dequeue and TTL consumer entirely. The TTL consumer already handles cleanup atomically.

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/run-queue/index.ts` around lines 2710 -
2748, The dequeue Lua branch that handles TTL expiration (checks using
ttlExpiresAt <= currentTime) currently removes the message from queueKey,
envQueueKey, messageKey and ttlQueueKey but never reports the expired run to the
application; to fix this, stop evicting TTL-expired messages in the dequeue
path: modify the ttlExpiresAt <= currentTime branch so it does not remove the
message or its TTL membership (i.e., do not call ZREM on queueKey/envQueueKey or
DEL messageKey or ZREM ttlQueueKey) and simply skip adding the run to results
(leaving the message for the TTL consumer to handle), ensuring that results,
dequeuedCount and SADD calls are untouched for the non-expired branch; locate
and change the code around the symbols ttlExpiresAt, currentTime, queueKey,
envQueueKey, ttlQueueKey, messageKey, results and dequeuedCount.
