feat(server): New TTL system, enforce max queue length limits, lazy waitpoint creation #2980
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough

Centralizes queue-size logic (new v3/queueLimits utility and environment queueSizeLimit exposure) and adds an LRU cache for environment queue lengths. Refactors queue validation to per-queue semantics (resolveQueueNamesForBatchItems, validateMultipleQueueLimits) and surfaces itemsSkipped/runCount through batch streaming APIs. Introduces per-item retry for batch queue processing, batch-run-count updates, and a TriggerFailedTaskService for creating pre-failed runs. Adds a TTL expiration subsystem (batched TTL consumers, Redis TTL scripts, ttlSystem callback) and lazy get-or-create waitpoints with related waitpoint APIs. Numerous RunEngine/RunQueue/BatchQueue public API additions and tests updated; UI presenters and routes updated to use the single queueSize quota.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~180 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Review Complete: Your review story is ready! Comment !reviewfast on this PR to re-generate the story.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts`:
- Around lines 170-194: The code silently treats a missing runtime environment as a zero-sized queue by checking runtimeEnv and continuing. Change this to explicitly handle the case where runtimeEnv is null for the required environmentId, returning or throwing an error (or a clear "not found" result) instead of falling through to the deployed default logic. Locate the block where runtimeEnv is fetched (the runtimeEnvironment.findFirst call and the subsequent use of runtimeEnv, currentQueueSize, and engine.lengthOfEnvQueue) and add an explicit branch that raises a descriptive error or returns an appropriate 404-style response when runtimeEnv is falsy, so callers don't get misleading limits computed for a non-existent environment. A sketch of that branch follows below.
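A minimal sketch of that explicit branch, assuming the presenter exposes a Prisma client as `this._prisma` and uses the `engine.lengthOfEnvQueue` call mentioned above; the exact query shape and error type in the presenter may differ:

```ts
// Hypothetical illustration: fail fast on a missing environment instead of
// silently computing limits against a zero-sized queue.
const runtimeEnv = await this._prisma.runtimeEnvironment.findFirst({
  where: { id: environmentId },
});

if (!runtimeEnv) {
  // Surface the miss to the caller; a 404-style result would also work here.
  throw new Error(`Runtime environment not found: ${environmentId}`);
}

const currentQueueSize = await engine.lengthOfEnvQueue(runtimeEnv);
```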
🧹 Nitpick comments (1)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)
311-329: Queue size limit logic is duplicated with EnvironmentQueuePresenter.

The limit computation logic (lines 314-317) and source determination (lines 321-328) duplicate the same pattern found in EnvironmentQueuePresenter.server.ts (lines 44-47). Consider extracting this into a shared utility function to ensure consistency and reduce maintenance burden.
♻️ Example shared utility
```ts
// In a shared utils file, e.g., ~/utils/queueSizeLimit.ts
export function getQueueSizeLimitForEnvironment(
  environmentType: string,
  organization: {
    maximumDevQueueSize: number | null;
    maximumDeployedQueueSize: number | null;
  },
  defaultDevLimit: number | null,
  defaultDeployedLimit: number | null
): { limit: number | null; source: "plan" | "default" } {
  if (environmentType === "DEVELOPMENT") {
    return {
      limit: organization.maximumDevQueueSize ?? defaultDevLimit ?? null,
      source: organization.maximumDevQueueSize ? "plan" : "default",
    };
  }

  return {
    limit: organization.maximumDeployedQueueSize ?? defaultDeployedLimit ?? null,
    source: organization.maximumDeployedQueueSize ? "plan" : "default",
  };
}
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: When importing from `@trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp
Access environment variables via the `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧠 Learnings (10)
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/v3/presenters/**/*.server.{ts,tsx} : Organize presenters in the webapp following the pattern `app/v3/presenters/*/*.server.ts` to move complex loader code into classes
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : Access environment variables via `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to apps/webapp/**/*.test.{ts,tsx} : For testable code in the webapp, never import env.server.ts in test files - pass configuration as options instead
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/**/*.test.{ts,tsx} : Test files should only import classes and functions from `app/**/*.ts` files and should not import `env.server.ts` directly or indirectly; pass configuration through options instead
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.
Applied to files:
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine
Applied to files:
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx (2)
apps/webapp/app/utils/numberFormatter.ts (1)
formatNumberCompact (3-5)
apps/webapp/app/components/primitives/Tooltip.tsx (1)
InfoIconTooltip (110-139)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)
apps/webapp/app/v3/runEngine.server.ts (1)
engine (11-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: typecheck / typecheck
🔇 Additional comments (5)
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)
1-56: LGTM! The implementation correctly extends the Environment type with `queueSizeLimit` and computes it based on environment type with an appropriate fallback chain (org override → env var → null). Environment variable access follows coding guidelines via `~/env.server`.
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)
510-511: LGTM! Clean consolidation from separate dev/deployed queue size quotas to a unified `queueSize` quota, conditionally rendered only when a limit is set. This aligns well with the `LimitsPresenter` changes.
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx (2)
349-391: LGTM! The queue size limit display with color-coded usage indicators is well implemented. The conditional rendering shows the limit suffix when `queueSizeLimit` exists, with an appropriate fallback to the paused indicator.
Minor note: The component has `env` from `useEnvironment()` (line 313) shadowing the `env` import from `~/env.server` (line 57). This works correctly since the loader uses the imported `env` and the component uses the hook result, but renaming one variable (e.g., `runtimeEnv` for the hook result) could improve clarity.
1145-1152: LGTM! Clean helper function with consistent threshold logic (90% warning, 100% error). Returning `undefined` when no limit is set allows proper fallback behavior in the calling code. A sketch of that logic follows below.
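A hedged sketch of that threshold logic; the helper's real name and return type in the route may differ:

```ts
// Maps current queue usage to a severity level: "error" at or above the
// limit, "warning" within 90% of it, and undefined otherwise (or when no
// limit is configured), so the caller can fall back to default styling.
function queueUsageLevel(
  queueSize: number,
  limit: number | undefined
): "warning" | "error" | undefined {
  if (limit === undefined) return undefined;
  if (queueSize >= limit) return "error";
  if (queueSize >= limit * 0.9) return "warning";
  return undefined;
}
```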
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)
15-15: LGTM on the structural changes. The import of `engine` and the updated `LimitsResult` type with the consolidated `queueSize: QuotaInfo` properly support the new unified queue size quota feature.
Also applies to: 70-70
✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@apps/webapp/app/env.server.ts`:
- Around lines 536-539: The MAXIMUM_DEV_QUEUE_SIZE default of 500 introduces an unintended hard cap for dev environments, because guardQueueSizeLimitsForEnv reads this value and enforces it. Remove the .default(500) so MAXIMUM_DEV_QUEUE_SIZE remains optional/undefined unless explicitly set in the environment, leaving the z.coerce.number().int().optional() schema in place, and ensure any code calling guardQueueSizeLimitsForEnv continues to treat undefined as "no cap" (verify guardQueueSizeLimitsForEnv behavior and update it only if it currently treats undefined incorrectly).
In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts`:
- Around lines 313-319: The queue-size quota object (queueSize) is missing the isUpgradable flag, so the UI never shows an Upgrade action. Update the queueSize payload returned by the presenter to include isUpgradable: true for plan-based limits (and add canExceed: true|false as appropriate), e.g., set isUpgradable to true when getQueueSizeLimitSource(environmentType, organization) indicates a plan-based source, and ensure the UI-facing fields (queueSize.currentUsage, queueSize.limit, queueSize.source) remain unchanged.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: When importing from `@trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp
Access environment variables via the `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.
Applied to files:
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties
Applied to files:
apps/webapp/app/runEngine/concerns/queues.server.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Applied to files:
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property
Applied to files:
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.
Applied to files:
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)
apps/webapp/app/v3/utils/queueLimits.server.ts (1)
getQueueSizeLimit (24-33)
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (2)
apps/webapp/app/v3/runEngine.server.ts (1)
engine (11-11)
apps/webapp/app/v3/utils/queueLimits.server.ts (2)
getQueueSizeLimit (24-33)
getQueueSizeLimitSource (42-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: typecheck / typecheck
🔇 Additional comments (7)
apps/webapp/app/runEngine/concerns/queues.server.ts (3)
18-33: LRU cache setup looks solid. The singleton + bounded LRU store should help reduce Redis churn without unbounded memory growth.
298-317: Confirm stale cache is acceptable for enforcement. `guardQueueSizeLimitsForEnv` now relies on an SWR cache; with a ~30s TTL, queue size can be undercounted and allow bursts above the limit. If this is a hard/billing limit, consider bypassing the cache for enforcement (or forcing a fresh read when nearing the limit); a sketch follows after these comments.
68-68: No issues to flag in these edits.
Also applies to: 94-96, 116-118, 130-131
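A sketch of the fresh-read fallback suggested for `guardQueueSizeLimitsForEnv` above, with `readCached`/`readFresh` standing in for the SWR cache lookup and the direct Redis read (e.g. `engine.lengthOfEnvQueue`); both names are illustrative:

```ts
// Trust the cache only while comfortably below the limit; near the limit,
// pay for an exact read so a ~30s-stale value is less likely to over-admit.
async function queueSizeForEnforcement(
  maximumSize: number,
  readCached: () => Promise<number>,
  readFresh: () => Promise<number>
): Promise<number> {
  const cached = await readCached();
  if (cached < maximumSize * 0.9) {
    return cached; // far from the limit: staleness is unlikely to matter
  }
  return readFresh(); // within 10% of the limit: force an exact count
}
```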
apps/webapp/app/v3/utils/queueLimits.server.ts (1)
7-51: Queue-limit helpers look good. Clear resolution order and source labeling; the API surface is minimal and consistent. A usage sketch follows below.
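A hedged usage sketch of those helpers, based only on the call sites visible in this review; the import path and the exact `organization` shape (the `maximumDevQueueSize`/`maximumDeployedQueueSize` overrides referenced elsewhere in this review) are assumptions:

```ts
import {
  getQueueSizeLimit,
  getQueueSizeLimitSource,
} from "~/v3/utils/queueLimits.server";

const organization = {
  maximumDevQueueSize: null, // no per-org override for dev environments
  maximumDeployedQueueSize: 10_000, // plan-based override for deployed environments
};

// Given the resolution order described above, a deployed environment should
// resolve to the org override and report a plan-based source.
const limit = getQueueSizeLimit("PRODUCTION", organization);
const source = getQueueSizeLimitSource("PRODUCTION", organization);
```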
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts (1)
4-52: `queueSizeLimit` wiring looks consistent. The new field is derived in one place and keeps the presenter output cohesive.
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (1)
2-18: API/type expansion for queue limits looks consistent. The new environmentType parameter and consolidated quota surface align with the updated limit logic.
Also applies to: 58-96
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)
81-86: Loader and quota table updates look good. Passing `environmentType` and gating the queue-size quota on a defined limit is consistent with the new presenter shape.
Also applies to: 511-513
✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.
```ts
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),
```
Defaulting MAXIMUM_DEV_QUEUE_SIZE changes enforcement behavior.
This turns previously-unlimited dev environments into a hard 500-queue cap (via guardQueueSizeLimitsForEnv). If that’s not intentional, remove the default and require an explicit env var to enable the limit.
💡 Suggested change (avoid unintended hard limit)
```diff
- MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
+ MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional(),
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
// Before
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional().default(500),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),

// After
MAXIMUM_DEV_QUEUE_SIZE: z.coerce.number().int().optional(),
MAXIMUM_DEPLOYED_QUEUE_SIZE: z.coerce.number().int().optional(),
QUEUE_SIZE_CACHE_TTL_MS: z.coerce.number().int().optional().default(30_000), // 30 seconds
QUEUE_SIZE_CACHE_MAX_SIZE: z.coerce.number().int().optional().default(5_000),
```
🤖 Prompt for AI Agents
In `@apps/webapp/app/env.server.ts` around lines 536-539: The MAXIMUM_DEV_QUEUE_SIZE default of 500 introduces an unintended hard cap for dev environments, because guardQueueSizeLimitsForEnv reads this value and enforces it. Remove the .default(500) so MAXIMUM_DEV_QUEUE_SIZE remains optional/undefined unless explicitly set in the environment, leaving the z.coerce.number().int().optional() schema in place, and ensure any code calling guardQueueSizeLimitsForEnv continues to treat undefined as "no cap" (verify guardQueueSizeLimitsForEnv behavior and update it only if it currently treats undefined incorrectly).
```ts
queueSize: {
  name: "Max queued runs",
  description: "Maximum pending runs across all queues in this environment",
  limit: getQueueSizeLimit(environmentType, organization),
  currentUsage: currentQueueSize,
  source: getQueueSizeLimitSource(environmentType, organization),
},
```
Queue-size quota never shows an upgrade action.
isUpgradable is omitted, so the Upgrade column is empty even for plan-based limits. If upgrades should be offered, set it to true (and optionally canExceed).
💡 Suggested change
```diff
  queueSize: {
    name: "Max queued runs",
    description: "Maximum pending runs across all queues in this environment",
    limit: getQueueSizeLimit(environmentType, organization),
    currentUsage: currentQueueSize,
    source: getQueueSizeLimitSource(environmentType, organization),
+   isUpgradable: true,
  },
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
// Before
queueSize: {
  name: "Max queued runs",
  description: "Maximum pending runs across all queues in this environment",
  limit: getQueueSizeLimit(environmentType, organization),
  currentUsage: currentQueueSize,
  source: getQueueSizeLimitSource(environmentType, organization),
},

// After
queueSize: {
  name: "Max queued runs",
  description: "Maximum pending runs across all queues in this environment",
  limit: getQueueSizeLimit(environmentType, organization),
  currentUsage: currentQueueSize,
  source: getQueueSizeLimitSource(environmentType, organization),
  isUpgradable: true,
},
```
🤖 Prompt for AI Agents
In `@apps/webapp/app/presenters/v3/LimitsPresenter.server.ts` around lines 313-319: The queue-size quota object (queueSize) is missing the isUpgradable flag, so the UI never shows an Upgrade action. Update the queueSize payload returned by the presenter to include isUpgradable: true for plan-based limits (and add canExceed: true|false as appropriate), e.g., set isUpgradable to true when getQueueSizeLimitSource(environmentType, organization) indicates a plan-based source, and ensure the UI-facing fields (queueSize.currentUsage, queueSize.limit, queueSize.source) remain unchanged.
Force-pushed from a50a5f5 to aaea8d6 (Compare)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal-packages/run-engine/src/engine/tests/ttl.test.ts (1)
1-9: ⚠️ Potential issue | 🟡 Minor
Missing `vi` import from vitest. Similar to lazyWaitpoint.test.ts, the `vi` global is used but not imported.
🔧 Proposed fix
```diff
 import { containerTest, assertNonNullable } from "@internal/testcontainers";
 import { trace } from "@internal/tracing";
-import { expect } from "vitest";
+import { expect, vi } from "vitest";
 import { RunEngine } from "../index.js";
```
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)
5-5: ⚠️ Potential issue | 🟡 Minor
Use subpath export for `tryCatch` from `@trigger.dev/core/utils`.
Per coding guidelines, imports from `@trigger.dev/core` in the webapp must use subpath exports from the package.json. Change `import { tryCatch } from "@trigger.dev/core";` to `import { tryCatch } from "@trigger.dev/core/utils";`.
🤖 Fix all issues with AI agents
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts`:
- Around lines 112-130: The updateRunCount method currently trusts newRunCount. Add input validation in updateRunCount to reject negative, NaN, non-finite, or non-integer values before calling getMeta/storeMeta: use Number.isInteger(newRunCount) && newRunCount >= 0 (and optionally Number.isFinite), and if the check fails, log an error via this.logger.error with batchId and the invalid value, then throw an Error or return a rejected Promise so the invalid value is never persisted. Keep existing behavior for meta retrieval (getMeta) and storage (storeMeta) otherwise. A sketch follows below.
In `@internal-packages/run-engine/src/batch-queue/index.ts`:
- Around lines 380-386: The updateRunCount method can lower runCount below the already-processed count and leave a batch stuck. Before calling this.completionTracker.updateRunCount(batchId, newRunCount), clamp newRunCount to at least the current processed count (e.g. newRunCount = Math.max(newRunCount, this.completionTracker.getProcessedCount(batchId))) or, alternatively, after updating, call into the completion tracker to run the completion/finalization check (e.g. this.completionTracker.checkCompletion/finalizeBatch for batchId) so that if processedCount >= runCount the batch is finalized immediately. Update updateRunCount to perform one of these fixes using the existing completionTracker APIs; see the sketch below.
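A sketch of the clamping option; `getProcessedCount` is the hypothetical accessor named in the prompt above:

```ts
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  const processed = await this.completionTracker.getProcessedCount(batchId);
  // Never lower the target below what has already been processed; otherwise
  // the completion check can never fire and the batch stays stuck.
  const clamped = Math.max(newRunCount, processed);
  await this.completionTracker.updateRunCount(batchId, clamped);
}
```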
In `@internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts`:
- Around lines 1-8: The test uses the vitest runner's mock/time API via vi.setConfig but never imports vi. Update the imports to explicitly import vi from "vitest" (for example, change the existing import of expect to `import { expect, vi } from "vitest"`) so the vi symbol used by vi.setConfig is defined and the test runner APIs are available.
In `@internal-packages/run-engine/src/run-queue/index.ts`:
- Around lines 1238-1304: The code currently calls `#expireTtlRuns(shard, now, batchSize)`, which deletes/marks runs as expired before invoking options.ttlSystem!.callback(expiredRuns), so callback failures only get logged and those runs can be lost. Update `#startTtlConsumer` (or `#expireTtlRuns`) so that expiration is durable: either (A) defer permanent deletion/marking of runs until after the callback completes successfully (move the delete/mark logic to after options.ttlSystem!.callback returns) or (B) if deletion must happen first, add a durable retry/enqueue path when the callback throws, e.g., push failed expired run IDs to a retry store (Redis list/sorted set) with metadata and backoff, and spawn a retry worker to reprocess them. Ensure the callback and retry path are idempotent, and update logging to include run IDs and shard when queuing retries. A sketch of option (B) follows after this list.
- Around lines 2561-2637: expireTtlRuns currently removes run entries from per-queue structures but never updates the master queue (masterQueueKey), leaving stale entries. Update the logic so that either (A) the Lua script also computes the corresponding master/shard key from queueKey and ZREMs the runId from masterQueueKey (or adjusts scores) for each expired member, or (B) expireTtlRuns returns the affected queueKey values (not just member) so the TypeScript caller can call the existing rebalance/cleanup routine for masterQueueKey per returned queue. Locate expireTtlRuns, the queueKey local variable, and the final results/return section to implement one of these fixes; a sketch of option (B) follows after this list.
🧹 Nitpick comments (7)
packages/core/src/v3/schemas/api.ts (1)
412-424: Tighten validation for new count fields.
These are counts, so restricting them to non-negative integers prevents accidental floats/negatives in API responses.
♻️ Suggested update
```diff
- itemsSkipped: z.number().optional(),
+ itemsSkipped: z.number().int().nonnegative().optional(),
  ...
- runCount: z.number().optional(),
+ runCount: z.number().int().nonnegative().optional(),
```
apps/webapp/app/runEngine/concerns/queues.server.ts (1)
421-432: Consider a potential cache stampede scenario. The SWR pattern handles stale reads well, but when the cache is completely cold (e.g., after deployment), multiple concurrent requests for the same queue could all trigger Redis lookups simultaneously before the first one populates the cache.
For most use cases this is acceptable, but if queue size lookups become a bottleneck, consider adding a coalescing mechanism, as sketched below.
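A self-contained sketch of such a coalescing mechanism: concurrent callers for the same key share one in-flight lookup instead of stampeding.

```ts
const inFlight = new Map<string, Promise<number>>();

// Wraps a read (e.g. the Redis queue-length lookup) so that only one request
// per key is outstanding at a time; late arrivals await the same promise.
async function coalescedQueueLength(
  key: string,
  read: () => Promise<number>
): Promise<number> {
  const pending = inFlight.get(key);
  if (pending) return pending;

  const promise = read().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```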
apps/webapp/app/runEngine/services/triggerTask.server.ts (1)
258-269: Queue validation correctly uses the resolved queue name. The reordering to validate after `resolveQueueProperties` is correct since the queue name is needed for per-queue validation. Consider enhancing the error message to include the queue name for better debuggability:
💡 Optional: Include queue name in error message
```diff
 if (!queueSizeGuard.ok) {
   throw new ServiceValidationError(
-    `Cannot trigger ${taskId} as the queue size limit for this environment has been reached. The maximum size is ${queueSizeGuard.maximumSize}`
+    `Cannot trigger ${taskId} as the queue size limit for queue '${queueName}' has been reached. The maximum size is ${queueSizeGuard.maximumSize}`
   );
 }
```
internal-packages/run-engine/src/run-queue/types.ts (1)
56-128: Convert the `RunQueueKeyProducer` interface to a type alias to align with TypeScript guidelines.
♻️ Suggested refactor
```diff
-export interface RunQueueKeyProducer {
+export type RunQueueKeyProducer = {
   //queue
   queueKey(
     orgId: string,
     projId: string,
     envId: string,
     queue: string,
     concurrencyKey?: string
   ): string;
   queueKey(env: RunQueueKeyProducerEnvironment, queue: string, concurrencyKey?: string): string;
   legacyMasterQueueKey(masterQueueName: string): string;
   masterQueueKeyForEnvironment(envId: string, shardCount: number): string;
   masterQueueKeyForShard(shard: number): string;
   masterQueueShardForEnvironment(envId: string, shardCount: number): number;
   workerQueueKey(workerQueue: string): string;
   envQueueKey(env: RunQueueKeyProducerEnvironment): string;
   envQueueKeyFromQueue(queue: string): string;
   queueConcurrencyLimitKey(env: RunQueueKeyProducerEnvironment, queue: string): string;
   queueConcurrencyLimitKeyFromQueue(queue: string): string;
   queueCurrentConcurrencyKeyFromQueue(queue: string): string;
   queueCurrentConcurrencyKey(
     env: RunQueueKeyProducerEnvironment,
     queue: string,
     concurrencyKey?: string
   ): string;
   queueCurrentDequeuedKeyFromQueue(queue: string): string;
   queueCurrentDequeuedKey(
     env: RunQueueKeyProducerEnvironment,
     queue: string,
     concurrencyKey?: string
   ): string;
   //env oncurrency
   envCurrentConcurrencyKey(env: EnvDescriptor): string;
   envCurrentConcurrencyKey(env: RunQueueKeyProducerEnvironment): string;
   envConcurrencyLimitKey(env: EnvDescriptor): string;
   envConcurrencyLimitKey(env: RunQueueKeyProducerEnvironment): string;
   envCurrentDequeuedKey(env: EnvDescriptor): string;
   envCurrentDequeuedKey(env: RunQueueKeyProducerEnvironment): string;
   envConcurrencyLimitBurstFactorKey(env: EnvDescriptor): string;
   envConcurrencyLimitBurstFactorKey(env: RunQueueKeyProducerEnvironment): string;
   envConcurrencyLimitBurstFactorKeyFromQueue(queue: string): string;
   envConcurrencyLimitKeyFromQueue(queue: string): string;
   envCurrentConcurrencyKeyFromQueue(queue: string): string;
   envCurrentDequeuedKeyFromQueue(queue: string): string;
   //message payload
   messageKeyPrefixFromQueue(queue: string): string;
   messageKey(orgId: string, messageId: string): string;
   //utils
   orgIdFromQueue(queue: string): string;
   envIdFromQueue(queue: string): string;
   projectIdFromQueue(queue: string): string;
   descriptorFromQueue(queue: string): QueueDescriptor;
   deadLetterQueueKey(env: RunQueueKeyProducerEnvironment): string;
   deadLetterQueueKey(env: EnvDescriptor): string;
   deadLetterQueueKeyFromQueue(queue: string): string;
   // Concurrency sweeper methods
   markedForAckKey(): string;
   currentConcurrencySetKeyScanPattern(): string;
   // TTL system methods
   ttlQueueKeyForShard(shard: number): string;
-}
+};
```
internal-packages/run-engine/src/engine/systems/ttlSystem.ts (2)
252-260: Unnecessary type cast for TaskRunStatus. The string literal `"EXPIRED"` is a valid `TaskRunStatus` value, so the `as TaskRunStatus` cast is redundant.
🧹 Minor cleanup
```diff
 await this.$.prisma.taskRun.updateMany({
   where: { id: { in: runIdsToExpire } },
   data: {
-    status: "EXPIRED" as TaskRunStatus,
+    status: "EXPIRED",
     completedAt: now,
     expiredAt: now,
   },
 });
```
263-321: Consider error handling for individual run processing within the batch. If any single run fails during snapshot creation or waitpoint completion, `Promise.all` will reject and subsequent runs won't be processed. Since the batch `updateMany` on line 252 has already succeeded, this could leave some runs in a partially processed state (status=EXPIRED but no snapshot).
For improved resilience, consider using `Promise.allSettled` and collecting failures:
Promise.allSettledand collecting failures:💡 Suggested approach
```diff
-  await Promise.all(
-    runsToExpire.map(async (run) => {
+  const results = await Promise.allSettled(
+    runsToExpire.map(async (run) => {
       // ... existing processing ...
       expired.push(run.id);
+      return run.id;
     })
   );
+
+  // Log any failures but don't throw - the status update already succeeded
+  for (const result of results) {
+    if (result.status === "rejected") {
+      this.$.logger.error("Failed to fully process expired run", {
+        error: result.reason,
+      });
+    }
+  }
```
internal-packages/run-engine/src/run-queue/index.ts (1)
108-110: Use a type alias for the callback. This project prefers `type` over `interface` in TS; a function-shaped callback is a clean fit for a type alias.
♻️ Suggested change
```diff
-export interface TtlSystemCallback {
-  (runs: Array<{ queueKey: string; runId: string; orgId: string }>): Promise<void>;
-}
+export type TtlSystemCallback = (
+  runs: Array<{ queueKey: string; runId: string; orgId: string }>
+) => Promise<void>;
```
As per coding guidelines: Use types over interfaces for TypeScript.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
references/hello-world/src/trigger/batches.ts is excluded by `!references/**`
📒 Files selected for processing (32)
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/runEngine/services/batchTrigger.server.ts
apps/webapp/app/runEngine/services/createBatch.server.ts
apps/webapp/app/runEngine/services/streamBatchItems.server.ts
apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/runEngine/types.ts
apps/webapp/app/v3/runEngine.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
internal-packages/run-engine/src/batch-queue/completionTracker.ts
internal-packages/run-engine/src/batch-queue/index.ts
internal-packages/run-engine/src/engine/index.ts
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
internal-packages/run-engine/src/engine/systems/ttlSystem.ts
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
internal-packages/run-engine/src/engine/tests/trigger.test.ts
internal-packages/run-engine/src/engine/tests/ttl.test.ts
internal-packages/run-engine/src/engine/types.ts
internal-packages/run-engine/src/run-queue/index.ts
internal-packages/run-engine/src/run-queue/keyProducer.ts
internal-packages/run-engine/src/run-queue/types.ts
packages/core/src/v3/schemas/api.ts
packages/trigger-sdk/src/v3/shared.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
- apps/webapp/app/presenters/v3/EnvironmentQueuePresenter.server.ts
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
packages/core/src/v3/schemas/api.ts
apps/webapp/app/runEngine/services/createBatch.server.ts
internal-packages/run-engine/src/run-queue/types.ts
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
internal-packages/run-engine/src/batch-queue/completionTracker.ts
packages/trigger-sdk/src/v3/shared.ts
internal-packages/run-engine/src/engine/types.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
apps/webapp/app/runEngine/services/streamBatchItems.server.ts
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
apps/webapp/app/v3/runEngine.server.ts
internal-packages/run-engine/src/engine/tests/trigger.test.ts
internal-packages/run-engine/src/run-queue/keyProducer.ts
apps/webapp/app/env.server.ts
internal-packages/run-engine/src/engine/tests/ttl.test.ts
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
internal-packages/run-engine/src/batch-queue/index.ts
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
apps/webapp/app/runEngine/types.ts
internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
apps/webapp/app/runEngine/services/batchTrigger.server.ts
internal-packages/run-engine/src/engine/systems/ttlSystem.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
internal-packages/run-engine/src/run-queue/index.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
internal-packages/run-engine/src/engine/index.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
packages/core/src/v3/schemas/api.ts
apps/webapp/app/runEngine/services/createBatch.server.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
apps/webapp/app/runEngine/services/streamBatchItems.server.ts
apps/webapp/app/v3/runEngine.server.ts
apps/webapp/app/env.server.ts
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
apps/webapp/app/runEngine/types.ts
apps/webapp/app/runEngine/services/batchTrigger.server.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
packages/core/src/v3/schemas/api.ts
apps/webapp/app/runEngine/services/createBatch.server.ts
internal-packages/run-engine/src/run-queue/types.ts
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts
internal-packages/run-engine/src/batch-queue/completionTracker.ts
packages/trigger-sdk/src/v3/shared.ts
internal-packages/run-engine/src/engine/types.ts
apps/webapp/app/v3/utils/queueLimits.server.ts
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
apps/webapp/app/runEngine/services/streamBatchItems.server.ts
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts
apps/webapp/app/v3/runEngine.server.ts
internal-packages/run-engine/src/engine/tests/trigger.test.ts
internal-packages/run-engine/src/run-queue/keyProducer.ts
apps/webapp/app/env.server.ts
internal-packages/run-engine/src/engine/tests/ttl.test.ts
internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
internal-packages/run-engine/src/batch-queue/index.ts
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
apps/webapp/app/runEngine/types.ts
internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
apps/webapp/app/runEngine/services/batchTrigger.server.ts
internal-packages/run-engine/src/engine/systems/ttlSystem.ts
apps/webapp/app/runEngine/concerns/queues.server.ts
internal-packages/run-engine/src/run-queue/index.ts
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
internal-packages/run-engine/src/engine/index.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
packages/core/src/v3/schemas/api.ts, apps/webapp/app/runEngine/services/createBatch.server.ts, internal-packages/run-engine/src/run-queue/types.ts, internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts, internal-packages/run-engine/src/batch-queue/completionTracker.ts, packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/types.ts, apps/webapp/app/v3/utils/queueLimits.server.ts, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, apps/webapp/app/v3/runEngine.server.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/run-queue/keyProducer.ts, apps/webapp/app/env.server.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts, internal-packages/run-engine/src/batch-queue/index.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, apps/webapp/app/runEngine/types.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, apps/webapp/app/runEngine/concerns/queues.server.ts, internal-packages/run-engine/src/run-queue/index.ts, internal-packages/run-engine/src/engine/index.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
packages/core/src/v3/schemas/api.ts, apps/webapp/app/runEngine/services/createBatch.server.ts, internal-packages/run-engine/src/run-queue/types.ts, internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts, internal-packages/run-engine/src/batch-queue/completionTracker.ts, packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/types.ts, apps/webapp/app/v3/utils/queueLimits.server.ts, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, apps/webapp/app/v3/runEngine.server.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/run-queue/keyProducer.ts, apps/webapp/app/env.server.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts, internal-packages/run-engine/src/batch-queue/index.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, apps/webapp/app/runEngine/types.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, apps/webapp/app/runEngine/concerns/queues.server.ts, internal-packages/run-engine/src/run-queue/index.ts, apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx, internal-packages/run-engine/src/engine/index.ts
{packages,integrations}/**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Add a changeset when modifying any public package in `packages/*` or `integrations/*` using `pnpm run changeset:add`
Files:
packages/core/src/v3/schemas/api.ts, packages/trigger-sdk/src/v3/shared.ts
apps/webapp/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Files:
apps/webapp/app/runEngine/services/createBatch.server.ts, apps/webapp/app/v3/utils/queueLimits.server.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, apps/webapp/app/v3/runEngine.server.ts, apps/webapp/app/env.server.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, apps/webapp/app/runEngine/concerns/queues.server.ts, apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
`apps/webapp/**/*.{ts,tsx}`: When importing from `@trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp
Access environment variables via the `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Files:
apps/webapp/app/runEngine/services/createBatch.server.ts, apps/webapp/app/v3/utils/queueLimits.server.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, apps/webapp/app/v3/runEngine.server.ts, apps/webapp/app/env.server.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, apps/webapp/app/runEngine/concerns/queues.server.ts, apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
**/*.test.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
`**/*.test.{ts,tsx,js,jsx}`: Test files should live beside the files under test and use descriptive `describe` and `it` blocks
Tests should avoid mocks or stubs and use the helpers from `@internal/testcontainers` when Redis or Postgres are needed
Use vitest for running unit tests
`**/*.test.{ts,tsx,js,jsx}`: Use vitest exclusively for testing and never mock anything — use testcontainers instead
Place test files next to source files with naming pattern: source file (e.g., `MyService.ts`) → `MyService.test.ts`
Files:
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use testcontainers helpers (`redisTest`, `postgresTest`, `containerTest`) from `@internal/testcontainers` for Redis/PostgreSQL testing instead of mocks
Files:
internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts
packages/trigger-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code
Files:
packages/trigger-sdk/src/v3/shared.ts
🧠 Learnings (40)
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeyTTL` option to define a time window during which duplicate triggers return the original run
Applied to files:
internal-packages/run-engine/src/run-queue/types.ts, internal-packages/run-engine/src/engine/types.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/run-queue/keyProducer.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, internal-packages/run-engine/src/run-queue/index.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2026-01-12T17:18:09.451Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.
Applied to files:
internal-packages/run-engine/src/run-queue/types.ts, apps/webapp/app/v3/runEngine.server.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use metadata methods (set, del, replace, append, remove, increment, decrement, stream, flush) to update metadata during task execution
Applied to files:
internal-packages/run-engine/src/batch-queue/completionTracker.ts, packages/trigger-sdk/src/v3/shared.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch
Applied to files:
internal-packages/run-engine/src/batch-queue/completionTracker.ts, packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/batch-queue/index.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions
Applied to files:
internal-packages/run-engine/src/batch-queue/completionTracker.ts, packages/trigger-sdk/src/v3/shared.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `.withStreams()` to subscribe to realtime streams from task metadata in addition to run changes
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Subscribe to run updates using `runs.subscribeToRun()` for realtime monitoring of task execution
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code
Applied to files:
packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, apps/webapp/app/runEngine/services/streamBatchItems.server.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results
Applied to files:
packages/trigger-sdk/src/v3/shared.ts, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.
Applied to files:
apps/webapp/app/v3/utils/queueLimits.server.ts, apps/webapp/app/env.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option
Applied to files:
apps/webapp/app/v3/utils/queueLimits.server.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/concerns/queues.server.ts, internal-packages/run-engine/src/run-queue/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, apps/webapp/app/runEngine/concerns/queues.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.triggerAndWait()` to trigger a task and wait for its result from a parent task
Applied to files:
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:44.496Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/executing-commands.mdc:0-0
Timestamp: 2025-11-27T16:26:44.496Z
Learning: For running tests, navigate into the package directory and run `pnpm run test --run` to enable single-file test execution (e.g., `pnpm run test ./src/engine/tests/ttl.test.ts --run`)
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, internal-packages/run-engine/src/engine/tests/attemptFailures.test.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to **/*.{test,spec}.{ts,tsx} : Use vitest for all tests in the Trigger.dev repository
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to **/*.{ts,tsx} : Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, apps/webapp/app/runEngine/services/triggerTask.server.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/tests/ttl.test.ts, apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task
Applied to files:
internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts, apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts, internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.trigger()` with type-only imports to trigger tasks from backend code without importing the task implementation
Applied to files:
internal-packages/run-engine/src/engine/tests/trigger.test.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Applied to files:
apps/webapp/app/env.server.ts, apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-08-14T18:35:44.370Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.
Applied to files:
apps/webapp/app/env.server.ts
📚 Learning: 2025-08-19T09:49:07.011Z
Learnt from: julienvanbeveren
Repo: triggerdotdev/trigger.dev PR: 2417
File: apps/webapp/app/routes/api.v1.projects.$projectRef.envvars.$slug.import.ts:56-61
Timestamp: 2025-08-19T09:49:07.011Z
Learning: In the Trigger.dev codebase, environment variables should default to `isSecret: false` when not explicitly marked as secrets in the syncEnvVars functionality. This is the intended behavior for both regular variables and parent variables.
Applied to files:
apps/webapp/app/env.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Scope idempotency keys globally or to current run using the scope parameter
Applied to files:
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeys.create()` to create idempotency keys for preventing duplicate task executions
Applied to files:
apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Export tasks with unique IDs within the project to enable proper task discovery and execution
Applied to files:
apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled/cron tasks instead of regular `task()`
Applied to files:
apps/webapp/app/runEngine/services/triggerTask.server.ts, internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk/v3` with Zod schema for payload validation
Applied to files:
apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property
Applied to files:
apps/webapp/app/presenters/v3/LimitsPresenter.server.ts
📚 Learning: 2025-12-18T14:09:10.154Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2794
File: internal-packages/run-engine/src/engine/systems/debounceSystem.ts:390-397
Timestamp: 2025-12-18T14:09:10.154Z
Learning: In the debounce system (internal-packages/run-engine/src/engine/systems/debounceSystem.ts), millisecond delays are not supported. The minimum debounce delay is 1 second (1s). The parseNaturalLanguageDuration function supports w/d/hr/h/m/s units only.
Applied to files:
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Limit task duration using the `maxDuration` property (in seconds)
Applied to files:
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use logger methods (debug, log, info, warn, error) from `trigger.dev/sdk/v3` for structured logging in tasks
Applied to files:
internal-packages/run-engine/src/engine/systems/enqueueSystem.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets
Applied to files:
apps/webapp/app/runEngine/types.ts, apps/webapp/app/runEngine/services/batchTrigger.server.ts
📚 Learning: 2026-02-03T18:27:40.429Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2994
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx:553-555
Timestamp: 2026-02-03T18:27:40.429Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.environment-variables/route.tsx, the menu buttons (e.g., Edit with PencilSquareIcon) in the TableCellMenu are intentionally icon-only with no text labels as a compact UI pattern. This is a deliberate design choice for this route; preserve the icon-only behavior for consistency in this file.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-12-08T15:19:56.823Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2760
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx:278-281
Timestamp: 2025-12-08T15:19:56.823Z
Learning: In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx, the tableState search parameter uses intentional double-encoding: the parameter value contains a URL-encoded URLSearchParams string, so decodeURIComponent(value("tableState") ?? "") is required to fully decode it before parsing with new URLSearchParams(). This pattern allows bundling multiple filter/pagination params as a single search parameter.
Applied to files:
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine
Applied to files:
internal-packages/run-engine/src/engine/index.ts
🧬 Code graph analysis (15)
- apps/webapp/app/v3/utils/queueLimits.server.ts (1)
  - apps/webapp/app/env.server.ts (1): `env` (1332-1332)
- internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
  - internal-packages/run-engine/src/engine/statuses.ts (1): `isFinalRunStatus` (55-57)
- apps/webapp/app/v3/runEngine.server.ts (2)
  - apps/webapp/app/env.server.ts (1): `env` (1332-1332)
  - apps/supervisor/src/env.ts (1): `env` (144-144)
- internal-packages/run-engine/src/run-queue/keyProducer.ts (1)
  - internal-packages/run-engine/src/run-queue/index.ts (4): `shard` (1170-1236), `shard` (1258-1318), `shard` (1324-1350), `shard` (1420-1545)
- apps/webapp/app/env.server.ts (2)
  - apps/webapp/app/utils/boolEnv.ts (1): `BoolEnv` (12-14)
  - apps/supervisor/src/envUtil.ts (1): `BoolEnv` (15-17)
- apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (1)
  - packages/core/src/v3/isomorphic/friendlyId.ts (1): `RunId` (93-93)
- apps/webapp/app/runEngine/services/triggerTask.server.ts (3)
  - internal-packages/run-engine/src/engine/errors.ts (1): `ServiceValidationError` (69-78)
  - apps/webapp/app/v3/services/common.server.ts (1): `ServiceValidationError` (1-6)
  - apps/webapp/app/utils/packets.ts (1): `handleMetadataPacket` (11-37)
- apps/webapp/app/presenters/v3/LimitsPresenter.server.ts (2)
  - apps/webapp/app/v3/runEngine.server.ts (1): `engine` (11-11)
  - apps/webapp/app/v3/utils/queueLimits.server.ts (2): `getQueueSizeLimit` (24-33), `getQueueSizeLimitSource` (42-51)
- internal-packages/run-engine/src/batch-queue/index.ts (3)
  - apps/webapp/app/runEngine/services/batchTrigger.server.ts (1): `batchId` (164-350)
  - packages/core/src/v3/apiClient/index.ts (1): `batchId` (419-537)
  - internal-packages/run-engine/src/engine/systems/batchSystem.ts (1): `batchId` (39-137)
- internal-packages/run-engine/src/engine/systems/enqueueSystem.ts (1)
  - packages/core/src/v3/isomorphic/duration.ts (1): `parseNaturalLanguageDuration` (76-79)
- apps/webapp/app/runEngine/types.ts (2)
  - internal-packages/run-engine/src/run-queue/fairQueueSelectionStrategy.ts (4): `env` (501-521), `env` (560-578), `env` (580-594), `env` (596-616)
  - internal-packages/run-engine/src/shared/index.ts (1): `AuthenticatedEnvironment` (4-6)
- apps/webapp/app/runEngine/concerns/queues.server.ts (2)
  - apps/webapp/app/v3/models/workerDeployment.server.ts (1): `findCurrentWorkerFromEnvironment` (198-224)
  - apps/webapp/app/runEngine/types.ts (1): `QueueValidationResult` (38-46)
- internal-packages/run-engine/src/run-queue/index.ts (1)
  - internal-packages/redis/src/index.ts (2): `Callback` (4-4), `Result` (4-4)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.limits/route.tsx (1)
  - apps/webapp/app/utils/numberFormatter.ts (1): `formatNumber` (15-22)
- internal-packages/run-engine/src/engine/index.ts (2)
  - internal-packages/run-engine/src/run-queue/index.ts (2): `options` (1152-1168), `options` (1240-1256)
  - internal-packages/run-engine/src/shared/index.ts (1): `MinimalAuthenticatedEnvironment` (8-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: typecheck / typecheck
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```ts
/**
 * Update the runCount in batch metadata.
 * Used when items are skipped due to queue limits.
 */
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  const meta = await this.getMeta(batchId);
  if (!meta) {
    this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
    return;
  }

  const updatedMeta: BatchMeta = {
    ...meta,
    runCount: newRunCount,
  };

  await this.storeMeta(batchId, updatedMeta);
  this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
}
```
Guard against invalid runCount values.
updateRunCount should reject negative/NaN/non‑integer values to avoid corrupting batch metadata.
🛡️ Suggested guard
```diff
 async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
+  if (!Number.isInteger(newRunCount) || newRunCount < 0) {
+    this.logger.error("Invalid runCount update", { batchId, newRunCount });
+    return;
+  }
   const meta = await this.getMeta(batchId);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
/**
 * Update the runCount in batch metadata.
 * Used when items are skipped due to queue limits.
 */
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  if (!Number.isInteger(newRunCount) || newRunCount < 0) {
    this.logger.error("Invalid runCount update", { batchId, newRunCount });
    return;
  }
  const meta = await this.getMeta(batchId);
  if (!meta) {
    this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
    return;
  }
  const updatedMeta: BatchMeta = {
    ...meta,
    runCount: newRunCount,
  };
  await this.storeMeta(batchId, updatedMeta);
  this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
}
```
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts` around
lines 112 - 130, The updateRunCount method currently trusts newRunCount; add
input validation in updateRunCount (in completionTracker.ts) to reject negative,
NaN, non-finite or non-integer values before calling getMeta/storeMeta: use
Number.isInteger(newRunCount) && newRunCount >= 0 (and optionally
Number.isFinite) and if the check fails log an error via this.logger.error with
batchId and the invalid value and then throw an Error or return a rejected
Promise so the invalid value is never persisted; keep existing behavior for meta
retrieval (getMeta) and storage (storeMeta) otherwise.
```ts
/**
 * Update the runCount for a batch.
 * Used when items are skipped due to queue limits.
 */
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  return this.completionTracker.updateRunCount(batchId, newRunCount);
}
```
Batch can get stuck if runCount drops after processing completes.
If newRunCount is lowered to a value already reached by processedCount, no further items may run to trigger finalization, leaving the batch incomplete.
🛠️ Suggested fix
```diff
 async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
-  return this.completionTracker.updateRunCount(batchId, newRunCount);
+  await this.completionTracker.updateRunCount(batchId, newRunCount);
+  const meta = await this.completionTracker.getMeta(batchId);
+  if (!meta) return;
+  const processedCount = await this.completionTracker.getProcessedCount(batchId);
+  if (processedCount >= meta.runCount) {
+    await this.#finalizeBatch(batchId, meta);
+  }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
/**
 * Update the runCount for a batch.
 * Used when items are skipped due to queue limits.
 */
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  await this.completionTracker.updateRunCount(batchId, newRunCount);
  const meta = await this.completionTracker.getMeta(batchId);
  if (!meta) return;
  const processedCount = await this.completionTracker.getProcessedCount(batchId);
  if (processedCount >= meta.runCount) {
    await this.#finalizeBatch(batchId, meta);
  }
}
```
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/batch-queue/index.ts` around lines 380 -
386, The updateRunCount method can lower runCount below the already-processed
count and leave a batch stuck; before calling
this.completionTracker.updateRunCount(batchId, newRunCount) clamp newRunCount to
at least the current processed count (e.g. newRunCount = Math.max(newRunCount,
this.completionTracker.getProcessedCount(batchId))) or, alternatively, after
updating call into the completion tracker to run the completion/finalization
check (e.g. this.completionTracker.checkCompletion/finalizeBatch for batchId) so
that if processedCount >= runCount the batch is finalized immediately; update
updateRunCount to perform one of these fixes using the existing
completionTracker APIs.
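If the clamping alternative mentioned in the prompt is preferred over an explicit finalization check, a minimal sketch could look like this — it reuses the `getProcessedCount` helper that the suggested fix above already relies on; everything else is illustrative:

```ts
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  const processedCount = await this.completionTracker.getProcessedCount(batchId);
  // Never lower runCount below what has already been processed, so the
  // completion check can still fire on the final item.
  const clamped = Math.max(newRunCount, processedCount);
  await this.completionTracker.updateRunCount(batchId, clamped);
}
```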
```ts
import { containerTest, assertNonNullable } from "@internal/testcontainers";
import { trace } from "@internal/tracing";
import { expect } from "vitest";
import { RunEngine } from "../index.js";
import { setTimeout } from "node:timers/promises";
import { setupAuthenticatedEnvironment, setupBackgroundWorker } from "./setup.js";

vi.setConfig({ testTimeout: 60_000 });
```
Missing vi import from vitest.
The vi global is used on line 8 but not imported. While vitest may auto-inject globals, explicit imports are more reliable.
🔧 Proposed fix
```diff
 import { containerTest, assertNonNullable } from "@internal/testcontainers";
 import { trace } from "@internal/tracing";
-import { expect } from "vitest";
+import { expect, vi } from "vitest";
 import { RunEngine } from "../index.js";
```

🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/engine/tests/lazyWaitpoint.test.ts` around
lines 1 - 8, The test uses the vitest runner's mock/time API via vi.setConfig
but never imports vi; update the imports to explicitly import vi from 'vitest'
(for example change the existing import of expect to import { expect, vi } from
"vitest") so the vi symbol used by vi.setConfig is defined and the test runner
APIs are available.
```ts
// TTL System Methods

#startTtlConsumers() {
  if (!this.options.ttlSystem) {
    this.logger.debug("TTL system disabled (no ttlSystem config)");
    return;
  }

  const shardCount = this.options.ttlSystem.shardCount ?? this.shardCount;

  for (let i = 0; i < shardCount; i++) {
    this.logger.debug(`Starting TTL consumer ${i}`);
    this.#startTtlConsumer(i).catch((err) => {
      this.logger.error(`Failed to start TTL consumer ${i}`, { error: err });
    });
  }

  this.logger.debug(`Started ${shardCount} TTL consumers`);
}

async #startTtlConsumer(shard: number) {
  if (!this.options.ttlSystem) {
    return;
  }

  const pollIntervalMs = this.options.ttlSystem.pollIntervalMs ?? 1000;
  const batchSize = this.options.ttlSystem.batchSize ?? 100;
  let processedCount = 0;

  try {
    for await (const _ of setInterval(pollIntervalMs, null, {
      signal: this.abortController.signal,
    })) {
      const now = Date.now();

      const [error, expiredRuns] = await tryCatch(
        this.#expireTtlRuns(shard, now, batchSize)
      );

      if (error) {
        this.logger.error(`Failed to expire TTL runs for shard ${shard}`, {
          error,
          service: this.name,
          shard,
        });
        continue;
      }

      if (expiredRuns.length > 0) {
        this.logger.debug(`Expired ${expiredRuns.length} TTL runs in shard ${shard}`, {
          service: this.name,
          shard,
          count: expiredRuns.length,
        });

        // Call the callback with expired runs
        try {
          await this.options.ttlSystem!.callback(expiredRuns);
          processedCount += expiredRuns.length;
        } catch (callbackError) {
          this.logger.error(`TTL callback failed for shard ${shard}`, {
            error: callbackError,
            service: this.name,
            shard,
            runCount: expiredRuns.length,
          });
        }
```
Callback failures can silently drop expired runs.
#expireTtlRuns() removes runs before invoking the callback, and failures are only logged. If the callback is responsible for persisting expiration state, those runs may be lost without retry. Consider a durable retry path (e.g., a Redis list/sorted set of failed expirations, or a dedicated worker with backoff) or make the callback itself handle retry + idempotency explicitly.
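One possible shape for that durable retry path, sketched here with ioredis — the key name, payload shape, and drain cadence are assumptions for illustration, not existing APIs:

```ts
import Redis from "ioredis";

type ExpiredRun = { queueKey: string; runId: string; orgId: string };

// Hypothetical key for parking expirations whose callback failed.
const FAILED_EXPIRATIONS_KEY = "ttl:failed-expirations";

// Called from the catch block instead of only logging the failure.
async function parkFailedExpirations(redis: Redis, runs: ExpiredRun[]): Promise<void> {
  if (runs.length === 0) return;
  await redis.rpush(FAILED_EXPIRATIONS_KEY, ...runs.map((run) => JSON.stringify(run)));
}

// A background worker drains the list with backoff. The callback must be
// idempotent: a crash between callback success and LTRIM can replay a batch.
async function drainFailedExpirations(
  redis: Redis,
  callback: (runs: ExpiredRun[]) => Promise<void>,
  batchSize = 100
): Promise<void> {
  const raw = await redis.lrange(FAILED_EXPIRATIONS_KEY, 0, batchSize - 1);
  if (raw.length === 0) return;
  await callback(raw.map((item) => JSON.parse(item) as ExpiredRun));
  await redis.ltrim(FAILED_EXPIRATIONS_KEY, raw.length, -1);
}
```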
🤖 Prompt for AI Agents
In `@internal-packages/run-engine/src/run-queue/index.ts` around lines 1238 -
1304, The code currently calls `#expireTtlRuns`(shard, now, batchSize) which
deletes/marks runs as expired before invoking
options.ttlSystem!.callback(expiredRuns), so callback failures only get logged
and those runs can be lost; update `#startTtlConsumer` (or `#expireTtlRuns`) so that
expiration is durable: either (A) defer permanent deletion/marking of runs until
after the callback completes successfully (move the delete/mark logic to after
options.ttlSystem!.callback returns) or (B) if deletion must happen first, add a
durable retry/enqueue path when the callback throws — e.g., push failed expired
run IDs to a retry store (Redis list/sorted set) with metadata and backoff, and
spawn a retry worker to reprocess them; ensure the callback and retry path are
idempotent and update logging to include run IDs and shard when queuing retries.
…env queue size check
…ndles failures from queue length limit failures and also retries
aaea8d6 to 21dae6f
… var, set at engine level
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal-packages/run-engine/src/batch-queue/index.ts (1)
958-976: ⚠️ Potential issue | 🟠 Major — Completion check at Line 969 uses `meta.runCount`, which may be stale.

`meta` is fetched once per message at Lines 762-764. If `updateRunCount` is called between the meta fetch and the completion check, the comparison `processedCount === meta.runCount` can use an outdated value, potentially missing or double-triggering finalization. For correctness, consider re-fetching `runCount` atomically alongside the `processedCount` increment, or having `updateRunCount` trigger a finalization check.
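A sketch of the weaker (but simpler) re-fetch variant, using the same tracker methods the suggestions above already reference; note that it narrows the race window rather than closing it — a Lua-side atomic compare is the stricter option:

```ts
// After recording an item, re-read both counters so the completion check
// observes any updateRunCount that landed since the message's meta fetch.
const processedCount = await this.completionTracker.getProcessedCount(batchId);
const freshMeta = await this.completionTracker.getMeta(batchId);
if (freshMeta && processedCount >= freshMeta.runCount) {
  await this.#finalizeBatch(batchId, freshMeta);
}
```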
🤖 Fix all issues with AI agents
In `@apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx`:
- Around line 367-370: The call to getQueueUsageColorClass(environment.queued,
environment.queueSizeLimit) is unresolved — add a definition or import for
getQueueUsageColorClass that mirrors the logic of the existing
getUsageColorClass used in the limits route: implement a function
getQueueUsageColorClass(queued: number | undefined, queueSizeLimit: number |
undefined) that returns a CSS class string (or null) based on usage percentage
thresholds and handles undefined values; place it in the same file (or import
it) so valueClassName can call it, and ensure it returns null when no class
should be applied so the existing env.paused fallback ("text-warning
tabular-nums" or "tabular-nums") continues to work.
- Around line 515-518: The per-queue limit check (isAtQueueLimit) is using
queue.queued against environment.queueSizeLimit which is an environment-wide
cap; change the comparison to use environment.queued (total queued across all
queues) versus environment.queueSizeLimit so the badge reflects environment
capacity (or implement a true per-queue limit if intended). Update the
isAtQueueLimit logic to reference environment.queued and
environment.queueSizeLimit (consistent with the BigNumber usage) and remove or
revise any reliance on queue.queued for this environment-wide badge.
In `@apps/webapp/app/v3/runEngineHandlers.server.ts`:
- Around line 722-754: The duplicated pre-failed run creation via
triggerFailedTaskService.call when isFinalAttempt can itself throw and cause a
second attempt in the outer catch — wrap each inner invocation of
triggerFailedTaskService.call (the calls inside the isFinalAttempt branch and
the analogous call in the outer catch) in a small try/catch so any errors from
that call are caught and handled locally (log the error to the existing
span/process logger and avoid rethrowing), ensure span.end() still runs, and
return the same successful pre-failed-run response when a failedRunId is
returned; this prevents a partial success from causing a duplicate pre-failed
run while preserving the current control flow for both the isFinalAttempt path
and the outer catch path.
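A hedged sketch of the guarded call — the argument and return shapes are stand-ins for whatever the handler actually passes and returns:

```ts
if (isFinalAttempt) {
  try {
    const failedRunId = await triggerFailedTaskService.call(/* existing args */);
    if (failedRunId) {
      span.end();
      return failedRunId; // same pre-failed-run response as before
    }
  } catch (serviceError) {
    // Log locally and fall through; rethrowing here would let the outer
    // catch create a duplicate pre-failed run.
    logger.error("TriggerFailedTaskService failed", { error: serviceError });
  }
}
```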
In `@internal-packages/run-engine/src/batch-queue/completionTracker.ts`:
- Around line 116-130: The updateRunCount function performs a non-atomic
read-modify-write (getMeta → modify → storeMeta) which can lose updates under
concurrent callers; change it to perform the increment atomically in the backing
store (e.g., use a Redis Lua script or the same atomic pattern used by
recordSuccessIdempotent) so runCount is updated with a single atomic operation,
or enforce/document single-writer semantics; specifically replace the current
getMeta/storeMeta sequence in updateRunCount with an atomic increment/update
that targets the BatchMeta.runCount field (or wrap the logic in a Lua script
that reads, increments/sets and returns the new value) and update logging to
reflect the atomic result.
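One way to make the update atomic with ioredis `defineCommand`, assuming the batch meta lives as a JSON blob under a single key (the key layout is an assumption here):

```ts
// Rewrites runCount inside the stored JSON in one atomic Redis step,
// removing the getMeta -> storeMeta race entirely.
redis.defineCommand("updateBatchRunCount", {
  numberOfKeys: 1,
  lua: `
    local raw = redis.call('GET', KEYS[1])
    if not raw then return nil end
    local meta = cjson.decode(raw)
    meta.runCount = tonumber(ARGV[1])
    redis.call('SET', KEYS[1], cjson.encode(meta))
    return meta.runCount
  `,
});
```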
In `@internal-packages/run-engine/src/engine/index.ts`:
- Around line 2275-2300: The method `#resolveMaxTtl` currently returns the
original unparseable ttl string when parseNaturalLanguageDurationInMs(ttl)
yields undefined, which allows invalid TTLs to bypass the max clamp; change the
logic so that if ttl fails to parse but the configured options.defaultMaxTtl
parses successfully, you return the parsed max (i.e., maxTtl) to enforce the
cap; keep the existing fallback that if defaultMaxTtl itself is unparseable
(parseNaturalLanguageDurationInMs(maxTtl) === undefined) you return ttl as
before.
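A minimal sketch of the corrected precedence, built from the names the prompt uses (`#resolveMaxTtl`, `options.defaultMaxTtl`, `parseNaturalLanguageDurationInMs`):

```ts
#resolveMaxTtl(ttl: string): string {
  const maxTtl = this.options.defaultMaxTtl;
  if (!maxTtl) return ttl;

  const maxMs = parseNaturalLanguageDurationInMs(maxTtl);
  if (maxMs === undefined) return ttl; // unparseable cap: keep the existing fallback

  const ttlMs = parseNaturalLanguageDurationInMs(ttl);
  // Unparseable or over-cap TTLs are clamped to the configured maximum.
  if (ttlMs === undefined || ttlMs > maxMs) return maxTtl;
  return ttl;
}
```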
- Around line 2245-2273: The `#ttlExpiredCallback` currently calls
this.ttlSystem.expireRunsBatch after a Lua script has already removed runs from
Redis, so if expireRunsBatch throws the DB update fails the runs become
orphaned; to fix, add compensation in `#ttlExpiredCallback` to detect errors from
ttlSystem.expireRunsBatch and, on failure, re-add the affected runIds back into
the TTL structures (sorted set/queue/concurrency tracking) via a new ttlSystem
method (e.g., ttlSystem.requeueRuns or ttlSystem.restoreRuns) or invoke existing
requeue logic, and/or schedule the runIds for a background cleanup retry; ensure
the added code references `#ttlExpiredCallback`, ttlSystem.expireRunsBatch, and
the Lua removal behavior so the requeue/cleanup is called only when
expireRunsBatch fails.
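A sketch of the compensating wrapper, where `requeueRuns` is the hypothetical restore method the prompt suggests adding:

```ts
async #ttlExpiredCallback(runs: Array<{ queueKey: string; runId: string; orgId: string }>) {
  try {
    await this.ttlSystem.expireRunsBatch(runs);
  } catch (error) {
    // The Lua script already removed these runs from Redis; restore them so
    // the next TTL poll retries instead of orphaning them.
    await this.ttlSystem.requeueRuns(runs).catch((requeueError) => {
      this.logger.error("Failed to requeue runs after TTL callback failure", {
        error,
        requeueError,
        runIds: runs.map((r) => r.runId),
      });
    });
    throw error;
  }
}
```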
In `@internal-packages/run-engine/src/engine/systems/ttlSystem.ts`:
- Around line 248-291: The batch update (taskRun.updateMany) followed by per-run
updates (taskRun.update and taskRunExecutionSnapshot.create) can leave runs
marked EXPIRED without snapshots/errors if any per-run op fails; change
ttlExpiredCallback to perform each run's status change, error write, and
snapshot creation inside a single per-run prisma.$transaction() (instead of the
global updateMany + Promise.all) so that setting status to EXPIRED,
completedAt/expiredAt, writing the error, and creating taskRunExecutionSnapshot
happen atomically for each run (operate on runsToExpire entries and replace uses
of taskRun.updateMany, taskRun.update, and taskRunExecutionSnapshot.create
accordingly), and ensure transaction failures are surfaced/logged so those runs
remain PENDING for the next TTL poll.
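A per-run transaction sketch using the `tryCatch` helper seen elsewhere in this PR; field values are abbreviated and should mirror the existing writes:

```ts
for (const run of runsToExpire) {
  const [txError] = await tryCatch(
    prisma.$transaction(async (tx) => {
      await tx.taskRun.update({
        where: { id: run.id },
        data: {
          status: "EXPIRED",
          completedAt: new Date(),
          expiredAt: new Date(),
          // error payload elided; reuse the existing TTL error shape
        },
      });
      await tx.taskRunExecutionSnapshot.create({
        data: {
          /* snapshot fields elided; mirror the existing create call */
        },
      });
    })
  );
  if (txError) {
    // Surfaced, not swallowed: the run stays PENDING for the next TTL poll.
    logger.error("Failed to expire run atomically", { runId: run.id, error: txError });
  }
}
```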
In `@internal-packages/run-engine/src/run-queue/index.ts`:
- Around line 2710-2748: The dequeue Lua branch that handles TTL expiration
(checks using ttlExpiresAt <= currentTime) currently removes the message from
queueKey, envQueueKey, messageKey and ttlQueueKey but never reports the expired
run to the application; to fix this, stop evicting TTL-expired messages in the
dequeue path: modify the ttlExpiresAt <= currentTime branch so it does not
remove the message or its TTL membership (i.e., do not call ZREM on
queueKey/envQueueKey or DEL messageKey or ZREM ttlQueueKey) and simply skip
adding the run to results (leaving the message for the TTL consumer to handle),
ensuring that results, dequeuedCount and SADD calls are untouched for the
non-expired branch; locate and change the code around the symbols ttlExpiresAt,
currentTime, queueKey, envQueueKey, ttlQueueKey, messageKey, results and
dequeuedCount.
🧹 Nitpick comments (10)
apps/webapp/app/env.server.ts (1)
536-540:QUEUE_SIZE_CACHE_ENABLEDuses numeric type instead ofBoolEnv.Other boolean-like env vars in this file (e.g.,
RUN_ENGINE_TTL_SYSTEM_DISABLED,BATCH_TRIGGER_CACHED_RUNS_CHECK_ENABLED) useBoolEnv. Usingz.coerce.number().int()for an enable/disable flag is inconsistent and forces consumers to compare against1instead of using a proper boolean.Suggested change
- QUEUE_SIZE_CACHE_ENABLED: z.coerce.number().int().optional().default(1), + QUEUE_SIZE_CACHE_ENABLED: BoolEnv.default(true),apps/webapp/app/runEngine/concerns/idempotencyKeys.server.ts (1)
120-122: Non-null assertion onassociatedWaitpoint!is safe but could be cleaner.The
!is safe because of the early return on line 98-100, but TypeScript can't narrowletvariables inside closures. Consider assigning to aconstafter the guard to eliminate the assertion.Suggested approach
if (!associatedWaitpoint) { return { isCached: true, run: existingRun }; } + const waitpoint = associatedWaitpoint; await this.traceEventConcern.traceIdempotentRun( ... await this.engine.blockRunWithWaitpoint({ runId: RunId.fromFriendlyId(parentRunId), - waitpoints: associatedWaitpoint!.id, + waitpoints: waitpoint.id,internal-packages/run-engine/src/engine/tests/getSnapshotsSince.test.ts (1)
214-218: Relaxed assertion removes verification of waitpoint completion in snapshots.The original assertion that
latest.completedWaitpoints.length > 0was removed and replaced with comments. While timing sensitivity is real, this means the test no longer verifies the core "waitpoints only on latest" behavior — it only checks that earlier snapshots have empty waitpoints.Consider adding a retry/poll loop (e.g., poll
getSnapshotsSinceuntil the latest snapshot hascompletedWaitpoints.length > 0or a timeout), so the test still validates the positive case without being flaky.internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)
17-17: Derive/validate env + project from the run to prevent mismatches.
Nice use ofisFinalRunStatusto avoid creating waitpoints for terminal runs. Since you already load the run, consider using/validating itsprojectIdandruntimeEnvironmentIdinstead of trusting caller inputs, to avoid accidental cross‑environment linkage.🔧 Suggested adjustment
- const run = await this.$.prisma.taskRun.findFirst({ - where: { id: runId }, - include: { associatedWaitpoint: true }, - }); + const run = await this.$.prisma.taskRun.findFirst({ + where: { id: runId }, + select: { + status: true, + projectId: true, + runtimeEnvironmentId: true, + associatedWaitpoint: true, + }, + }); ... - const runAfterLock = await this.$.prisma.taskRun.findFirst({ - where: { id: runId }, - include: { associatedWaitpoint: true }, - }); + const runAfterLock = await this.$.prisma.taskRun.findFirst({ + where: { id: runId }, + select: { + status: true, + projectId: true, + runtimeEnvironmentId: true, + associatedWaitpoint: true, + }, + }); + if ( + runAfterLock.projectId !== projectId || + runAfterLock.runtimeEnvironmentId !== environmentId + ) { + throw new Error( + "getOrCreateRunWaitpoint: run does not match provided project/environment" + ); + } - const waitpointData = this.buildRunAssociatedWaitpoint({ projectId, environmentId }); + const waitpointData = this.buildRunAssociatedWaitpoint({ + projectId: runAfterLock.projectId, + environmentId: runAfterLock.runtimeEnvironmentId, + });Also applies to: 819-875
apps/webapp/app/runEngine/types.ts (1)
59-77: Prefer a type alias for QueueManager (repo guideline).

Since this interface is being updated, consider converting it to a type alias to match the TS style guide. As per coding guidelines "Use types over interfaces for TypeScript".

♻️ Suggested refactor

-export interface QueueManager {
+export type QueueManager = {
   resolveQueueProperties(
     request: TriggerTaskRequest,
     lockedBackgroundWorker?: LockedBackgroundWorker
   ): Promise<QueueProperties>;
   getQueueName(request: TriggerTaskRequest): Promise<string>;
   validateQueueLimits(
     env: AuthenticatedEnvironment,
     queueName: string,
     itemsToAdd?: number
   ): Promise<QueueValidationResult>;
   resolveQueueNamesForBatchItems(
     env: AuthenticatedEnvironment,
     items: Array<{ task: string; options?: { queue?: { name?: string } } }>
   ): Promise<Map<string, number>>;
   validateMultipleQueueLimits(
     env: AuthenticatedEnvironment,
     queueCounts: Map<string, number>
   ): Promise<{ ok: true } | { ok: false; queueName: string; maximumSize: number; queueSize: number }>;
   getWorkerQueue(
     env: AuthenticatedEnvironment,
     regionOverride?: string
   ): Promise<string | undefined>;
-}
+};

apps/webapp/app/v3/runEngineHandlers.server.ts (1)
651-670: TriggerFailedTaskService is instantiated on every batch item invocation.

The service is only used on final-attempt failures, but it's constructed eagerly at the top of every callback invocation. Consider moving the instantiation inside the isFinalAttempt guards to avoid unnecessary allocations on the happy path.

apps/webapp/app/runEngine/concerns/queues.server.ts (1)
421-436: getCachedQueueSize fails open on cache errors — verify this is intentional.

Line 435 returns result.val ?? 0, meaning if the cache fetch fails (returning undefined), the queue size is treated as 0. This allows requests through even when queue size can't be determined, which is a permissive-failure strategy. This is likely the right trade-off (availability over strict enforcement), but worth documenting.

internal-packages/run-engine/src/run-queue/index.ts (3)
108-110: Use type instead of interface for TtlSystemCallback.

Per coding guidelines, prefer types over interfaces in TypeScript files.

Proposed fix

-export interface TtlSystemCallback {
-  (runs: Array<{ queueKey: string; runId: string; orgId: string }>): Promise<void>;
-}
+export type TtlSystemCallback = (
+  runs: Array<{ queueKey: string; runId: string; orgId: string }>
+) => Promise<void>;

As per coding guidelines:

**/*.{ts,tsx}: Use types over interfaces for TypeScript.

669-677: TTL member uses | as delimiter — consider a safer encoding.

The TTL member format ${queueKey}|${runId}|${orgId} relies on | never appearing in any of these values. While current ID formats make this safe today, a future change to key formats could silently break the parsing in the expireTtlRuns Lua script (Lines 2582–2588). A null byte (\0) or a fixed-width encoding would be more robust.

1713-1721: Minor: TTL shard is computed even when TTL system is disabled.

The TTL shard derivation (Lines 1714–1718) runs unconditionally, though the result is only meaningful when this.options.ttlSystem is set. This is functionally harmless since the empty-string ttlQueueKey disables TTL logic in the Lua script, but wrapping it in the conditional would be slightly cleaner.
valueClassName={
  getQueueUsageColorClass(environment.queued, environment.queueSizeLimit) ??
  (env.paused ? "text-warning tabular-nums" : "tabular-nums")
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for where getQueueUsageColorClass is defined or referenced
rg "getQueueUsageColorClass" --type=ts --type=tsx -n

Length of output: 96

🏁 Script executed:
#!/bin/bash
# Search for getQueueUsageColorClass without file type filter
rg "getQueueUsageColorClass" -n

Length of output: 268

🏁 Script executed:
#!/bin/bash
# Search for similar function names like getUsageColorClass
rg "getUsageColorClass" -n

Length of output: 574

🏁 Script executed:
#!/bin/bash
# Check the imports in the queues route file
head -50 "apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx"

Length of output: 193

🏁 Script executed:
#!/bin/bash
# Find the queues route file
find . -name "route.tsx" -path "*queues*" | head -5

Length of output: 175

🏁 Script executed:
#!/bin/bash
# Read the first 100 lines of the queues route to check imports
head -100 "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Length of output: 5079

🏁 Script executed:
#!/bin/bash
# Search around line 368 to see the context and also check the end of the file for function definitions
sed -n '360,380p' "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Length of output: 1025

🏁 Script executed:
#!/bin/bash
# Check the end of the file to see if the function is defined locally
tail -100 "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx" | cat -n

Length of output: 3402

🏁 Script executed:
#!/bin/bash
# Get the total line count of the file
wc -l "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Length of output: 180

🏁 Script executed:
#!/bin/bash
# Search for getQueueUsageColorClass definition in the entire file
grep -n "function getQueueUsageColorClass\|const getQueueUsageColorClass" "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Length of output: 51

🏁 Script executed:
#!/bin/bash
# Search for getQueueUsageColorClass in all TypeScript/JavaScript files
rg "getQueueUsageColorClass" -A 2 -B 2

Length of output: 823

🏁 Script executed:
#!/bin/bash
# Check the limits route to see the getUsageColorClass function for reference
sed -n '799,850p' "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.limits/route.tsx" | cat -n

Length of output: 1699

🏁 Script executed:
#!/bin/bash
# Look for any color class functions in the queues route file
grep -n "function.*Color\|const.*Color" "./apps/webapp/app/routes/_app.orgs.\$organizationSlug.projects.\$projectParam.env.\$envParam.queues/route.tsx"

Length of output: 51
getQueueUsageColorClass function is missing and will cause a runtime error.
Line 368 calls getQueueUsageColorClass(environment.queued, environment.queueSizeLimit) but this function is neither defined in this file nor imported from anywhere. The codebase has no definition of this function.
Define this function locally in this file or import it from the appropriate module. Reference the similar getUsageColorClass function in the limits route for implementation guidance.
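A minimal sketch of such a helper, assuming the same threshold-based approach as getUsageColorClass (the thresholds and class names below are placeholders, not the actual implementation):

// Hypothetical sketch: mirror the real getUsageColorClass in the limits route instead.
function getQueueUsageColorClass(
  queued: number,
  queueSizeLimit: number | null
): string | null {
  if (queueSizeLimit === null || queueSizeLimit === 0) return null;
  const usage = queued / queueSizeLimit;
  if (usage >= 1) return "text-error tabular-nums"; // assumed class name
  if (usage >= 0.75) return "text-warning tabular-nums"; // assumed threshold
  return null; // let the caller's fallback classes apply
}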
🤖 Prompt for AI Agents
In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
around lines 367-370, the call to getQueueUsageColorClass(environment.queued,
environment.queueSizeLimit) is unresolved — add a definition or import for
getQueueUsageColorClass that mirrors the logic of the existing
getUsageColorClass used in the limits route: implement a function
getQueueUsageColorClass(queued: number | undefined, queueSizeLimit: number |
undefined) that returns a CSS class string (or null) based on usage percentage
thresholds and handles undefined values; place it in the same file (or import
it) so valueClassName can call it, and ensure it returns null when no class
should be applied so the existing env.paused fallback ("text-warning
tabular-nums" or "tabular-nums") continues to work.
const isAtConcurrencyLimit = queue.running >= limit;
const isAtQueueLimit =
  environment.queueSizeLimit !== null &&
  queue.queued >= environment.queueSizeLimit;
🧩 Analysis chain
🏁 Script executed:
rg -n "queueSizeLimit" --type=ts --type=tsx -C 3

Length of output: 96

🏁 Script executed:
rg -n "queueSizeLimit" -C 3

Length of output: 6310

🏁 Script executed:
rg -n "function getQueueSizeLimit|const getQueueSizeLimit|export.*getQueueSizeLimit" -C 5

Length of output: 2121

🏁 Script executed:
cat -n apps/webapp/app/v3/utils/queueLimits.server.ts

Length of output: 2183

🏁 Script executed:
rg -n "guardQueueSizeLimitsForEnv" -C 10

Length of output: 10225

🏁 Script executed:
cat -n apps/webapp/app/v3/queueSizeLimits.server.ts

Length of output: 1577

🏁 Script executed:
sed -n '340,375p' apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx

Length of output: 177

🏁 Script executed:
find apps/webapp/app/routes -name "route.tsx" | grep queues

Length of output: 173

🏁 Script executed:
sed -n '340,375p' 'apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx'

Length of output: 1559

🏁 Script executed:
sed -n '510,525p' 'apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx'

Length of output: 901
The per-queue queue limit badge uses incorrect semantics.
isAtQueueLimit at lines 517–518 compares queue.queued (a single queue's count) against environment.queueSizeLimit, which is an environment-wide limit. The queueSizeLimit is calculated once per environment and represents the total allowed items across all queues, not per-queue capacity.
This comparison will almost never trigger correctly. For example, with a 1000-item environment limit and 10 queues, a single queue would need 1000 items to trigger the badge, even though the environment might already be at capacity with items spread across queues.
Use environment.queued (total across all queues) instead of queue.queued for consistency with the BigNumber component at line 368, or implement a per-queue limit if that's the intended behavior.
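For reference, a sketch of the environment-level comparison described above, assuming environment.queued carries the total across all queues:

// Hypothetical fix: compare the environment-wide total against the
// environment-wide limit instead of a single queue's depth.
const isAtQueueLimit =
  environment.queueSizeLimit !== null &&
  environment.queued >= environment.queueSizeLimit;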
🤖 Prompt for AI Agents
In apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.queues/route.tsx
around lines 515-518, the per-queue limit check (isAtQueueLimit) is using
queue.queued against environment.queueSizeLimit which is an environment-wide
cap; change the comparison to use environment.queued (total queued across all
queues) versus environment.queueSizeLimit so the badge reflects environment
capacity (or implement a true per-queue limit if intended). Update the
isAtQueueLimit logic to reference environment.queued and
environment.queueSizeLimit (consistent with the BigNumber usage) and remove or
revise any reliance on queue.queued for this environment-wide badge.
span.setAttribute("batch.result.error", "TriggerTaskService returned undefined");
span.end();

// Only create a pre-failed run on the final attempt; otherwise let the retry mechanism handle it
if (isFinalAttempt) {
  const failedRunId = await triggerFailedTaskService.call({
    taskId: item.task,
    environment,
    payload: item.payload,
    payloadType: item.payloadType as string,
    errorMessage: "TriggerTaskService returned undefined",
    parentRunId: meta.parentRunId,
    resumeParentOnCompletion: meta.resumeParentOnCompletion,
    batch: { id: batchId, index: itemIndex },
    options: item.options as Record<string, unknown>,
    traceContext: meta.traceContext as Record<string, unknown> | undefined,
    spanParentAsLink: meta.spanParentAsLink,
    errorCode: TaskRunErrorCodes.BATCH_ITEM_COULD_NOT_TRIGGER,
  });

  span.end();

  if (failedRunId) {
    return { success: true as const, runId: failedRunId };
  }
} else {
  span.end();
}

return {
  success: false as const,
  error: "TriggerTaskService returned undefined",
  errorCode: "TRIGGER_FAILED",
};
Pre-failed run creation on final attempt — duplicated error-handling blocks.
The two paths (undefined result at Line 725 and caught exception at Line 762) share nearly identical triggerFailedTaskService.call(...) invocations with the same parameters. This is acceptable given the different control-flow contexts but worth noting for future extraction if the parameter list grows.
One thing to verify: if triggerFailedTaskService.call itself throws, the exception in the first block (Line 726) would bubble into the outer catch (Line 756), which would then attempt to call triggerFailedTaskService.call again on the same item. This could create a duplicate pre-failed run if the first call partially succeeded (e.g., DB row created but an error thrown after).
Proposed fix: wrap the inner triggerFailedTaskService.call in try/catch
if (isFinalAttempt) {
+ try {
const failedRunId = await triggerFailedTaskService.call({
// ...params...
});
span.end();
if (failedRunId) {
return { success: true as const, runId: failedRunId };
}
+ } catch (failedTaskError) {
+ logger.error("Failed to create pre-failed run", {
+ batchId,
+ itemIndex,
+ error: failedTaskError instanceof Error ? failedTaskError.message : String(failedTaskError),
+ });
+ span.end();
+ }
} else {
span.end();
}

Also applies to: 756-791
🤖 Prompt for AI Agents
In apps/webapp/app/v3/runEngineHandlers.server.ts around lines 722-754, the
duplicated pre-failed run creation via triggerFailedTaskService.call when
isFinalAttempt can itself throw and cause a second attempt in the outer catch —
wrap each inner invocation of triggerFailedTaskService.call (the calls inside
the isFinalAttempt branch and the analogous call in the outer catch) in a small
try/catch so any errors from that call are caught and handled locally (log the
error to the existing span/process logger and avoid rethrowing), ensure
span.end() still runs, and return the same successful pre-failed-run response
when a failedRunId is returned; this prevents a partial success from causing a
duplicate pre-failed run while preserving the current control flow for both the
isFinalAttempt path and the outer catch path.
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
  const meta = await this.getMeta(batchId);
  if (!meta) {
    this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
    return;
  }

  const updatedMeta: BatchMeta = {
    ...meta,
    runCount: newRunCount,
  };

  await this.storeMeta(batchId, updatedMeta);
  this.logger.debug("Updated batch runCount", { batchId, oldRunCount: meta.runCount, newRunCount });
}
Non-atomic read-modify-write on batch metadata.
updateRunCount performs getMeta → modify → storeMeta without any locking or atomic operation. If called concurrently for the same batch (e.g., two queue-limit rejections racing), one update could be lost, leaving runCount incorrect and potentially causing the batch to never complete.
Consider using a Lua script (similar to recordSuccessIdempotent) to atomically update the runCount field, or document that this method must only be called from a single-writer context.
🔒 Sketch of atomic Lua approach
+ // Register in `#registerCommands`():
+ this.redis.defineCommand("updateRunCountAtomic", {
+ numberOfKeys: 1,
+ lua: `
+ local metaKey = KEYS[1]
+ local newRunCount = tonumber(ARGV[1])
+ local metaJson = redis.call('GET', metaKey)
+ if not metaJson then return nil end
+ local meta = cjson.decode(metaJson)
+ meta.runCount = newRunCount
+ redis.call('SET', metaKey, cjson.encode(meta))
+ return 'OK'
+ `,
+ });
async updateRunCount(batchId: string, newRunCount: number): Promise<void> {
- const meta = await this.getMeta(batchId);
- if (!meta) {
- this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
- return;
- }
- const updatedMeta: BatchMeta = { ...meta, runCount: newRunCount };
- await this.storeMeta(batchId, updatedMeta);
+ const result = await this.redis.updateRunCountAtomic(
+ this.metaKey(batchId),
+ newRunCount.toString()
+ );
+ if (!result) {
+ this.logger.error("Cannot update runCount: batch metadata not found", { batchId });
+ return;
+ }
this.logger.debug("Updated batch runCount", { batchId, newRunCount });
}

🤖 Prompt for AI Agents
In internal-packages/run-engine/src/batch-queue/completionTracker.ts around
lines 116-130, the updateRunCount function performs a non-atomic
read-modify-write (getMeta → modify → storeMeta) which can lose updates under
concurrent callers; change it to perform the increment atomically in the backing
store (e.g., use a Redis Lua script or the same atomic pattern used by
recordSuccessIdempotent) so runCount is updated with a single atomic operation,
or enforce/document single-writer semantics; specifically replace the current
getMeta/storeMeta sequence in updateRunCount with an atomic increment/update
that targets the BatchMeta.runCount field (or wrap the logic in a Lua script
that reads, increments/sets and returns the new value) and update logging to
reflect the atomic result.
async #ttlExpiredCallback(
  runs: Array<{ queueKey: string; runId: string; orgId: string }>
): Promise<void> {
  if (runs.length === 0) return;

  try {
    const runIds = runs.map((r) => r.runId);
    const result = await this.ttlSystem.expireRunsBatch(runIds);

    if (result.expired.length > 0) {
      this.logger.debug("TTL system expired runs", {
        expiredCount: result.expired.length,
        expiredRunIds: result.expired,
      });
    }

    if (result.skipped.length > 0) {
      this.logger.debug("TTL system skipped runs", {
        skippedCount: result.skipped.length,
        skipped: result.skipped,
      });
    }
  } catch (error) {
    this.logger.error("Failed to expire runs via TTL system", {
      runIds: runs.map((r) => r.runId),
      error,
    });
  }
}
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the overall TTL system structure and where `#ttlExpiredCallback` is used
rg -n "ttlExpiredCallback|ttlSystem|expireRunsBatch" --type=ts -g '!node_modules' internal-packages/run-engine/src/ -B 2 -A 2

Length of output: 19275

🏁 Script executed:
# Find the TtlSystem class implementation
fd -e ts -g '!node_modules' internal-packages/run-engine/src/ | xargs grep -l "class TtlSystem\|export.*TtlSystem"

Length of output: 53

🏁 Script executed:
# Check the entire TtlSystem class and its polling mechanism
rg -n "class TtlSystem" --type=ts -g '!node_modules' -A 100 internal-packages/run-engine/src/

Length of output: 9426

🏁 Script executed:
# Find the Lua script for TTL queue operations
fd -e lua --type f internal-packages/run-engine/src/ | head -20

Length of output: 506

🏁 Script executed:
# Search for the Lua script or script loading for TTL
rg -n "ttl.*lua\|ZRANGE.*ttl\|expired.*lua" --type=ts -g '!node_modules' internal-packages/run-engine/src/ -B 2 -A 2

Length of output: 51

🏁 Script executed:
# Check the full error handling in the TTL consumer, particularly what happens after the catch block
sed -n '1290,1320p' internal-packages/run-engine/src/run-queue/index.ts

Length of output: 865

🏁 Script executed:
# Search for Lua script files in the run-engine
find internal-packages/run-engine/src -type f -name "*.lua"

Length of output: 51

🏁 Script executed:
# Search for script content embedded in TypeScript files related to TTL
rg -n "ZRANGE|ZREVRANGE|ZADD|ZREM" --type=ts internal-packages/run-engine/src/run-queue/ -B 5 -A 5 | head -100

Length of output: 8490

🏁 Script executed:
# Look at the `#getPendingTtlRuns` method more carefully
sed -n '1320,1365p' internal-packages/run-engine/src/run-queue/index.ts

Length of output: 1664

🏁 Script executed:
# Let me see the complete Lua script for expireTtlRuns to understand the full flow
sed -n '2560,2630p' internal-packages/run-engine/src/run-queue/index.ts

Length of output: 2932

🏁 Script executed:
# Check if there's any mechanism to restore runs if callback fails
rg -n "expireTtlRuns\|#expireTtlRuns" --type=ts -A 30 internal-packages/run-engine/src/run-queue/index.ts | head -80

Length of output: 51

🏁 Script executed:
# Get the rest of the Lua script after line 2630
sed -n '2620,2660p' internal-packages/run-engine/src/run-queue/index.ts

Length of output: 1563

🏁 Script executed:
# Let me also look at the test to see what happens when callback fails
rg -n "callback.*fail\|error.*ttl\|catch.*ttl" --type=ts internal-packages/run-engine/src/engine/tests/ttl.test.ts -B 2 -A 5

Length of output: 51

🏁 Script executed:
# Let me understand the callback flow better - what does expireRunsBatch do in case of failure?
sed -n '2245,2273p' internal-packages/run-engine/src/engine/index.ts

Length of output: 911

🏁 Script executed:
# Check if there's any other mechanism that reschedules or retries failed TTL expirations
rg -n "ttlSystem.*retry\|ttlSystem.*reschedule\|ttlSystem.*recover" --type=ts -g '!node_modules' internal-packages/run-engine/src/

Length of output: 51

🏁 Script executed:
# Let me check if the runs that fail to expire in the database would somehow get picked up again
rg -n "PENDING.*status\|status.*PENDING" --type=ts internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 2 -A 2

Length of output: 51

🏁 Script executed:
# Check if failed runs could be re-added to the TTL queue by any mechanism
rg -n "scheduleExpireRun\|ZADD.*ttl" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ -B 2 -A 5 | head -60

Length of output: 51

🏁 Script executed:
# Look at the full flow of how runs get added to TTL queue initially
sed -n '735,750p' internal-packages/run-engine/src/engine/index.ts

Length of output: 509

🏁 Script executed:
# Check if there's any background job or cron that might re-discover stuck PENDING runs
rg -n "PENDING\|pendingRuns\|stuckRuns\|orphaned" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ | grep -i "pending.*run\|discover\|retry\|recovery" | head -20

Length of output: 51

🏁 Script executed:
# Verify the exact removal point in the Lua script - is it before callback or built into callback success?
sed -n '2560,2650p' internal-packages/run-engine/src/run-queue/index.ts | grep -A 50 "ZREM.*ttl"

Length of output: 2062

🏁 Script executed:
# Critical finding: The Lua script removes runs BEFORE callback. Let me verify if there's any cleanup mechanism
rg -n "cleanup\|recover\|orphan\|stuck" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/ | head -30

Length of output: 51

🏁 Script executed:
# Check if there's a mechanism to re-add runs that fail database updates
rg -n "scheduleExpireRun" --type=ts internal-packages/run-engine/src/engine/systems/ttlSystem.ts -A 20

Length of output: 893

🏁 Script executed:
# The critical insight: runs are ZREM'd from Redis in the Lua script BEFORE callback
# Let me verify if there's any recovery: can orphaned PENDING runs be re-discovered?
rg -n "PENDING.*timeout\|TTL.*expired\|orphan" --type=ts -g '!node_modules' internal-packages/run-engine/src/engine/

Length of output: 51

🏁 Script executed:
# Check what happens when expireRunsBatch partially succeeds/fails
sed -n '155,220p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Length of output: 2029

🏁 Script executed:
# Continue looking at expireRunsBatch to see error handling
sed -n '220,280p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Length of output: 2115

🏁 Script executed:
# Check if there's any compensation logic or rollback if expireRunsBatch fails
rg -n "catch\|throw\|error\|rollback" internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 1 -A 3

Length of output: 51

🏁 Script executed:
# Verify the critical point: does the Lua script execution and callback have transactional guarantees?
# Check if failed callback can somehow trigger re-addition to TTL queue
sed -n '1280,1320p' internal-packages/run-engine/src/run-queue/index.ts

Length of output: 1140

🏁 Script executed:
# Check if there's any watchdog or scheduled job that finds orphaned PENDING runs
fd . internal-packages/run-engine/src/engine/systems -name "*.ts" -type f | xargs grep -l "PENDING\|orphan\|stuck\|recovery"

Length of output: 301

🏁 Script executed:
# Check if there's any background cleanup for orphaned PENDING runs
rg -n "expireRun\|PENDING" --type=ts internal-packages/run-engine/src/engine/index.ts | grep -i "cleanup\|background\|cron\|schedule\|task\|job"

Length of output: 51

🏁 Script executed:
# Look at the test to see if it covers failure scenarios
sed -n '1,50p' internal-packages/run-engine/src/engine/tests/ttl.test.ts

Length of output: 1517

🏁 Script executed:
# Let me verify this critical finding: check the exact atomicity guarantee
# The Lua script removes runs BEFORE callback - is there any compensation?
rg -n "expireRunsBatch\|#ttlExpiredCallback" --type=ts -A 5 internal-packages/run-engine/src/engine/index.ts | head -60

Length of output: 51

🏁 Script executed:
# Check if there's any mechanism in place to recover from partial failures
# Look at the full test for TTL expiration and error scenarios
sed -n '1,100p' internal-packages/run-engine/src/engine/tests/ttl.test.ts | tail -50

Length of output: 1612
Runs removed from Redis before database update completes—orphaned if callback fails.
The Lua script atomically removes runs from the TTL sorted set, queue, and concurrency tracking before the callback executes. If expireRunsBatch fails after the Lua removal completes, the runs remain PENDING in the database but are no longer in Redis, with no recovery mechanism. They cannot be rediscovered by the polling system, as there is no scheduled job to find orphaned PENDING runs past their TTL.
Consider either: (1) moving the database update into the Lua script to guarantee atomicity, (2) adding compensation logic to re-add failed runs to the TTL queue, or (3) implementing a background cleanup job to discover and expire orphaned PENDING runs.
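A sketch of option (2), assuming a hypothetical ttlSystem.requeueRuns helper that re-inserts the removed members into the TTL sorted set:

// Hypothetical compensation sketch; requeueRuns does not exist in this PR and
// would need to atomically re-add each queueKey|runId|orgId member.
try {
  await this.ttlSystem.expireRunsBatch(runs.map((r) => r.runId));
} catch (error) {
  this.logger.error("TTL expiration failed; requeueing runs for retry", { error });
  await this.ttlSystem.requeueRuns(runs); // assumed new API
}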
🤖 Prompt for AI Agents
In internal-packages/run-engine/src/engine/index.ts around lines 2245-2273,
the `#ttlExpiredCallback` currently calls this.ttlSystem.expireRunsBatch after a
Lua script has already removed runs from Redis, so if expireRunsBatch throws the
DB update fails the runs become orphaned; to fix, add compensation in
`#ttlExpiredCallback` to detect errors from ttlSystem.expireRunsBatch and, on
failure, re-add the affected runIds back into the TTL structures (sorted
set/queue/concurrency tracking) via a new ttlSystem method (e.g.,
ttlSystem.requeueRuns or ttlSystem.restoreRuns) or invoke existing requeue
logic, and/or schedule the runIds for a background cleanup retry; ensure the
added code references `#ttlExpiredCallback`, ttlSystem.expireRunsBatch, and the
Lua removal behavior so the requeue/cleanup is called only when expireRunsBatch
fails.
/**
 * Applies `defaultMaxTtl` to a run's TTL:
 * - No max configured → pass through as-is.
 * - No TTL on the run → use the max as the default.
 * - Both exist → clamp to the smaller value.
 */
#resolveMaxTtl(ttl: string | undefined): string | undefined {
  const maxTtl = this.options.defaultMaxTtl;

  if (!maxTtl) {
    return ttl;
  }

  if (!ttl) {
    return maxTtl;
  }

  const ttlMs = parseNaturalLanguageDurationInMs(ttl);
  const maxTtlMs = parseNaturalLanguageDurationInMs(maxTtl);

  if (ttlMs === undefined || maxTtlMs === undefined) {
    return ttl;
  }

  return ttlMs <= maxTtlMs ? ttl : maxTtl;
}
#resolveMaxTtl — edge case when TTL string is unparseable.
If the user-provided ttl doesn't parse (e.g., "invalid"), parseNaturalLanguageDurationInMs returns undefined, and Line 2296 returns the original unparseable ttl string. This means an invalid TTL silently bypasses the max clamp. This is presumably handled downstream (the run queue / expiry system would ignore or reject an invalid TTL), but it's worth noting.
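One possible tightening, shown as a sketch (this is a suggested policy, not current behavior): clamp to the max when only the user TTL fails to parse.

// Hypothetical: if the user TTL is unparseable but the configured max parses,
// fall back to the max so the cap is still enforced.
if (ttlMs === undefined) {
  return maxTtlMs !== undefined ? maxTtl : ttl;
}
if (maxTtlMs === undefined) {
  return ttl;
}
return ttlMs <= maxTtlMs ? ttl : maxTtl;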
🤖 Prompt for AI Agents
In internal-packages/run-engine/src/engine/index.ts around lines 2275-2300,
the method `#resolveMaxTtl` currently returns the original unparseable ttl string
when parseNaturalLanguageDurationInMs(ttl) yields undefined, which allows
invalid TTLs to bypass the max clamp; change the logic so that if ttl fails to
parse but the configured options.defaultMaxTtl parses successfully, you return
the parsed max (i.e., maxTtl) to enforce the cap; keep the existing fallback
that if defaultMaxTtl itself is unparseable
(parseNaturalLanguageDurationInMs(maxTtl) === undefined) you return ttl as
before.
// Update all runs in a single batch
const now = new Date();
const runIdsToExpire = runsToExpire.map((r) => r.id);

await this.$.prisma.taskRun.updateMany({
  where: { id: { in: runIdsToExpire } },
  data: {
    status: "EXPIRED" as TaskRunStatus,
    completedAt: now,
    expiredAt: now,
    // Note: updateMany doesn't support nested writes, so we handle error and snapshots separately
  },
});

// Create snapshots and set errors for each run (these require individual updates)
await Promise.all(
  runsToExpire.map(async (run) => {
    const latestSnapshot = run.executionSnapshots[0]!;
    const error: TaskRunError = {
      type: "STRING_ERROR",
      raw: `Run expired because the TTL (${run.ttl}) was reached`,
    };

    // Update the error field (updateMany can't do JSON fields properly)
    await this.$.prisma.taskRun.update({
      where: { id: run.id },
      data: { error },
    });

    // Create the snapshot
    await this.$.prisma.taskRunExecutionSnapshot.create({
      data: {
        runId: run.id,
        engine: "V2",
        executionStatus: "FINISHED",
        description: "Run was expired because the TTL was reached",
        runStatus: "EXPIRED",
        environmentId: latestSnapshot.environmentId,
        environmentType: latestSnapshot.environmentType,
        projectId: latestSnapshot.projectId,
        organizationId: latestSnapshot.organizationId,
      },
    });
🧩 Analysis chain
🏁 Script executed:
# Search for TTL batch usage, expiration handling, and any repair/backfill flows
rg -n "expireRunsBatch|runExpired|expiredAt|EXPIRED|ttlSystem" internal-packages/run-engine/src -g '*.ts' -A 2

Length of output: 20150

🏁 Script executed:
# Look for any error handling or retry logic around the updateMany operation
rg -n "\.updateMany|\.update|\.create" internal-packages/run-engine/src/engine/systems/ttlSystem.ts -B 2 -A 5

Length of output: 1526

🏁 Script executed:
# Search for TaskRun model definition and any constraints
fd -name "*.prisma" | xargs rg -n "model TaskRun|model TaskRunExecutionSnapshot" -A 15

Length of output: 12802

🏁 Script executed:
# Check if there are any monitor/repair jobs related to EXPIRED runs
rg -rn "EXPIRED.*repair|repair.*EXPIRED|orphan|partial.*expir|backfill.*ttl" internal-packages/run-engine/src -g '*.ts'

Length of output: 51

🏁 Script executed:
# Understand the full context of the ttlSystem.ts file to see how expireRunsBatch is called
head -n 350 internal-packages/run-engine/src/engine/systems/ttlSystem.ts | tail -n 150

Length of output: 4969

🏁 Script executed:
# Search for orphan handling, repair jobs, or post-expiration validation
rg -i "orphan|repair|backfill|expired.*without|validate.*expired" internal-packages/run-engine/src -g '*.ts'

Length of output: 3287

🏁 Script executed:
# Check if there's error handling or retry logic around expireRunsBatch calls
rg -B 5 -A 10 "expireRunsBatch" internal-packages/run-engine/src -g '*.ts'

Length of output: 14334

🏁 Script executed:
# Check the error handling and logging around the Promise.all
rg -B 5 -A 15 "Promise.all.*runsToExpire.map" internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Length of output: 51

🏁 Script executed:
# Check if there are any queries that assume EXPIRED runs have snapshots
rg -i "EXPIRED.*snapshot|snapshot.*EXPIRED" internal-packages/run-engine/src -g '*.ts' -A 2

Length of output: 51

🏁 Script executed:
# Look for any error recovery or monitoring for failed TTL operations
rg -B 2 -A 10 "catch.*callbackError\|catch.*expireRunsBatch" internal-packages/run-engine/src -g '*.ts'

Length of output: 51

🏁 Script executed:
# Check error handling in ttlSystem more carefully
sed -n '265,330p' internal-packages/run-engine/src/engine/systems/ttlSystem.ts

Length of output: 2361

🏁 Script executed:
# Check how the TTL callback is invoked and what happens on error
sed -n '2240,2280p' internal-packages/run-engine/src/engine/index.ts

Length of output: 1315

🏁 Script executed:
# Look for any validation or assumptions about EXPIRED runs having snapshots
grep -r "executionSnapshots" internal-packages/run-engine/src -g --include="*.ts" | grep -i expired | head -20

Length of output: 179
Risk of partial-expire state if per-run updates fail after batch status update.
The updateMany operation marks all runs as EXPIRED atomically, but the subsequent Promise.all that creates snapshots, sets errors, and emits events can fail partially. If any per-run operation fails (snapshot creation, waitpoint completion, event emission), the Promise.all rejects—but the runs are already marked EXPIRED in the database. Since these runs no longer match the PENDING filter, they won't be retried on the next TTL poll, leaving them stuck in an incomplete state without snapshots or errors.
While the batch size is bounded at 100 runs (default), this doesn't prevent the issue. The error handling in #ttlExpiredCallback only logs failures without recovery. Consider using per-run transactional updates via prisma.$transaction() to atomically update status, error, and snapshot together for each run, ensuring consistency.
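A sketch of the per-run transactional variant, assuming a hypothetical buildSnapshotData helper that assembles the snapshot row shown above:

// Hypothetical: status, error, and snapshot are committed atomically per run,
// so a failure leaves that run PENDING and eligible for the next TTL poll.
await Promise.allSettled(
  runsToExpire.map((run) =>
    this.$.prisma.$transaction(async (tx) => {
      const error: TaskRunError = {
        type: "STRING_ERROR",
        raw: `Run expired because the TTL (${run.ttl}) was reached`,
      };
      await tx.taskRun.update({
        where: { id: run.id },
        data: { status: "EXPIRED", completedAt: now, expiredAt: now, error },
      });
      await tx.taskRunExecutionSnapshot.create({ data: buildSnapshotData(run) }); // assumed helper
    })
  )
);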
🤖 Prompt for AI Agents
In internal-packages/run-engine/src/engine/systems/ttlSystem.ts around lines
248-291, the batch update (taskRun.updateMany) followed by per-run updates
(taskRun.update and taskRunExecutionSnapshot.create) can leave runs marked
EXPIRED without snapshots/errors if any per-run op fails; change
ttlExpiredCallback to perform each run's status change, error write, and
snapshot creation inside a single per-run prisma.$transaction() (instead of the
global updateMany + Promise.all) so that setting status to EXPIRED,
completedAt/expiredAt, writing the error, and creating taskRunExecutionSnapshot
happen atomically for each run (operate on runsToExpire entries and replace uses
of taskRun.updateMany, taskRun.update, and taskRunExecutionSnapshot.create
accordingly), and ensure transaction failures are surfaced/logged so those runs
remain PENDING for the next TTL poll.
if messagePayload then
  -- Update concurrency
  redis.call('ZREM', queueKey, messageId)
  redis.call('ZREM', envQueueKey, messageId)
  redis.call('SADD', queueCurrentConcurrencyKey, messageId)
  redis.call('SADD', envCurrentConcurrencyKey, messageId)

  -- Add to results
  table.insert(results, messageId)
  table.insert(results, messageScore)
  table.insert(results, messagePayload)

  dequeuedCount = dequeuedCount + 1
  -- Parse the message to check for TTL expiration
  local messageData = cjson.decode(messagePayload)
  local ttlExpiresAt = messageData and messageData.ttlExpiresAt

  -- Check if TTL has expired
  if ttlExpiresAt and ttlExpiresAt <= currentTime then
    -- TTL expired - remove from queues but don't add to results
    redis.call('ZREM', queueKey, messageId)
    redis.call('ZREM', envQueueKey, messageId)
    redis.call('DEL', messageKey)

    -- Remove from TTL set if provided
    if ttlQueueKey and ttlQueueKey ~= '' then
      -- Construct TTL member: queueKey|runId|orgId
      local ttlMember = queueName .. '|' .. messageId .. '|' .. (messageData.orgId or '')
      redis.call('ZREM', ttlQueueKey, ttlMember)
    end
    -- Don't add to results - this run is expired
  else
    -- Not expired - process normally
    redis.call('ZREM', queueKey, messageId)
    redis.call('ZREM', envQueueKey, messageId)
    redis.call('SADD', queueCurrentConcurrencyKey, messageId)
    redis.call('SADD', envCurrentConcurrencyKey, messageId)

    -- Remove from TTL set if provided (run is being executed, not expired)
    if ttlQueueKey and ttlQueueKey ~= '' and ttlExpiresAt then
      local ttlMember = queueName .. '|' .. messageId .. '|' .. (messageData.orgId or '')
      redis.call('ZREM', ttlQueueKey, ttlMember)
    end

    -- Add to results
    table.insert(results, messageId)
    table.insert(results, messageScore)
    table.insert(results, messagePayload)

    dequeuedCount = dequeuedCount + 1
  end
TTL-expired runs silently discarded during dequeue are never reported to the application layer.
When the dequeue Lua script encounters a message with an expired TTL (Lines 2716–2727), it removes the message from the queue, env queue, message key, and the TTL sorted set — but never returns it as an expired run. This means:
- The TTL consumer will never pick it up (already removed from the TTL sorted set).
- The ttlSystem.callback is never invoked for this run.
- The run remains in QUEUED status in the database permanently — orphaned.
The dequeue path needs to either:
- Return expired run info separately so the caller can invoke the expiration callback, or
- Leave expired messages in the TTL sorted set (only remove from the normal queue/env queue) so the TTL consumer can handle them, or
- Not evict TTL-expired messages during dequeue at all, and let the TTL consumer be the sole owner of expiration.
Option 3 is simplest and avoids the race between dequeue and TTL consumer entirely. The TTL consumer already handles cleanup atomically.
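If option 1 were chosen instead, the caller-side shape might look like this sketch (field names are assumptions, not the actual dequeue API):

// Hypothetical: dequeue returns expired entries alongside dequeued messages,
// so the application layer can still invoke the TTL callback for them.
type DequeueResult = {
  messages: Array<{ id: string; score: number; payload: string }>;
  expired: Array<{ queueKey: string; runId: string; orgId: string }>;
};

async function handleDequeue(result: DequeueResult, ttlCallback: TtlSystemCallback) {
  if (result.expired.length > 0) {
    await ttlCallback(result.expired); // same path the TTL consumer uses
  }
  return result.messages;
}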
🤖 Prompt for AI Agents
In internal-packages/run-engine/src/run-queue/index.ts around lines 2710-2748,
the dequeue Lua branch that handles TTL expiration (checks using
ttlExpiresAt <= currentTime) currently removes the message from queueKey,
envQueueKey, messageKey and ttlQueueKey but never reports the expired run to the
application; to fix this, stop evicting TTL-expired messages in the dequeue
path: modify the ttlExpiresAt <= currentTime branch so it does not remove the
message or its TTL membership (i.e., do not call ZREM on queueKey/envQueueKey or
DEL messageKey or ZREM ttlQueueKey) and simply skip adding the run to results
(leaving the message for the TTL consumer to handle), ensuring that results,
dequeuedCount and SADD calls are untouched for the non-expired branch; locate
and change the code around the symbols ttlExpiresAt, currentTime, queueKey,
envQueueKey, ttlQueueKey, messageKey, results and dequeuedCount.
Summary
This PR implements environment-level queue size limits to prevent runaway queues from overwhelming Redis, along with several related improvements.
Environment Queue Limits
Prevents any single environment from overwhelming the shared Redis cluster by enforcing queue size limits at trigger time.
MAXIMUM_ENV_QUEUE_SIZE environment variable with sensible defaults

Fix: batchTriggerAndWait stuck when items skipped
Fixed a bug where batchTriggerAndWait would get stuck forever when some items were skipped due to queue limits.

Root cause: When items were skipped during batch streaming, the server did not return the actual runCount to the SDK.

Fix: Server now returns the actual runCount in the Phase 2 response, and the SDK uses this for waitForBatch.

New TTL System
Improved TTL (time-to-live) handling for run expiration with better test coverage.
Lazy Waitpoint Creation
Better idempotency handling through lazy waitpoint creation - waitpoints are now created on-demand rather than eagerly.
Run Queue Improvements
Various improvements to the run queue system with expanded test coverage.
Test plan
batchTriggerAndWait with queue limits causing skipped items

refs TRI-7167
🤖 Generated with Claude Code