fix(security): KB fileUrl LFI, MCP/Agiloft SSRF pinning, form OTP, KB authz #4639
✅ There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend that you revoke them. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
PR Summary

High Risk

Overview

Centralizes OTP storage/attempt tracking. Moves chat OTP logic into a shared …

Closes a KB document ingestion LFI vector. Tightens …

Pins and validates outbound connections to reduce SSRF/TOCTOU risk. MCP SSRF validation now returns a resolved IP and, when available, the MCP client uses an undici-based pinned fetch to prevent DNS rebinding; Agiloft attach/retrieve routes similarly resolve once and pin all subsequent requests; Grafana update tools now validate …

Hardens knowledge base workspace authorization. Adds …

Updates API validation baseline and adds extensive unit tests across the new and hardened paths.

Reviewed by Cursor Bugbot for commit 21a93a5.
Greptile Summary

This PR delivers a bundled security hardening across five surfaces: LFI prevention on Knowledge Base file upload, KB permission errors and transaction hardening, DNS-rebinding (SSRF) pinning for MCP and Agiloft, basic URL validation for Grafana tools, and a new email-OTP gate for form deployments with a shared …

Confidence Score: 5/5

Safe to merge; the Grafana URL validation gap does not regress from the pre-PR state and is noted for a follow-up improvement.

The five hardened surfaces (KB LFI, KB authz, MCP DNS pinning, Agiloft DNS pinning, form OTP) are all correctly implemented. The Grafana tools move from no validation to basic format checking (an improvement, not a regression) but do not reach the DNS-pinned level of the Agiloft fix. All previous review comments were addressed in follow-up commits. The OTP module's Redis Lua path, DB optimistic-lock fail-closed path, and rate-limiting are sound.

apps/sim/tools/grafana/update_alert_rule.ts and apps/sim/tools/grafana/update_dashboard.ts: both validate baseUrl with the synchronous format-only check and still use plain fetch; a follow-up to add DNS resolution and IP pinning would bring them in line with the Agiloft pattern.

| Filename | Overview |
|---|---|
| apps/sim/lib/core/security/otp.ts | New shared OTP module: correct Redis Lua atomic increment, DB optimistic-lock retry with fail-closed exhaustion, per-kind key namespacing, and clean encode/decode for the code:attempts format. |
| apps/sim/lib/mcp/pinned-fetch.ts | New pinned-fetch implementation using undici's typed fetch export with a createPinnedLookup dispatcher; correctly bridges DOM/undici type gap via documented cast and preserves hostname for TLS SNI. |
| apps/sim/lib/mcp/domain-check.ts | Returns resolved IP from validateMcpServerSsrf for downstream pinning; loopback is now blocked on hosted environments; DNS failure path is now distinct from SSRF-block path. |
| apps/sim/tools/agiloft/utils.server.ts | New server-only Agiloft helpers: resolves DNS once via validateUrlWithDNS, then uses secureFetchWithPinnedIP for login/logout/attach/retrieve, closing the TOCTOU SSRF window. |
| apps/sim/tools/grafana/update_alert_rule.ts | Adds validateExternalUrl before the Grafana API call — blocks IP-literal SSRF but does not resolve DNS, leaving DNS-rebinding attacks open (inconsistent with the Agiloft/MCP pattern). |
| apps/sim/tools/grafana/update_dashboard.ts | Same as update_alert_rule — synchronous format-only validation with plain fetch; DNS rebinding SSRF not mitigated. |
| apps/sim/lib/knowledge/service.ts | KB update wrapped in a SELECT FOR UPDATE transaction; workspace-change permission check added via actorUserId; KnowledgeBasePermissionError distinguishes auth failures from 500s. |
| apps/sim/lib/knowledge/documents/document-processor.ts | Removes fs.readFile fallback; all non-data:/http(s):// paths now throw; case-insensitive regex checks applied consistently across downloadFileForBase64 and parseWithFileParser. |
| apps/sim/app/api/form/[identifier]/otp/route.ts | New form OTP route mirrors chat OTP pattern precisely; IP and email rate limiting, allowlist check before OTP store/compare, generic 500 messages, and auth cookie set on success. |
| apps/sim/lib/api/contracts/knowledge/shared.ts | Adds knowledgeDocumentFileUrlSchema with case-insensitive data: and https?:// guards; correctly applied at the Zod validation boundary. |
| apps/sim/app/form/[identifier]/components/email-auth.tsx | New email-auth UI component with two-step flow (email → OTP), countdown resend timer, 6-digit OTP auto-submit, and clear error messaging. |
Reviews (2): Last reviewed commit: "fix(mcp): annotate undici/DOM type-bridg..." | Re-trigger Greptile
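
The gap Greptile describes between a format-only URL check and a resolve-then-pin check can be illustrated with a small sketch. The helper names below (`isInternalIPv4`, `formatOnlyCheck`, `resolvedCheck`) are hypothetical and are not the repository's `validateExternalUrl` or `validateUrlWithDNS`; the resolver is injected as a plain function so the example stays offline, and only IPv4 is handled for brevity.

```typescript
/** True when a dotted-quad IPv4 literal is loopback, private, link-local, or 0.x. */
function isInternalIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    return false; // not an IPv4 literal (e.g. a hostname)
  }
  const a = parts[0] ?? -1;
  const b = parts[1] ?? -1;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

/** Format-only check: inspects the URL string and never resolves DNS. */
function formatOnlyCheck(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isInternalIPv4(url.hostname);
}

/** Resolve-then-check: validates the address the connection would actually use. */
function resolvedCheck(
  rawUrl: string,
  resolve: (host: string) => string, // assumption: stand-in for a real DNS lookup
): { ok: boolean; pinnedIp?: string } {
  if (!formatOnlyCheck(rawUrl)) return { ok: false };
  const ip = resolve(new URL(rawUrl).hostname);
  // The caller would connect to `pinnedIp` directly, so a later DNS change cannot redirect it.
  return isInternalIPv4(ip) ? { ok: false } : { ok: true, pinnedIp: ip };
}
```

A hostname that resolves to an internal address passes `formatOnlyCheck` but fails `resolvedCheck`, which is the difference the review calls out for the Grafana tools.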
…haust

- Chat/form OTP routes: replace `error.message || fallback` with generic `Failed to process request` in 500 responses (logger still captures detail).
- otp.ts incrementOTPAttempts DB path: on MAX_RETRIES exhaustion, delete the verification row and return `'locked'` instead of trusting a possibly-undercounted final read.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
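
The fail-closed behavior described in this commit can be sketched with an in-memory stand-in for the verification table. Everything here is illustrative: `OtpStore`, `MemoryStore`, `decodeOtp`, and the retry count of 3 are assumptions, not the contents of `otp.ts`. Only the `code:attempts` value format, the `MAX_RETRIES` name, and the `'locked'` result come from the PR text.

```typescript
// Hypothetical storage interface; the real module talks to Redis or the database.
interface OtpStore {
  get(key: string): string | undefined;
  /** Writes `next` only if the stored value still equals `expected`. */
  compareAndSet(key: string, expected: string, next: string): boolean;
  delete(key: string): void;
}

const MAX_RETRIES = 3; // assumed value

/** Splits "code:attempts" on the last colon; a missing or bad count is treated as 0. */
function decodeOtp(value: string): { code: string; attempts: number } {
  const idx = value.lastIndexOf(":");
  if (idx === -1) return { code: value, attempts: 0 };
  const attempts = Number.parseInt(value.slice(idx + 1), 10);
  return { code: value.slice(0, idx), attempts: Number.isNaN(attempts) ? 0 : attempts };
}

function incrementAttempts(store: OtpStore, key: string): number | "locked" {
  for (let i = 0; i < MAX_RETRIES; i++) {
    const current = store.get(key);
    if (current === undefined) return "locked"; // assumption: no row means nothing to verify
    const { code, attempts } = decodeOtp(current);
    if (store.compareAndSet(key, current, `${code}:${attempts + 1}`)) return attempts + 1;
    // Lost the race: another request changed the row. Re-read and retry.
  }
  // Retries exhausted: fail closed instead of trusting a possibly stale count.
  store.delete(key);
  return "locked";
}

/** Test double; `contended = true` makes every compare-and-set fail. */
class MemoryStore implements OtpStore {
  private rows = new Map<string, string>();
  private contended: boolean;
  constructor(contended: boolean) {
    this.contended = contended;
  }
  set(key: string, value: string): void {
    this.rows.set(key, value);
  }
  get(key: string): string | undefined {
    return this.rows.get(key);
  }
  compareAndSet(key: string, expected: string, next: string): boolean {
    if (this.contended || this.rows.get(key) !== expected) return false;
    this.rows.set(key, next);
    return true;
  }
  delete(key: string): void {
    this.rows.delete(key);
  }
}
```

Deleting the row on exhaustion forces the user to request a new code, which is stricter than returning the last count read but cannot undercount attempts.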
Replace `globalThis.fetch` + double-cast with `undici.fetch` so the `dispatcher` option is part of the real type contract. This guarantees pinning won't silently break if a future runtime swaps the underlying fetch implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tool config files are statically reachable from the client bundle (via
tools/registry.ts → tools/{service}/index.ts). Importing
`@/lib/core/security/input-validation.server` from these files pulled
`node:dns/promises` into the Turbopack client bundle and broke the build.
Split agiloft utils into client-safe (`utils.ts`, plain fetch + sync
`validateExternalUrl`) and server-only (`utils.server.ts`, DNS-pinned
variants). Routes that need TOCTOU protection import the pinned helpers;
the executor-side tool path falls back to sync URL validation (matches
the supabase precedent and pre-PR baseline).
Grafana update tools likewise switch from `secureFetchWithValidation`
(server-only) to inline sync `validateExternalUrl` + plain fetch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Boundary schema accepted uppercase schemes (e.g. HTTPS://, DATA:) via the
case-insensitive http regex, but the processor's case-sensitive
startsWith('data:') / startsWith('http') / startsWith('https://') checks
rejected them with a confusing "Unsupported fileUrl scheme" error.
Aligns processor checks to the schema using case-insensitive regex per
RFC 3986 §3.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
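
A minimal sketch of the case-insensitive scheme check this commit describes, using plain regular expressions rather than the repository's Zod schema. The function names are hypothetical; the two accepted schemes and the "Unsupported fileUrl scheme" message are taken from the PR text.

```typescript
// Scheme names are case-insensitive per RFC 3986 §3.1, hence the `i` flag.
const DATA_URI = /^data:/i;
const HTTP_URL = /^https?:\/\//i;

/** True only for data: URIs and http(s):// URLs, regardless of scheme casing. */
function isAllowedFileUrl(fileUrl: string): boolean {
  return DATA_URI.test(fileUrl) || HTTP_URL.test(fileUrl);
}

/** Throws for anything else, including bare filesystem paths and file: URLs. */
function assertAllowedFileUrl(fileUrl: string): void {
  if (!isAllowedFileUrl(fileUrl)) {
    throw new Error("Unsupported fileUrl scheme");
  }
}
```

Using the same patterns at the boundary and in the processor keeps the two layers from disagreeing about inputs such as `HTTPS://` or `DATA:`.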
Strict audit was failing on two new `as unknown as` casts in pinned-fetch.ts. They bridge DOM `RequestInit`/`Response` ↔ undici equivalents (structurally compatible at runtime since Node's global fetch is undici) and are required to satisfy the FetchLike contract. Annotate so they count as documented exemptions instead of new violations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
abdc919 to 21a93a5
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    form: {
      redisKey: (email: string, deploymentId: string) => `form-otp:${email}:${deploymentId}`,
      dbIdentifier: (email: string, deploymentId: string) => `form-otp:${deploymentId}:${email}`,
    },
Form OTP key component ordering inconsistent across storage backends
Low Severity
The form OTP key format has reversed component ordering between Redis (form-otp:${email}:${deploymentId}) and DB (form-otp:${deploymentId}:${email}). For chat, this inconsistency is preserved for backward compatibility with in-flight OTPs (documented in the comment above). But form is an entirely new key format with no legacy data — there's no reason to replicate the asymmetry. This makes the code harder to reason about and could cause confusion when debugging OTP issues or if a deployment ever switches storage backends mid-flight.
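
One way to address this finding is to derive both form keys from a single builder so the component order cannot drift. This is only a sketch: the `makeFormOtpKeys` helper is hypothetical, and the deploymentId-first order is an arbitrary choice for illustration, since the review does not say which order should win.

```typescript
type KeyBuilder = (email: string, deploymentId: string) => string;

/** Builds the Redis key and the DB identifier from one shared template. */
function makeFormOtpKeys(): { redisKey: KeyBuilder; dbIdentifier: KeyBuilder } {
  // Assumption: deploymentId-first ordering; email-first would work equally well.
  const build: KeyBuilder = (email, deploymentId) => `form-otp:${deploymentId}:${email}`;
  return { redisKey: build, dbIdentifier: build };
}

const formKeys = makeFormOtpKeys();
```

The chat keys would stay as they are, since the review notes their asymmetry exists for compatibility with in-flight OTPs.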
* fix(knowledge): require write access for batch chunk operations

  The PATCH /api/knowledge/[id]/documents/[documentId]/chunks handler performs enable/disable/delete operations but authorized callers with only read-level access (checkDocumentAccess). This let read-only workspace members destroy or disable indexed chunks. Switch to checkDocumentWriteAccess (write/admin required), matching the sibling POST/PUT/DELETE chunk mutation endpoints.

* fix(env): restrict decrypted workspace env vars to secret admins

  GET /api/workspaces/:id/environment returned decrypted workspace environment variables to any member, including read-only collaborators, leaking API tokens, database URLs, and other secrets.

  Mask workspace variable values for non-admin viewers while preserving the variable names, so editor autocomplete and conflict detection keep working. A value is revealed only when the caller is a credential admin of that key, or (for legacy keys with no per-secret ACL) holds workspace admin permission. This mirrors the per-key edit gating already enforced by PUT/DELETE: if you can administer a secret, you can read it. Personal variables and execution-time resolution are unchanged.

* fix(files): block cross-tenant deletion via client-controlled context

  POST /api/files/delete trusted a client-supplied `context`, letting any authenticated user delete another tenant's file by naming an arbitrary key with `context: "og-images"`. verifyFileAccess() short-circuited the three public contexts (profile-pictures, og-images, workspace-logos) to `true` before any ownership/requireWrite check.

  - Derive the storage context strictly from the trusted key prefix in the delete route; reject a supplied `context` that disagrees with the key.
  - Gate the public-context short-circuit to reads only. Destructive ops (requireWrite) now prove ownership via verifyPublicAssetWriteAccess: workspace-logos require write/admin on the bound workspace, profile-pictures require an exact owner match, og-images always deny. Reads of public assets are unchanged.

* fix(telegram): verify X-Telegram-Bot-Api-Secret-Token on inbound webhooks

  Telegram triggers accepted any forged update from anyone who knew the webhook URL path: verifyAuth was a no-op that always returned null, and setWebhook registered no secret_token.

  Generate a per-webhook secret in createSubscription, register it with Telegram as secret_token, and persist it to providerConfig. verifyAuth now fails closed: it rejects when no token is configured, when the X-Telegram-Bot-Api-Secret-Token header is absent, or when it does not match via constant-time safeCompare.

* fix(security): pin DNS for Agiloft directExecution and Grafana update tools

  The Agiloft directExecution tools (read/create/search/update/delete/lock/saved_search/select/get_choice_line_id/remove_attachment/attachment_info) and the Grafana update_dashboard/update_alert_rule postProcess hooks issued outbound HTTP to a fully user-controlled host (instanceUrl/baseUrl) via the global fetch(), guarded only by the synchronous validateExternalUrl(), which never resolves DNS, so a hostname resolving to an internal/reserved IP passed validation (SSRF).

  Route all of these through the codebase's standard SSRF-safe path:

  - Agiloft: moved executeAgiloftRequest into utils.server.ts where the existing pinned helpers live. It now resolves+validates the instance URL once and pins every hop (login, operation, logout) to that IP via secureFetchWithPinnedIP. The 11 tool configs now import it from utils.server; URL builders stay in the client-safe utils.ts.
  - Grafana: the postProcess POST/PUT now uses validateUrlWithDNS + secureFetchWithPinnedIP, matching the already-pinned initial GET.

  This completes the Agiloft SSRF pinning started in #4639 (which covered the attach/retrieve API routes) by closing the directExecution path, and extends the same guard to the Grafana update tools.

* fix(api): enforce workspace allowPersonalApiKeys policy on v1 surface

  The external v1 API authenticated API keys without evaluating the per-workspace allowPersonalApiKeys setting, so a personal API key could read and mutate a workspace's resources (workflows, tables, files, knowledge, logs) even when the workspace had explicitly disabled personal keys. The same control is already enforced on the workflow-execution surface.

  Enforce the policy in checkWorkspaceScope (covering validateWorkspaceAccess too): reject personal keys with 403 when the workspace has allowPersonalApiKeys=false. checkWorkspaceScope becomes async; all v1 route callsites updated to await it.

* fix(billing): close usage-cap admission race with atomic reservation

  The server-side usage-limit gate read already-recorded cost, but cost is only written when an execution finishes. A burst of concurrent executions all observed the same pre-burst usage, all passed the cap, and all ran, collectively spending far past the limit before any cost landed in the ledger (free-tier abuse / hard-cap defeat). manual/chat triggers also skip rate limiting, removing the only throttle.

  Add an atomic check-then-reserve admission step (Redis Lua) that bounds in-flight, un-costed executions per billing entity by both a per-plan concurrency cap and remaining usage headroom, so recordedUsage + reservedSlots * estimate <= limit always holds. The slot is released at execution completion via LoggingSession (skipped on pause; TTL self-heals crashes). Runs for all trigger types, covering the previously-unthrottled manual/chat paths.

  Fails open when billing is disabled or Redis is unavailable, matching the rate limiter: a Redis blip can't turn into an execution outage, and the recorded-usage gate still runs.
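
The admission invariant stated in the billing commit, recordedUsage + reservedSlots * estimate <= limit, can be illustrated with a single-process sketch. The real change performs this atomically in a Redis Lua script; the types and function names below are hypothetical, and the sketch reflects the commit as first described, before the later adjustment to the headroom term mentioned further down this list.

```typescript
interface AdmissionState {
  recordedUsage: number; // cost already written to the ledger
  reservedSlots: number; // in-flight executions whose cost is not yet recorded
}

interface AdmissionPolicy {
  limit: number; // usage cap for the billing entity
  estimate: number; // assumed cost charged against headroom per in-flight run
  concurrencyCap: number; // per-plan cap on in-flight executions
}

/** Check-then-reserve: admits a run only if the invariant still holds afterwards. */
function tryReserve(state: AdmissionState, policy: AdmissionPolicy): boolean {
  if (state.reservedSlots >= policy.concurrencyCap) return false;
  const projected = state.recordedUsage + (state.reservedSlots + 1) * policy.estimate;
  if (projected > policy.limit) return false;
  state.reservedSlots += 1;
  return true;
}

/** Releases one slot; safe to call when nothing is reserved. */
function releaseSlot(state: AdmissionState): void {
  if (state.reservedSlots > 0) state.reservedSlots -= 1;
}
```

With usage at 8 of 10 and an estimate of 1, a burst of five requests admits only two; without the reservation, all five would read the same recorded usage and pass.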
* fix(workflows): validate folderId belongs to workflow's workspace on create/update/reorder

  Reject a folderId that references a folder in a different workspace (or an archived/non-existent folder) before writing it to workflow.folderId. Previously create, update, and reorder only checked workspace permission on the workflow and the folder's lock status, never that the folder lived in the workflow's own workspace, allowing a dangling cross-workspace folder reference.

  Adds isFolderInWorkspace/assertFolderInWorkspace + FolderNotFoundError to @sim/workflow-authz (mirroring assertTargetFolderMutable in the duplicate path), enforced in performCreateWorkflow, performUpdateWorkflow, and the reorder route. Invalid folders now return 400.

* fix(folders): validate parentId against workspace on create/update/reorder

  Folder write endpoints accepted a caller-supplied parentId and persisted it without verifying the parent existed in the same workspace, and the create and reorder paths had no cycle guard. A workspace member with write access could reparent a folder to a foreign-workspace folder, a non-existent id, or (via reorder) into a cycle, hiding the folder and its workflows from all members.

  - performCreateFolder: reject self-parenting and validate the parent exists in the workspace and is not archived (mirrors the duplicate route).
  - performUpdateFolder: add the same workspace/archived parent check alongside the existing circular-reference guard.
  - folders/reorder: validate every target parent against the workspace, detect cycles in the resulting parent graph (catches batch cycles), and normalize falsy parentId to null to prevent orphaning.

  Adds tests for cross-workspace parent rejection and batch-cycle rejection.
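
The batch-cycle check described for folders/reorder can be sketched by applying all updates to a copy of the parent map and then walking each folder's ancestor chain. The function names and the map representation are assumptions for illustration, not the route's actual implementation.

```typescript
type ParentMap = Map<string, string | null>;

/** Applies a batch of reparent updates, normalizing falsy parentId to null. */
function applyReorder(
  existing: ParentMap,
  updates: Array<{ id: string; parentId?: string | null }>,
): ParentMap {
  const next: ParentMap = new Map(existing);
  for (const u of updates) {
    next.set(u.id, u.parentId || null);
  }
  return next;
}

/** True if following parent links from any folder revisits a folder. */
function hasCycle(parents: ParentMap): boolean {
  for (const start of Array.from(parents.keys())) {
    const seen = new Set<string>();
    let current: string | null = start;
    while (current !== null) {
      if (seen.has(current)) return true;
      seen.add(current);
      // A parent id missing from the map ends the walk; existence is a separate check.
      current = parents.get(current) ?? null;
    }
  }
  return false;
}
```

Checking the graph after the whole batch is applied is what catches a pair of updates that are each harmless alone but form a cycle together.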
* chore(knowledge): drop non-TSDoc inline comments from chunks route

* fix(webhooks): fail closed when HMAC signing secret is not configured

  Inbound webhook signature verification failed open for HMAC providers (GitHub, Intercom, Jira, JSM, Confluence, Cal.com, Notion, Greenhouse, Typeform, Fireflies, Circleback): when no signing secret was stored, verifyAuth returned null and the workflow executed on a fully attacker-controlled body.

  Reject these deliveries with 401 instead, matching the fail-closed Stripe/WhatsApp/Vercel providers. Run provider reachability/verification handshakes (Notion verification_token, Grain/Intercom ping) ahead of auth so the pre-secret setup handshake still completes; those return a canned 200 without executing the workflow, and real event payloads fall through to fail-closed verification.

  Update the trigger secret-field copy to state the secret is required for deliveries to be accepted (was misleadingly marked optional).

* style(files): trim verbose inline comments on delete authorization fix

* fix(auth): close account-enumeration oracle on email sign-up

  The custom before-hook pre-check threw a distinguishing 422/USER_ALREADY_EXISTS for already-registered emails, letting an unauthenticated attacker enumerate accounts, defeating better-auth's own OWASP enumeration protection (active under requireEmailVerification).

  Remove the pre-check and rely on better-auth's generic duplicate-sign-up response, wiring:

  - onExistingUserSignUp: notify the real account owner out-of-band, mirroring the privacy-preserving forget-password flow.
  - customSyntheticUser: include admin (role/banned/banReason/banExpires) and Stripe (stripeCustomerId, billing-gated) user fields so the fake response shape is byte-identical to a real new-user response.

  Adds an ExistingAccountEmail template + 'existing-account' subject.
* style(tools): drop non-TSDoc inline comments from Grafana/Agiloft SSRF tools

* chore(api): trim extraneous inline comments in v1 logs/files routes

  Remove a redundant size annotation and two verbose multi-line materialization comments whose intent is already clear from the code. Load-bearing comments (race-condition and key-translation notes) kept.

* fix(billing): exclude table-cell dispatch from admission reservation

  Table-cell dispatch is row-bounded, async rate-limited, and already surfaces a graceful usage state. Applying the in-flight concurrency reservation there turned its 429 into a hard cell error on a normal >15-concurrent-cell run (only 402 was handled gracefully). Skip the reservation for that surface via a new skipConcurrencyReservation option (the usage-cost cap is still enforced), and tidy the reservation comments to TSDoc.

* fix(chat): rate-limit and constant-time password auth for public chats

  Password-protected public chat (POST /api/chat/[identifier]) had no throttling on the password check and compared with a non-constant-time !==, allowing unlimited brute-force and per-character timing leaks.

  - Add per-IP rate limiting (10 / 15min) to the password branch of validateChatAuth, mirroring the OTP/SSO endpoints; return 429 with Retry-After. Only explicit unlock attempts consume tokens; message sends carry no password and ride the auth cookie.
  - Replace password !== decrypted with safeCompare.
  - Fails open on rate-limiter storage errors; no availability regression.

* fix(security): cap JSON request body size and gate public chat endpoint

  The shared parseJsonBody helper (behind parseRequest, used by nearly every contract route) read request bodies with no size limit, buffering the full body into memory before validation. The unauthenticated public deployed-chat endpoint reached this sink with no admission gate, enabling an anonymous memory-exhaustion DoS.

  - parseRequest/parseJsonBody now enforce a byte cap via a size-limited stream read (content-length precheck + streamed cap), returning 413. Default is API_MAX_JSON_BODY_BYTES (50 MB), overridable per route via maxBodyBytes. Decoding uses TextDecoder to match request.json() BOM handling.
  - Public chat POST is wrapped with the admission gate (tryAdmit) and passes an explicit CHAT_MAX_REQUEST_BYTES (20 MB) cap.
  - Chat body contract gains .max() bounds on input, password, conversationId, file data/name/type, and files array length.
  - Admin bulk workspace import opts into a higher 100 MB cap to avoid regressing large multi-workflow imports.

* fix(chat): rate-limit and constant-time password auth for public chats

  Password-protected public chat (POST /api/chat/[identifier]) had no throttling on the password check and compared with a non-constant-time !==, allowing unlimited brute-force and per-character timing leaks.

  - Add per-IP rate limiting (10 / 15min) to the password branch of validateChatAuth, mirroring the OTP/SSO endpoints; return 429 with Retry-After. Only explicit unlock attempts consume tokens; message sends carry no password and ride the auth cookie.
  - Replace password !== decrypted with safeCompare.
  - Fails open on rate-limiter storage errors; no availability regression.

  Reinstates the fix reverted by an intervening commit.

* fix(billing): never block a lone execution on usage headroom

  The admission reservation tapered allowed concurrency by remaining usage headroom. With under one credit of headroom left (but not yet over the cap), floor(headroom / estimate) hit zero and rejected even a single, zero-concurrency execution, stricter than the recorded-usage gate, which would have allowed that last run, and with a misleading "too many concurrent executions" message.

  Floor the headroom term at 1 so a lone execution is governed only by the cost gate; concurrency above the first slot still tapers with headroom.
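
The headroom floor in the last commit above is a small piece of arithmetic that is easy to check by hand. The function name and parameters here are hypothetical; only the floor(headroom / estimate) term and the floor-at-1 rule come from the commit message.

```typescript
/**
 * Allowed in-flight executions: the plan cap, tapered by remaining headroom,
 * but never below 1 so a lone execution is left to the recorded-usage gate.
 */
function allowedConcurrency(
  limit: number,
  recordedUsage: number,
  estimate: number,
  planCap: number,
): number {
  const headroom = limit - recordedUsage;
  // Before the fix this term could reach 0 and reject even a single run.
  const byHeadroom = Math.max(1, Math.floor(headroom / estimate));
  return Math.min(planCap, byHeadroom);
}
```

For example, with a limit of 10, recorded usage of 9.5, and an estimate of 1, floor(0.5) is 0, and the floor at 1 lets the single remaining run through. Whether that run is actually allowed is still decided by the separate recorded-usage gate.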
* refactor(env): document workspace env masking, drop inline comments

  Extract the workspace-env value masking into a TSDoc-documented maskWorkspaceEnvForViewer helper and remove the redundant inline comments from the GET handler and its test. No behavior change.

* refactor(env): convert PUT/DELETE authz comments to TSDoc

  Move the tiered-authorization rationale for the workspace env upsert and delete handlers into TSDoc blocks and drop the inline comments. No behavior change.

* fix(telegram): keep legacy webhooks working via Telegram source-IP fallback

  The secret-token check rejected every webhook registered before secret_token support, breaking live triggers until re-saved. Fall back to verifying the request originates from Telegram's published webhook IP ranges when no secret is configured, so existing triggers keep firing with no re-save or migration while forged updates from arbitrary hosts are still rejected. Webhooks with a registered secret continue to use strict constant-time token verification.

* fix(chat): restore constant-time password auth and IP rate limit

  A billing commit (ac56525) reverted the public-chat auth hardening as collateral, leaving HEAD with a timing-oracle password comparison (password !== decrypted) and no per-IP brute-force rate limit. Restore safeCompare and the password-attempt rate limiter, and re-add the 429 test.

* revert(webhooks): undo trigger auth hardening pending compat plan

  Reverts the Telegram inbound-token verification (3ed97a4, 41f133a) and the HMAC fail-closed change (5b6cae9). Production data shows ~79 live webhooks have no signing secret configured (63 GitHub, 9 Fireflies, 3 Jira, 2 Circleback, 1 Confluence, 1 Cal.com), so failing closed would 401 them. Restoring fail-open behavior until a backwards-compatible rollout (grandfather existing secretless webhooks / migration) is designed. Other security fixes on this branch are unaffected.
* test(chat): make RateLimiter mock a constructable class

  The arrow-function mockImplementation form was not reliably constructable in the full suite run (`new RateLimiter()` threw "is not a constructor"), though it passed in isolation. Switch to the class-based mock used by the sibling OTP/speech route tests.

* fix(billing): release admission slot on pre-execution aborts; cluster-safe release

  Addresses PR review on the usage-cap admission reservation:

  - Slot leak: the reservation taken at the end of preprocessing was only released when the LoggingSession finalized. The execute route's pre-execution exits (client cancel, workspace/API-key guards) returned without finalizing a session, leaking the slot until its TTL and wrongly throttling later runs. Release explicitly on those paths; executions that start are still released via session finalization.
  - Release is now cluster-safe: replaced the Lua script that rebuilt the in-flight key from the pointer value (a key not declared in KEYS, which silently breaks Redis Cluster slot routing) with discrete single-key GETDEL + ZREM commands.

* improvement(files): log missing owner metadata distinctly on profile-picture delete deny

  Per PR review: when a profile-picture delete is denied, distinguish a missing owner record (no userId metadata) from a genuine ownership mismatch so the fail-closed denial is diagnosable. Behavior unchanged: both still deny.

* fix(billing): release admission slot when async enqueue fails

  If queueing the background workflow job throws, no job runs and no LoggingSession finalizes, so the admission slot reserved during preprocessing would leak until its TTL. Release it before returning 500.

* fix(api): make body-size caps NaN-safe and raise chat input/attachment limits

  - DEFAULT_MAX_JSON_BODY_BYTES and CHAT_MAX_REQUEST_BYTES now fall back to hardcoded defaults (50 MB / 220 MB) when the env value is missing or non-numeric, so a misconfig can't silently produce a NaN cap that never rejects.
  - Raise CHAT_MAX_REQUEST_BYTES default to 220 MB to cover 15 base64 file attachments, and MAX_CHAT_INPUT_CHARS to 1,000,000.
  - Minor: tidy use-inline-rename onSave type; drop two redundant test comments.

* fix(hooks): restore void return in useInlineRename onSave type

  A prior commit changed onSave's return type from `void | Promise<unknown>` to `undefined | Promise<unknown>`, which broke the build: callbacks that return nothing (table-grid column rename, table header rename) infer a `void` return, which is not assignable to `undefined`. Restore the `void` union so both fire-and-forget and Promise-returning callbacks type-check.

* fix(billing,api): release chat reservation slot on early exit; preserve 413 on oversized import

  - Chat route: preprocessExecution reserves a billing concurrency slot, but the post-preprocess early exits (missing workspaceId, execution-setup failure) returned without releasing it, leaking the slot until TTL and wrongly throttling later runs. Release explicitly on those paths (idempotent), mirroring the workflows execute route.
  - Admin import route: an oversized JSON body now returns the real 413 from parseJsonBody instead of being remapped to a 400; invalid JSON still 400s.

* fix(icons): make Infisical icon black for contrast; regenerate docs

  The Infisical mark rendered near-white on its yellow block background and was barely visible; switch its fill from currentColor to #000000 (matching the hardcoded-fill pattern of sibling brand icons). Sync the docs icon copy and pick up a stale servicenow doc regeneration.

* fix(billing): release reserved slot on execute-route 503 and setup throw

  After preprocessExecution reserves a billing concurrency slot, the streaming path could exit without releasing it: the 503 return when initializeExecutionStreamMeta fails, and any throw during stream setup (caught by the outer handler, which only returned 500). Both left the slot held until TTL, wrongly throttling unrelated runs.

  Release on the 503 path and in the outer catch (executionId hoisted so the catch can see it; release is idempotent and a no-op when no slot was reserved).

* fix(icons): make Linkup icon black for contrast

  The Linkup mark rendered with currentColor (near-white on its block background); switch its fill to #000000 for legibility, matching the Infisical fix. Docs icon copy synced via generate-docs.

* fix(billing): release reserved slot if inline async job never starts

  In the inline (single-process) async path, if jobQueue.startJob threw before executeWorkflowJob ran, no LoggingSession finalized and the reserved billing slot was held until TTL. Release it in the fire-and-forget catch (idempotent; a no-op when the job already finalized and released).

  The queued-worker path and all in-job outcomes already release via the job's LoggingSession finalize.
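
The NaN-safe body-size cap described in the fix(api) commit earlier in this list comes down to validating the parsed environment value before using it. The helper below is a hypothetical sketch: the name `parseByteCap`, the rejection of zero and negative values, and the 1024-based megabyte constant are assumptions, not the repository's code.

```typescript
/** Parses a byte cap from an env-style string, falling back when it is unusable. */
function parseByteCap(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw); // Number(undefined) and Number("abc") are NaN
  // Assumption: zero and negative caps are also treated as misconfiguration.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Illustrative default only; the exact byte value used by the codebase is not shown in the PR text.
const FALLBACK_JSON_BODY_BYTES = 50 * 1024 * 1024;
```

The point of the guard is that a comparison such as `bytesRead > NaN` is always false, so an unvalidated NaN cap would never reject anything.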


Summary
Bundled security hardening across multiple surfaces:
- `POST /api/knowledge/[id]/documents/upsert` previously accepted any non-empty string for `fileUrl`, which the background processor then passed to `fs.readFile`/`parseFile`, allowing authenticated arbitrary local file reads (e.g. `/app/.env`, `/etc/passwd`). Now gated at the boundary via `knowledgeDocumentFileUrlSchema` (only `data:` URIs or `http(s)://` URLs); defense-in-depth in `document-processor.ts` throws on any other scheme.
- … `KnowledgeBasePermissionError`, tightens permission checks, and wraps multi-step KB mutations in transactions.
- … `pinned-fetch.ts` (undici `Agent` with a pinned `lookup`) so the host the policy validated is the host actually connected to.
- `resolveAgiloftInstance` + pinned fetch for `attach`/`retrieve` routes.
- `update_alert_rule`/`update_dashboard` route through `secureFetchWithValidation`.
- `POST /api/form/[identifier]/otp` route + `email-auth.tsx` UI; shared OTP module under `lib/core/security/otp.ts` keyed by `DeploymentKind`, with the chat OTP route refactored to consume it.

Test plan
- `bun run check:api-validation:strict` passes
- … `/api/files/serve/...` rows have zero `completed` status (already failing under the old `fs.readFile` path); change is strictly a clearer error, not a regression

🤖 Generated with Claude Code