Skip to content

fix(uploads): authorize internal file URLs before download#5049

Merged
icecrasher321 merged 6 commits into
stagingfrom
fix/file-url-access-check
Jun 15, 2026
Merged

fix(uploads): authorize internal file URLs before download#5049
icecrasher321 merged 6 commits into
stagingfrom
fix/file-url-access-check

Conversation

@waleedlatif1

Copy link
Copy Markdown
Collaborator

Summary

  • downloadFileFromUrl treated any URL containing /api/files/serve/ as trusted-internal and read the object straight from storage by key, skipping the verifyFileAccess check every other resolution path in the file runs.
  • Reachable during workflow execution via file[] inputs (type: 'url'), so an authenticated user could read storage objects across tenants by supplying a storage key.
  • Now thread the caller's userId into downloadFileFromUrl and run verifyFileAccess(key, userId, undefined, context, false) on the resolved key before reading bytes; fail closed when no userId is present.
  • Updated all callers (execution file inputs, tool file outputs, KB ingestion). Webhook and chat inputs already thread userId via processExecutionFiles.
  • External/presigned URLs are unchanged (still DNS/SSRF-guarded); only the internal branch gained the check, reusing the same authorization the /api/files/serve route already enforces.

Type of Change

  • Bug fix (security)

Testing

Tested manually. Typecheck, Biome, check:api-validation, and existing upload/KB tests pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

downloadFileFromUrl treated any URL containing /api/files/serve/ as
trusted-internal and read the object straight from storage by key with
no access check, while every other resolution path in the file calls
verifyFileAccess. Reachable during workflow execution via file[] inputs
(type: 'url'), letting an authenticated user read arbitrary storage
objects across tenants by supplying a storage key.

Thread the caller's userId into downloadFileFromUrl and run
verifyFileAccess(key, userId, undefined, context, false) on the resolved
key before downloadFile; fail closed when no userId is present. Update
all callers (execution file inputs, tool file outputs, KB ingestion);
webhook and chat inputs already thread userId via processExecutionFiles.
@vercel

vercel Bot commented Jun 15, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 15, 2026 11:35pm

Request Review

@cursor

cursor Bot commented Jun 15, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Security fix for unauthorized cross-tenant storage reads via internal serve URLs; changes authorization on a central download helper used in execution and knowledge-base ingestion.

Overview
Closes a cross-tenant read path where downloadFileFromUrl treated any URL containing /api/files/serve/ as trusted and read storage by key without verifyFileAccess—reachable from workflow file[] URL inputs and similar server-side download paths.

downloadFileFromUrl now takes an options object (including userId). For internal URLs it fails closed without a principal, resolves the key with extractStorageKey, derives context via inferContextFromKey (not caller-controlled ?context=), runs verifyFileAccess, then reads from storage. External URLs are unchanged (DNS/SSRF + pinned fetch).

isInternalFileUrl only matches the serve marker in the path (not query/fragment smuggling); tests cover edge cases.

Callers (execution file inputs, tool file outputs, KB document/OCR/parser paths) pass userId through. KB indexing uses billingUserId (uploader or workspace billed account) for file authorization so internal fileUrls are not checked against the KB owner alone.

Reviewed by Cursor Bugbot for commit bb1d384. Configure here.

Comment thread apps/sim/lib/uploads/utils/file-utils.server.ts
@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes a cross-tenant file access vulnerability where downloadFileFromUrl treated any URL containing /api/files/serve/ as trusted-internal and read from storage without an authorization check. The fix adds a userId requirement and runs verifyFileAccess (the same check the /api/files/serve route uses) before any bytes are read, and fails closed when no userId is present.

  • file-utils.server.ts: Signature refactored to an options object (DownloadFileFromUrlOptions); verifyFileAccess call inserted in the internal-URL branch gated on a present userId.
  • files.ts / file-tool-processor.ts: The available userId / context.userId is forwarded to downloadFileFromUrl at each call site that resolves file inputs.
  • document-processor.ts: userId is threaded through the full document-parsing call chain (parseDocumentparseWithAzureMistralOCR / parseWithFileParserparseHttpFiledownloadFileWithTimeout) to reach the now-guarded downloadFileFromUrl.

Confidence Score: 4/5

The authorization gap is correctly closed and all affected call sites are updated; the remaining notes are about defensive hardening, not functional breakage.

The fix is well-scoped and correct across all four files. The two P2 notes in file-utils.server.ts are about defensive hardening — the isInternalFileUrl substring match and userId-check ordering — neither of which blocks the primary security goal of the PR.

apps/sim/lib/uploads/utils/file-utils.server.ts — the isInternalFileUrl substring match and the userId-check ordering are both worth a second look.

Important Files Changed

Filename Overview
apps/sim/lib/uploads/utils/file-utils.server.ts Core security fix: adds DownloadFileFromUrlOptions interface and threads userId + verifyFileAccess check into the internal URL branch of downloadFileFromUrl; signature changed from positional args to options object (all callers updated). Two minor concerns: userId is checked after parseInternalFileUrl rather than before, and isInternalFileUrl uses a raw substring match that can classify external URLs as internal.
apps/sim/lib/execution/files.ts Passes the already-available userId argument into downloadFileFromUrl for type: 'url' file inputs; change is minimal and correct.
apps/sim/executor/utils/file-tool-processor.ts Passes context.userId (optional, but fail-closed by design) to downloadFileFromUrl when a tool output file is resolved by URL; all other code paths unchanged.
apps/sim/lib/knowledge/documents/document-processor.ts Threads userId through the full document-parsing call chain; also updates downloadFileWithTimeout and downloadFileForBase64 signatures to the new options object. parseWithMistralOCR already carried userId before this PR.

Sequence Diagram

sequenceDiagram
    participant Caller as Caller (execution / KB / tool)
    participant DL as downloadFileFromUrl
    participant Auth as verifyFileAccess
    participant Store as Storage Service
    participant DNS as DNS/SSRF Guard + secureFetch

    Caller->>DL: "downloadFileFromUrl(url, { userId })"

    alt isInternalFileUrl(url)
        DL->>DL: parseInternalFileUrl - key, context
        alt userId is undefined
            DL-->>Caller: throw Access denied
        end
        DL->>Auth: verifyFileAccess(key, userId, context)
        alt access denied
            Auth-->>DL: false
            DL-->>Caller: throw insufficient permissions
        end
        Auth-->>DL: true
        DL->>Store: downloadFile(key, context)
        Store-->>DL: Buffer
    else external URL
        DL->>DNS: validateUrlWithDNS(url)
        DNS-->>DL: isValid + resolvedIP
        DL->>DNS: secureFetchWithPinnedIP(url, resolvedIP)
        DNS-->>DL: Response
        DL->>DL: readResponseToBufferWithLimit
    end

    DL-->>Caller: Buffer
Loading

Reviews (1): Last reviewed commit: "fix(uploads): authorize internal file UR..." | Re-trigger Greptile

Comment thread apps/sim/lib/uploads/utils/file-utils.server.ts
Comment thread apps/sim/lib/uploads/utils/file-utils.server.ts
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

bugbot run

Added a logger.warn on both internal-download denial paths (no userId / access check failed) for rollout telemetry. Please re-review the latest commit.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.

Fix All in Cursor

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 41ae711. Configure here.

Cursor Bugbot flagged a context-spoofing bypass: downloadFileFromUrl
resolved context via parseInternalFileUrl, which honors a caller-controlled
?context= query param. An attacker could label a private storage key with a
world-readable context (profile-pictures/og-images/workspace-logos) so
verifyFileAccess short-circuits to granted while downloadFile still reads the
private object.

Infer context from the key only (inferContextFromKey), mirroring how
/api/files/serve resolves it; ignore the query param. Also move the userId
guard ahead of key resolution so auth failures surface first.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

bugbot run

Latest commits address the context-spoofing finding (context now derived from key only) and move rationale into TSDoc. Please re-review HEAD.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit a4bba94. Configure here.

isInternalFileUrl matched the /api/files/serve/ substring anywhere in the
string, so a crafted URL could carry it in a query string or fragment and
skip DNS/SSRF validation. Match it in the path component only.

The raw path is checked without URL normalization on purpose: the files
parse route relies on traversal sequences surviving this check (an absolute
https://host/api/files/serve/../.. URL must classify as internal so the '..'
check rejects it, rather than being normalized to /etc/... and waved through
as external). Host is intentionally not gated — cross-tenant reads are
prevented at the storage sink by verifyFileAccess, and host-allowlisting
would break self-hosted/multi-domain deployments. Adds unit tests.
@waleedlatif1

Copy link
Copy Markdown
Collaborator Author

bugbot run

Folded in the isInternalFileUrl hardening (Greptile P2): the marker is now matched in the URL path only, closing the query-string/fragment bypass. Kept it non-normalizing and host-agnostic on purpose so path-traversal protection in the parse route still applies and self-hosted/multi-domain deploys aren't broken; cross-tenant reads remain gated by verifyFileAccess at the sink. Added unit tests.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 9f47c14. Configure here.

@icecrasher321

Copy link
Copy Markdown
Collaborator

bugbot run

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit bb1d384. Configure here.

@icecrasher321 icecrasher321 merged commit 4f4ff53 into staging Jun 15, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants