Skip to content

feat(rest): add GET /api/v1/page/_render-sources/{uri} to locate a page's render sources#36102

Open
fmontes wants to merge 9 commits into
mainfrom
fmontes/mcp-render-source
Open

feat(rest): add GET /api/v1/page/_render-sources/{uri} to locate a page's render sources#36102
fmontes wants to merge 9 commits into
mainfrom
fmontes/mcp-render-source

Conversation

@fmontes

@fmontes fmontes commented Jun 10, 2026

Copy link
Copy Markdown
Member

What

Adds a read-only REST endpoint that maps a rendered page to references (path + identifier) of all the source files that produce it — theme VTLs, containers (DB & file-based), widgets, and URL-mapped content — in a single call.

This is the LOCATE step of the agent loop (SCAN → LOCATE (this) → READ/WRITE (#35928) → RE-SCAN). Today an agent fixing a page has to guess across folder/theme/container endpoints; this endpoint hands it the exact source references.

GET /api/v1/page/_render-sources/{uri}

Response

References only — no file content, no container code. Clients read the actual code via the existing endpoints named in the OpenAPI description.

{
  "page":  { "identifier": "a9f3…", "uri": "//demo.dotcms.com/index", "languageId": 1 },
  "theme": { "id": "d7b0…", "name": "travel", "folderPath": "//…/travel/",
             "vtls": [ { "path": "//…/header.vtl", "identifier": "a56e…" } ] },
  "containers": {
    "//…/application/containers/default/": {
      "source": "FILE",
      "contentTypes": [ { "contentTypeVar": "Activity",
                          "path": "//…/activity.vtl", "identifier": "313…" } ]
    }
  },
  "widgets": [
    { "contentTypeVar": "SimpleWidget", "title": "Booking Date Selector",
      "contentletId": "", "contentletInode": "", "source": "CODE" },
    { "contentTypeVar": "VtlInclude", "title": "Recommended Events",
      "contentletId": "", "contentletInode": "", "source": "FILE",
      "path": "//…/recommended-events.vtl", "identifier": "41b9…" }
  ],
  "urlContentMap": { "contentTypeVar": "Blog", "title": "",
                     "contentletId": "", "contentletInode": "" }
}
  • containers — a map keyed by identifier (DB) or path (FILE). contentTypes is filtered to types actually placed on the page, resolved from MultiTree under the applied persona + variant (not a naive container list). FILE entries carry the per-type VTL path/identifier; the case-insensitive match fixes a defect where file containers dropped placed types whose VTL filename casing differed from the content-type variable.
  • widgets — every widget carries contentletId + version-aware contentletInode, plus a source: FILE (resolves to a VTL file) or CODE (inline widgetCode; read via /api/v1/contenttype + /api/v1/content).
  • urlContentMap — present only for URL-mapped pages (e.g. /blog/post/{slug}), resolved through the same flow /page/render uses; page.uri keeps the requested URL while page.identifier is the detail page.

Params & host resolution

uri (path segment), host_id, language_id, persona_id, variantName, mode.

Host resolution: host_id query param → default host.

Deviation from the original issue: the //host/... qualified-path form in the acceptance criteria is not supported — dotCMS's NormalizationFilter rejects any URI containing // ("Invalid URI passed") before it reaches the resource, and Tomcat blocks encoded %2F%2F at the connector. Non-default hosts are addressed via host_id. The "missing path → 400" criterion likewise becomes a JAX-RS 404 on a bare /_render-sources (mirrors /page/render).

Reuse (no duplicated logic)

Projects existing PageView / ContainerRaw / FileAssetContainer.getContainerStructuresAssets() / MultiTreeAPI, and routes page resolution through HTMLPageAssetRenderedAPI so URL-map handling matches /page/render. No rendering logic is reimplemented.

Testing

  • Verified manually against a running instance (demo starter): DB/FILE container shapes, on-page filtering, host_id resolution, URL-mapped blog pages (incl. EDIT_MODE), error statuses.
  • Integration tests (PageRenderSourcesResourceTest): DB vs FILE shapes, on-page under persona/variant, host_id resolves a non-default site, URL-mapped detail page + urlContentMap, error statuses, CODE vs FILE widget shapes.
  • Postman happy + error paths.
  • openapi.yaml regenerated from annotations.

Fixes #36082

🤖 Generated with Claude Code

…ge's render sources

Adds a read-only endpoint that maps a rendered page to references (path +
identifier) of its source files, so an agent fixing a page can locate the
theme VTLs, containers, and widgets that produce it in a single call instead
of guessing across folder/theme/container endpoints.

Returns references only — no file content or container code. Clients read the
actual code via the existing endpoints named in the OpenAPI description
(/api/v1/folder, /api/v1/containers/working, /api/v2/assets, /api/v1/contenttype,
/api/v1/content).

Response:
- page:   { identifier, uri (host-qualified), languageId }
- theme:  { id, name, folderPath, vtls[] }
- containers: map keyed by identifier (DB) or path (FILE); value is
  { source, contentTypes[] }, filtered to content types actually placed on the
  page (resolved from MultiTree under the applied persona + variant). FILE
  entries add per-type { path, identifier }.
- widgets: [{ contentTypeVar, title, contentletId, contentletInode, source
  (FILE|CODE), path?, identifier? }] — inode resolved version-aware.
- urlContentMap: present only for URL-mapped pages (e.g. /blog/post/{slug}),
  resolved through the same flow /page/render uses.

Params: uri (path segment), host_id, language_id, persona_id, variantName, mode.
Host resolution: host_id query param -> default host. The //host qualified-path
form is not supported because dotCMS's NormalizationFilter rejects // URIs
before they reach the resource; host_id is the supported mechanism.

Reuses PageView/ContainerRaw/FileAssetContainer.getContainerStructuresAssets()
and MultiTreeAPI — no duplicated rendering logic.

Includes integration tests (DB/FILE container shapes, on-page filtering under
persona/variant, host_id resolution, URL-mapped pages, error statuses),
Postman coverage, and regenerated openapi.yaml.

Fixes #36082

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

❌ Linked Issue Needs Team Label

This PR is linked to issue #36082, but that issue has no Team : * label. Every linked issue must be owned by a team for tracking and triage.

How to fix this:

Apply a Team : * label to the linked issue (e.g., Team : Scout, Team : Platform, Team : Falcon, Team : Maintenance). Then push a new commit or edit the PR description to re-run this check.


This comment was automatically generated by the issue linking workflow

@github-actions github-actions Bot added the Area : Backend PR changes Java/Maven backend code label Jun 10, 2026
@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Claude finished @fmontes's task in 3m 23s —— View job


PR Review: feat(rest): add GET /api/v1/page/_render-sources/{uri}

  • Gather context and read key files
  • Review endpoint logic and security
  • Review DTOs and OpenAPI spec
  • Review integration tests
  • Post findings

Overall this is solid work. The prior review rounds caught the main issues and the fixes look correct. A few remaining concerns:

Security

buildOnPageKeys leaks content type via permission-less lookup (PageResourceHelper.java:1192):
findContentletByIdentifierAnyLanguage is called for every MultiTree entry without a permission check. The result is used only to derive a containerId|contentTypeVar key (not to expose the contentlet itself), so actual data isn't leaked — but the pattern is inconsistent with the permission model applied in buildWidgetViews and buildUrlContentMapView. If a placed contentlet is permission-restricted, the caller can infer it exists and what content type it uses. Low risk, but worth flagging for consistency.

resolveWidgetFileRef exposes VTL path+identifier without permission check (PageResourceHelper.java:1457-1464):**
findContentletByIdentifierAnyLanguage(fileIdentifier) is called to resolve the file asset backing a widget's FILE field. There is no permission gate on fileCon before returning the VTL path and identifier. If a widget's backing VTL file is permission-restricted, the caller gets its path and identifier anyway. Consistent treatment would be to call canReadContentlet(fileCon, user) before returning the VtlFileRefView.

Fix this →

Logic

BinaryField handling is silently broken (PageResourceHelper.java:1452-1453):
The code iterates FileField and BinaryField and calls val.toString() on the value, treating it as a content identifier. For BinaryField, the stored value is a java.io.FiletoString() yields an absolute filesystem path, not a content identifier. findContentletByIdentifierAnyLanguage called with a filesystem path returns null, so firstAsset is never set from a BinaryField. BinaryField-backed widgets always fall back to CODE source silently. This is a pre-existing issue with how widget file fields are modeled, but the code comment ("BinaryField stores the actual file") acknowledges the distinction without handling it. If BinaryField widgets are a real use case for FILE-source resolution, this should either be fixed or explicitly documented as unsupported.

URI construction in buildSourceLookupRequest (PageResourceHelper.java:1144) assumes uri has no query string:

final String uriWithHostParam = uri + "?host_id=" + host.getIdentifier();

If uri ever contains a ? (e.g. from a caller passing query params in a path segment), this produces a malformed URL with two ?. In practice JAX-RS strips query params from @PathParam values so this is safe today, but it's fragile. Using uri.contains("?") ? uri + "&host_id=" : uri + "?host_id=" would be more defensive.

Performance

N+1 pattern in buildOnPageKeys (PageResourceHelper.java:1192):
Each MultiTree entry triggers a separate findContentletByIdentifierAnyLanguage call to determine the content type variable. For a page with many placed contentlets this is N round-trips. The MultiTree itself only carries the contentlet identifier — the content type variable requires loading the full contentlet. No cached batch API is obvious here, but it's worth noting for pages with heavy content trees.

Minor

  • The endpoint does not set requiredBackendUser(true) (line 299-302). Authenticated frontend users can call it. The page READ permission check inside getRenderSources is the authorization gate. This is reasonable for a read-only source-reference endpoint, but it's worth confirming intentional — most non-trivial page endpoints in this file require backend login.

- Remove false //hostname/path support claim from docs (host_id only)
- Fetch page MultiTrees once and share across onPage-keys + widget passes
- Walk theme folder recursively so sub-folder VTLs are no longer missed
- Log MultiTree load failure at error level instead of swallowing it
- Prefer the .vtl asset when a widget declares multiple file fields
- Reject unknown mode values with 400 instead of silent PREVIEW_MODE fallback

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fmontes

fmontes commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Thanks for the review — addressed all six in 22891b9.

  1. Done — removed the //hostname/path claim from the Javadoc, @param, and @Operation description; host_id (or default host) is now the only documented host-targeting mechanism, and the OpenAPI spec was regenerated to match.
  2. DonegetMultiTreesByPersonalizedPage is now called once in getRenderSources and the resulting list is passed into both buildOnPageKeys and buildWidgetViews.
  3. DonebuildThemeView now walks the theme folder with findSubFoldersRecursively, so VTLs in sub-folders (navigation/, header/, etc.) are included instead of only the root level.
  4. Done — the MultiTree fetch failure is now logged at error level with the page id (instead of being swallowed), so an empty result from a DB failure is distinguishable from a genuinely empty page. The endpoint stays best-effort.
  5. DoneresolveWidgetFileRef no longer trusts field order: it prefers the first field whose asset is a .vtl, and only falls back to the first file asset when no field resolves to a .vtl.
  6. Done — an unrecognized mode value now returns 400 Bad Request instead of silently falling back to PREVIEW_MODE.

The "Setup: Get default site for render-sources" request inherited the
collection-level bearer auth, whose {{jwt}} is minted asynchronously by the
collection pre-request (pm.sendRequest without await). As the first request in
its folder to need auth, it hit the jwt-generation race and returned 401.

Give that one setup request explicit Basic auth using the same {{user}}/{{password}}
env vars the pre-request itself uses, removing the jwt dependency for the call
that seeds renderSourcesHostId. The remaining render-sources requests keep the
inherited bearer auth (jwt is warm by the time they run).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…race

The render-sources "Setup" request was returning 401: the collection
pre-request mints {{jwt}} via an un-awaited pm.sendRequest, and when this
folder runs that jwt slot is occasionally empty. The pre-request then re-mints
via POST /api/v1/apitoken, which itself intermittently 401s — leaving every
auth path (bearer or basic) for this one request unauthenticated.

Resolve renderSourcesHostId in a pre-request script instead: reuse the
{{defaultSiteId}} collection variable already set by an earlier folder
(Number of Content References in Pages, folder 10, runs before this folder 20),
falling back to a synchronous Basic-auth site lookup only if it is absent. The
test now asserts the resolved id rather than a live response status, so it no
longer depends on the shared jwt timing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause confirmed from the dotCMS security log: a sibling folder runs a
Logout immediately before the render-sources folder, invalidating the shared
session, and the collection {{jwt}} is minted by an un-awaited pm.sendRequest.
The result is an empty bearer ("JWT ... must contain exactly 2 period
characters. Found: 0") so every render-sources request returned 401.

Make the folder self-contained: a folder-level pre-request mints a dedicated
{{renderSourcesJwt}} synchronously via POST /api/v1/apitoken with Basic auth
(awaited in the callback), and the folder sets bearer auth to that token so all
child requests authenticate regardless of the shared session/jwt state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Real root cause (from the dotCMS security log): the render-sources folder was
the last in the collection, running AFTER the 'invalidateSession' folder logged
the shared session out — every request then 401'd ("JWT ... Found: 0 period
characters"), and even a standalone Basic-auth apitoken mint is rejected in this
environment, so no in-folder token workaround could succeed.

Move the folder to before 'invalidateSession' so it inherits the live
collection bearer like every other page-test folder. Revert the earlier
self-token/basic-auth workarounds back to plain inherited auth and drop the now
unused renderSourcesHost / renderSourcesJwt collection variables.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With auth fixed by the earlier folder move, the render-sources happy-path then
404'd: it requested /index against whatever /api/v1/site/defaultSite returned,
but (a) /index is not guaranteed to exist and (b) the live default site can be
switched by sibling tests, so it no longer matched the host where the suite's
test pages live.

Target {{firstPageUrl}} — the page the suite creates earlier in the run — on
{{defaultSiteId}} (the host it was created/rendered on), instead of assuming
/index on a re-queried default site. Make the URL-mapped test tolerate absent
starter blog content (200-with-validation or 404).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Postman _render-sources folder:
- Move host-alias logic to a folder pre-request and drop the vestigial
  "Setup" item whose HTTP response was never asserted.
- Merge the two near-identical happy-path items into one.
- Remove a stale //host/uri comment left over from an earlier iteration.

PageRenderSourcesResourceTest:
- Make test_theme_block assert the theme unconditionally (the setup creates a
  real theme) instead of guarding with an if + Logger.info pseudo-assertion;
  drop the now-unused Logger import.
- Reference Source / WidgetSourceView.Source enum names instead of bare
  "DB"/"FILE"/"CODE" string literals so the tests track the response contract.
- Delete the tombstone comment about removed tests 13-15.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new read-only REST endpoint GET /api/v1/page/_render-sources/{uri} that returns references (paths/identifiers) to the assets involved in rendering a page (theme VTLs, containers, widgets, and URL-mapped content), intended to support the “LOCATE” step of the agent workflow.

Changes:

  • Introduces the /v1/page/_render-sources/{uri} endpoint and supporting helper logic to resolve page/theme/container/widget/url-map references.
  • Adds new response/view DTOs for render-source references and updates OpenAPI to document the endpoint and schemas.
  • Adds integration and Postman coverage for happy/error paths (DB vs FILE containers, persona/variant filtering, URL-map behavior).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
dotCMS/src/main/webapp/WEB-INF/openapi/openapi.yaml Documents the new endpoint and adds schemas for the render-sources response.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/PageResource.java Adds the JAX-RS endpoint method and request parameter handling.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/PageResourceHelper.java Implements core resolution logic for theme/containers/widgets/url-map and builds the response view.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/PageRenderSourcesView.java Top-level response DTO for render sources.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/PageSourceRefView.java DTO for the resolved page reference (identifier/uri/languageId).
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/ThemeSourceView.java DTO for theme folder reference and VTL list.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/VtlFileRefView.java DTO for VTL file reference (path/identifier).
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/ContainerSourceView.java DTO for container “source” discriminator and filtered content-type entries.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/ContentTypeEntryView.java DTO for per-content-type entries within containers (optionally including VTL refs).
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/WidgetSourceView.java DTO for widget placements including FILE vs CODE source discriminator.
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/UrlContentMapView.java DTO for URL-mapped contentlet reference (contentletId/inode/type/title).
dotCMS/src/main/java/com/dotcms/rest/api/v1/page/ResponseEntityPageRenderSourcesView.java Wraps the response in the standard ResponseEntityView envelope.
dotcms-postman/src/main/resources/postman/PagesResourceTests.json Adds Postman requests/tests for render-sources endpoint (happy + error + URL-map optional).
dotcms-integration/src/test/java/com/dotcms/rest/api/v1/page/PageRenderSourcesResourceTest.java Adds integration tests for shapes, filtering, host resolution, widgets, and URL-map behavior.

Comment on lines +950 to +952
* @param path Required. Qualified ({@code //host/uri}) or plain ({@code /uri}) page path.
* @param hostId Optional. Explicit host identifier (ignored when {@code path} is qualified).
* @param languageId Optional. Language identifier; defaults to the default language.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3f82b84 — corrected the getRenderSources helper Javadoc: path is a plain /uri, the //host/uri form is not supported (NormalizationFilter rejects //), and hostId defaults to the default host.

Comment thread dotCMS/src/main/webapp/WEB-INF/openapi/openapi.yaml
Comment thread dotCMS/src/main/java/com/dotcms/rest/api/v1/page/ThemeSourceView.java Outdated
Comment thread dotCMS/src/main/webapp/WEB-INF/openapi/openapi.yaml
Comment thread dotCMS/src/main/webapp/WEB-INF/openapi/openapi.yaml
Security:
- buildWidgetViews and buildUrlContentMapView re-check READ permission on the
  contentlet resolved via the user-less findContentletByIdentifierAnyLanguage
  fallback, so identifiers/inodes of content the caller cannot read are not
  exposed. urlContentMap now omits inode/title when the version is unreadable.

Docs/contract:
- Correct getRenderSources helper Javadoc: path is a plain /uri (the //host/uri
  form is unsupported); hostId defaults to the default host.
- ThemeSourceView.vtls description now reflects the recursive subfolder search.
- Document the 400 response (invalid mode) in @ApiResponses / OpenAPI.
- Fix endpoint path in PageRenderSourcesView / ResponseEntityPageRenderSourcesView
  Javadoc (/_render-sources/{uri}).
- Regenerate openapi.yaml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

feat(rest): add /api/v1/page/_render-sources endpoint to locate a page's render sources

2 participants