fix(opensearch): grant cluster:monitor/main to limited role for OS 3.x startup (#35922) #36154

Open
fabrizzio-dotCMS wants to merge 1 commit into main from issue-35922-grant-cluster-monitor-main-os3-limited-role

@fabrizzio-dotCMS

Member

What

Adds cluster:monitor/main to the limited OpenSearch role's cluster action group in opensearch.py (the provisioning script for dotcms-es-user -> dotcms-role).

Why (spike #35922)

Spike #35922 validated dotCMS's non-admin OS user against OpenSearch 3.4.0 using the single-node-os-migration stack. Finding:

  • The role's indices_all action group already expands to the full admin + read + write action set on cluster_<customer>* indices on both OS 1.3 and OS 3.x — content write/read, mapping, scroll, forcemerge, cache-clear, delete-by-query all work. These were not real gaps.
  • The only missing permission is cluster:monitor/main (GET /), required by IndexStartupValidator.validateOSVersion() which calls client.info() to read the OS version at startup.
  • Without it, the limited user gets a 403 security_exception, the validator's single catch block misclassifies a healthy OS 3.x cluster as "not reachable", and dotCMS silently halts the migration and falls back to ES-only (phase reset 1 → 0).
  • This is not an OS 3.x permission change — GET / returns 403 for the limited user on OS 1.3 too. It is a new dotCMS-side dependency introduced by the migration code.

The change

self.cluster_permissions = [
    "cluster:monitor/health",
    "indices:data/write/bulk",
    "cluster:monitor/state",
    "cluster:monitor/nodes/stats",
    "indices:data/read/scroll",
    "indices:data/read/scroll/clear",
    "cluster:monitor/main",   # <-- added: GET / for IndexStartupValidator
]
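
For context, a minimal sketch of how a permission list like this could end up in a role body for the OpenSearch Security REST API (PUT _plugins/_security/api/roles/<role>). The helper name build_role_body, the customer value, and the use of indices_all as the only allowed index action are assumptions for illustration; the actual structure of opensearch.py is not shown in this PR description.

```python
import json

CLUSTER_PERMISSIONS = [
    "cluster:monitor/health",
    "indices:data/write/bulk",
    "cluster:monitor/state",
    "cluster:monitor/nodes/stats",
    "indices:data/read/scroll",
    "indices:data/read/scroll/clear",
    "cluster:monitor/main",  # GET / for IndexStartupValidator
]


def build_role_body(customer: str) -> dict:
    """Hypothetical helper: assemble a role definition for the Security API."""
    return {
        "cluster_permissions": list(CLUSTER_PERMISSIONS),
        "index_permissions": [
            {
                # Index pattern taken from the PR text: cluster_<customer>*
                "index_patterns": [f"cluster_{customer}*"],
                # Assumption: the role relies on the indices_all action group
                "allowed_actions": ["indices_all"],
            }
        ],
    }


if __name__ == "__main__":
    # "acme" is a placeholder customer name
    print(json.dumps(build_role_body("acme"), indent=2))
```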

Verification (end-to-end)

  1. Re-provisioned OS 3.x with the patched script → GET / as the limited user now returns 200.
  2. Started dotCMS with DOT_FEATURE_FLAG_OPEN_SEARCH_PHASE=3 (OS-only):
    • IndexStartupValidator - OS version check passed: 3.4.0
    • Endpoint separation check passed
    • OS-only index bootstrap (no ES index created).
  3. REST smoke against the OS 3.x index:
    • POST /api/content/_search (+live:true) → 200, results served from OS 3.x.
    • POST /api/v1/workflow/actions/default/fire/PUBLISH → content published; OS 3.x working index doc count 5 → 6.
    • POST /api/content/_search (+title:Smoke*) → finds the new content (read from OS 3.x).

Rollout note

This role is provisioned per-customer. The script change covers new provisioning automatically; existing customer roles must be re-provisioned (or have the action group updated) before they advance past Phase 0, or their startup validator will fail against OS 3.x.
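
For existing roles, one possible approach is an idempotent update: read the current role definition, add the action only if it is missing, and write the result back. The sketch below covers only the merge step. The function name and the set of keys treated as writable are assumptions; in particular, it assumes that read-only fields returned when fetching a role (such as reserved, hidden, static) should not be sent back on update.

```python
from copy import deepcopy
from typing import Tuple

# Assumption: only these keys are sent back when updating a role.
WRITABLE_KEYS = ("cluster_permissions", "index_permissions", "tenant_permissions")


def with_cluster_permission(role: dict, action: str) -> Tuple[dict, bool]:
    """Return (updated_role, changed) with `action` present in cluster_permissions.

    The input dict is left untouched; keys outside WRITABLE_KEYS are dropped.
    """
    updated = {key: deepcopy(role[key]) for key in WRITABLE_KEYS if key in role}
    permissions = updated.setdefault("cluster_permissions", [])
    if action in permissions:
        return updated, False  # already granted, nothing to write back
    permissions.append(action)
    return updated, True


if __name__ == "__main__":
    existing = {
        "reserved": False,
        "cluster_permissions": ["cluster:monitor/health"],
        "index_permissions": [],
    }
    updated, changed = with_cluster_permission(existing, "cluster:monitor/main")
    print(changed, updated["cluster_permissions"])
```

Returning a changed flag lets a rollout script skip the write for roles that already carry the permission, so it can be re-run safely across all customers.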

Follow-up (out of scope here)

Harden IndexStartupValidator.validateOSVersion() so a 403/security_exception is distinguished from a genuine connectivity failure and logs the missing action name, instead of the opaque "not reachable" (tracked separately).
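
The intended classification can be sketched as follows. This is a language-neutral illustration written in Python, not the Java validator itself. ProbeResult and classify_probe are hypothetical names, and the regular expression assumes the 403 reason names the denied action in square brackets (for example "no permissions for [cluster:monitor/main] and User [...]"); that message format should be confirmed against the target OpenSearch version.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the security_exception reason contains the denied action
# in square brackets right after "no permissions for".
_ACTION_RE = re.compile(r"no permissions for \[([^\]]+)\]")


@dataclass
class ProbeResult:
    outcome: str  # "ok", "forbidden", "unreachable", or "error"
    missing_action: Optional[str] = None


def classify_probe(status: Optional[int], reason: str = "") -> ProbeResult:
    """Classify a GET / probe; `status` is None when no HTTP response arrived."""
    if status is None:
        return ProbeResult("unreachable")  # genuine connectivity failure
    if status == 403:
        match = _ACTION_RE.search(reason)
        # Cluster is up, but the user lacks a permission; surface its name if known
        return ProbeResult("forbidden", match.group(1) if match else None)
    if 200 <= status < 300:
        return ProbeResult("ok")
    return ProbeResult("error")  # reachable, but some other HTTP failure


if __name__ == "__main__":
    reason = "no permissions for [cluster:monitor/main] and User [name=dotcms-es-user]"
    print(classify_probe(403, reason))
    print(classify_probe(None))
```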

Full spike report and methodology: see issue #35922.

🤖 Generated with Claude Code

…x startup (#35922)

Spike #35922 validated dotCMS's non-admin OS user (dotcms-es-user ->
dotcms-role, provisioned by opensearch.py) against OpenSearch 3.4.0.

The role's indices_all action group already expands to the full
admin+read+write action set on cluster_<customer>* indices on both OS 1.3
and OS 3.x, so content write/read, mapping, scroll and admin ops all work.

The single gap is cluster:monitor/main (GET /), which IndexStartupValidator
.validateOSVersion() requires via client.info() to read the OS version at
startup. Without it the limited user gets a 403 security_exception, the
validator misclassifies a healthy OS 3.x cluster as "not reachable", and
dotCMS silently halts the migration and falls back to ES-only.

Adding cluster:monitor/main to the cluster action group closes the gap.
Verified end-to-end: with the patched script the startup validator passes
("OS version check passed: 3.4.0") and dotCMS in PHASE=3 (OS-only) writes
and reads content through the OS 3.x index with the limited user.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 12, 2026

Contributor

Claude finished @fabrizzio-dotCMS's task in 57s —— View job


Rollback Safety Analysis

  • Read rollback-unsafe categories reference
  • Get full PR diff
  • Analyze diff against all unsafe categories
  • Apply label

Result: ✅ Safe to Roll Back

Changed file: docker/docker-compose-examples/single-node-os-migration/opensearch.py

What changed: A single permission string "cluster:monitor/main" was added to the self.cluster_permissions list in the Python OpenSearch provisioning script.

Category analysis:

| Category | Match? | Reason |
|---|---|---|
| C-1 Structural Data Model | ❌ No | No database schema touched |
| C-2 ES Mapping Change | ❌ No | No index mapping added, changed, or removed — this is an OS role/ACL permission, not a mapping |
| C-3 Content JSON Model Version | ❌ No | No CURRENT_MODEL_VERSION or ImmutableContentlet change |
| C-4 DROP TABLE/Column | ❌ No | No DDL |
| H-1 One-Way Data Migration | ❌ No | No data transformation |
| H-2 RENAME TABLE/Column | ❌ No | No DDL |
| H-3 PK Restructuring | ❌ No | No DDL |
| H-4 New Field Type | ❌ No | No field type registration |
| H-5 Storage Provider Change | ❌ No | No storage config change |
| H-6 DROP PROCEDURE/FUNCTION | ❌ No | No stored procedures |
| H-7 NOT NULL Column | ❌ No | No DDL |
| H-8 VTL Viewtool Contract | ❌ No | No Java viewtool changes |
| M-1 Column Type Change | ❌ No | No DDL |
| M-2 Push Publishing Bundle | ❌ No | No bundle XML changes |
| M-3 REST/GraphQL API | ❌ No | No API contract change |
| M-4 OSGi Interface | ❌ No | No OSGi interface change |

Why it's safe: This is a purely additive change to a Docker Compose example provisioning script. It adds one cluster permission to an OpenSearch role definition. Rolling back to N-1 means the provisioning script would no longer include cluster:monitor/main; however, already-provisioned roles retain the permission (as the PR notes, existing roles must be re-provisioned independently). The change has no effect on the database, index mappings, Java code, or any API contract. No application state is altered by the script itself — it only affects future role provisioning runs.

@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

🤖 Codex Review — openai.gpt-5.5

No findings.


Run: #27443750222 · tokens: in: 599 · out: 123 (reasoning: 114) · total: 722

Development

Successfully merging this pull request may close these issues.

Spike: Validate dotCMS OpenSearch client requests against a non-admin OS 3.x user