fix(opensearch): grant cluster:monitor/main to limited role for OS 3.x startup (#35922) #36154
…x startup (#35922)

Spike #35922 validated dotCMS's non-admin OS user (dotcms-es-user -> dotcms-role, provisioned by opensearch.py) against OpenSearch 3.4.0. The role's indices_all action group already expands to the full admin+read+write action set on cluster_<customer>* indices on both OS 1.3 and OS 3.x, so content write/read, mapping, scroll and admin ops all work.

The single gap is cluster:monitor/main (GET /), which IndexStartupValidator.validateOSVersion() requires via client.info() to read the OS version at startup. Without it the limited user gets a 403 security_exception, the validator misclassifies a healthy OS 3.x cluster as "not reachable", and dotCMS silently halts the migration and falls back to ES-only.

Adding cluster:monitor/main to the cluster action group closes the gap.

Verified end-to-end: with the patched script the startup validator passes ("OS version check passed: 3.4.0") and dotCMS in PHASE=3 (OS-only) writes and reads content through the OS 3.x index with the limited user.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude finished @fabrizzio-dotCMS's task in 57s

Rollback Safety Analysis
Result: ✅ Safe to Roll Back
Changed file:
What changed: A single permission string
Category analysis:
Why it's safe: This is a purely additive change to a Docker Compose example provisioning script. It adds one cluster permission to an OpenSearch role definition. Rolling back to N-1 means the provisioning script would no longer include |
What
Adds `cluster:monitor/main` to the limited OpenSearch role's cluster action group in `opensearch.py` (the provisioning script for `dotcms-es-user` → `dotcms-role`).

Why (spike #35922)
Spike #35922 validated dotCMS's non-admin OS user against OpenSearch 3.4.0 using the `single-node-os-migration` stack. Finding:
- The role's `indices_all` action group already expands to the full admin + read + write action set on `cluster_<customer>*` indices on both OS 1.3 and OS 3.x: content write/read, mapping, scroll, forcemerge, cache-clear, and delete-by-query all work. These were not real gaps.
- The single gap is `cluster:monitor/main` (`GET /`), required by `IndexStartupValidator.validateOSVersion()`, which calls `client.info()` to read the OS version at startup.
- Without it the limited user gets a `403 security_exception`, the validator's single `catch` block misclassifies a healthy OS 3.x cluster as "not reachable", and dotCMS silently halts the migration and falls back to ES-only (phase reset 1 → 0).
- `GET /` returns `403` for the limited user on OS 1.3 too, so the gap is not specific to OS 3.x. It is a new dotCMS-side dependency introduced by the migration code.

The change
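The diff itself is not reproduced in this text. As a rough illustration only, the sketch below shows how one extra permission string could be added to an action-group body of the shape the OpenSearch Security plugin accepts (`allowed_actions`). The helper name, the variable names, and the "before" contents are hypothetical and are not taken from `opensearch.py`.

```python
import copy

# Backs GET /, which client.info() calls to read the OS version.
MONITOR_MAIN = "cluster:monitor/main"


def with_monitor_main(action_group: dict) -> dict:
    """Return a copy of an action-group body that also allows GET /."""
    patched = copy.deepcopy(action_group)
    actions = patched.setdefault("allowed_actions", [])
    if MONITOR_MAIN not in actions:  # keep the list free of duplicates
        actions.append(MONITOR_MAIN)
    return patched


# Hypothetical "before" body; the real list in opensearch.py is not shown here.
example_group = {"allowed_actions": ["cluster_composite_ops"]}
patched_group = with_monitor_main(example_group)
print(patched_group["allowed_actions"])
```

Because the helper only appends when the string is absent, running it twice leaves the body unchanged, which matches the purely additive nature of the change.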
Verification (end-to-end)
- `GET /` as the limited user now returns 200.
- With `DOT_FEATURE_FLAG_OPEN_SEARCH_PHASE=3` (OS-only):
  - `IndexStartupValidator - OS version check passed: 3.4.0`
  - `Endpoint separation check passed`
  - `POST /api/content/_search` (`+live:true`) → `200`, results served from OS 3.x.
  - `POST /api/v1/workflow/actions/default/fire/PUBLISH` → content published; OS 3.x working index doc count `5 → 6`.
  - `POST /api/content/_search` (`+title:Smoke*`) → finds the new content (read from OS 3.x).

Rollout note
This role is provisioned per-customer. The script change covers new provisioning automatically; existing customer roles must be re-provisioned (or have the action group updated) before they advance past Phase 0, or their startup validator will fail against OS 3.x.
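Before advancing an existing customer past Phase 0, one way to confirm the role has been updated is to repeat the `GET /` check as the limited user. The following is a minimal sketch using only the Python standard library; the function name, the base URL, and the credentials are placeholders, and the injectable `opener` parameter is an assumption made here so the logic can be exercised without a live cluster.

```python
import base64
import urllib.error
import urllib.request


def root_status(base_url, user, password, opener=urllib.request.urlopen):
    """Return the HTTP status code of GET / using basic-auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with opener(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses surface as HTTPError; report the code instead.
        return err.code


# Example call (placeholder URL and password, not run here):
# root_status("https://localhost:9200", "dotcms-es-user", "<password>")
```

A result of 200 suggests `cluster:monitor/main` is granted, while 403 suggests the role still needs re-provisioning. Clusters with self-signed certificates would need an opener configured with a suitable TLS context.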
Follow-up (out of scope here)
Harden `IndexStartupValidator.validateOSVersion()` so a `403`/`security_exception` is distinguished from a genuine connectivity failure and logs the missing action name, instead of the opaque "not reachable" (tracked separately).

Full spike report and methodology: see issue #35922.
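The hardening described in the follow-up comes down to branching on the kind of failure rather than using one catch-all. The real validator is Java code in dotCMS and is not shown here; the Python sketch below only illustrates the branching, and every name in it (`ProbeOutcome`, `classify_probe`, `describe`) is hypothetical.

```python
from enum import Enum
from typing import Optional


class ProbeOutcome(Enum):
    OK = "ok"
    MISSING_PERMISSION = "missing_permission"
    UNREACHABLE = "unreachable"
    UNEXPECTED = "unexpected"


def classify_probe(status: Optional[int]) -> ProbeOutcome:
    """Map the result of GET / to an outcome; None means no HTTP response."""
    if status is None:
        return ProbeOutcome.UNREACHABLE
    if status == 200:
        return ProbeOutcome.OK
    if status == 403:
        # The cluster answered, so it is reachable; the user lacks a permission.
        return ProbeOutcome.MISSING_PERMISSION
    return ProbeOutcome.UNEXPECTED


def describe(outcome: ProbeOutcome, action: str = "cluster:monitor/main") -> str:
    """Build a log message that names the missing action when relevant."""
    if outcome is ProbeOutcome.MISSING_PERMISSION:
        return f"OS reachable, but the user lacks {action}"
    if outcome is ProbeOutcome.UNREACHABLE:
        return "OS not reachable"
    if outcome is ProbeOutcome.OK:
        return "OS version check can proceed"
    return "OS returned an unexpected status"


print(describe(classify_probe(403)))
```

Keeping the 403 case separate means a healthy cluster with an under-privileged user is reported as a permissions problem rather than as a connectivity failure.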
🤖 Generated with Claude Code