perf(relationships): bulk tree persistence, lean hydration and delta-only reindex when saving related content#36149
Conversation
…ent-performance-improvements
…ent-performance-improvements
…ent-performance-improvements
…ent-performance-improvements
|
Claude finished @jcastro-dotcms's task in 2m 23s —— View job Rollback Safety Analysis
Verdict: ✅ Safe To RollbackAfter reviewing the full diff against every category in the rollback-unsafe reference, this PR does not match any unsafe category. Here is the analysis for each category checked:
Summary: All changes are confined to SQL query logic, batch operation helpers, and internal method additions. No database schema changes, no Elasticsearch mapping changes, no data transformations, and no API contract breaks. N-1 can start and operate normally after this code is deployed (or rolled back). Label applied: |
NOTE TO DEVSThis new PR was created to make the branch match a GitHub ticket number. The initial PR containing the original feedback form Claude Code can be found in the previous version, localed here: Just to be clear, this new PR contains the exact same code changes from the previous one. |
❌ Codex Review failed —
|
|
Semgrep found 15
The method identified is susceptible to injection. The input should be validated and properly If this is a critical or high severity finding, please also link this issue in the #security channel in Slack. |
This PR fixes: #36147
🎯 Problem
Saving a contentlet whose Content Type has several MANY_TO_MANY relationship fields became unusably slow as related-content volume grew: on a real-world dataset (~1,500 parents × ~20 children each, so each child relates back to ~350 parents), a single Save/Publish took from 40 seconds up to several minutes and fired tens of thousands of JDBC queries. Three compounding root causes:
getTree+saveTreepair (2–3 queries each) plus oneDELETEper row.dependenciesLeftToReindex()re-queued every currently-related contentlet for reindexing on every save, because the old-state/new-state comparison read the new DB state on both sides. Saving one child re-indexed hundreds of unchanged parents, each fully hydrated and remapped.✨ Proposed Changes
Relationship persistence — bulk, batched, and permission-aware (
ESContentletAPIImpl,TreeFactory)deleteRelatedContentnow splits into two paths: a fast identifier-only path for the system user / CMS Admins (single bulkDELETE, zero hydration, only removed parents loaded for reindex), and a permission-scoped path for limited users that preserves the legacy contract — relationships to content the user cannot READ are never deleted (bulk delete when the filter removes nothing; chunkedINdeletes otherwise).relateContentcollects all rows and persists them through a newTreeFactory.insertTrees: a duplicate-safe batched upsert (dedup by the(child, parent, relation_type)PK, batchedDELETE+INSERTwrapped inLocalTransactionso it is atomic for any caller). Duplicate records in the payload (same identifier twice, multiple languages) no longer risk a PK violation.MAX(tree_order)+1lookup over surviving rows (replacing a fullgetRelatedContentFromIndexhydration), queried only on the parent branch where it is actually used.Hydration elimination (
ESContentletAPIImpl,VersionableFactory/Impl)treetable via newTreeFactoryidentifier lookups — no contentlet hydration to compute delete scopes or counts.VersionableFactory.findAllContentletVersionInfos(Collection)(chunkedIN, 500 ids/statement) + the existing batchfindContentlets, instead of one lookup per identifier. All SQL lives in factory classes per house architecture.Delta-only dependency reindexing (
ESMappingAPIImpl)dependenciesLeftToReindex()now compares the search index (pre-save state) against thetreetable (post-save state) and returns their true symmetric difference: only relationships actually added or removed. Unchanged parents are never re-queued — this is the biggest win for heavily related datasets.Hardening (review feedback)
assertTreeColumnallowlist guard on everyTreeFactoryhelper that interpolates a column name into SQL (onlyparent/childaccepted) — enforces what the Semgrep findings flagged conventionally.TreeFactory, contracts on the bulk deletes (caller-enforced permissions), and a full explanation of the new reindex-delta model ondependenciesLeftToReindex.✅ Checklist
Security notes: permission semantics are preserved and now covered by an integration test (a limited user's save cannot delete relationships to content they cannot READ — this also fixes a data-loss risk present in the first draft of this PR). The Semgrep
CUSTOM_INJECTION-2findings are false positives (concatenated fragments are compile-time literals or generated?,?,...placeholders; all user data is bound parameters) and are additionally guarded by the new column-name allowlist.🧪 Testing
11 new integration tests, all green:
TreeFactoryTest(new class, 7 tests): batch upsert dedup, pre-existing row replacement, scoped deletes, >500-id chunking, ordered id lookups, next-tree-order.ESContentletAPIImplTest(+3): limited-user permission preservation, duplicate-records save, append-after-preserved ordering. Full class: 90/90 passing.ESMappingAPITest(+2): reindex delta returns only added+removed parents; multilingual parents are not spuriously re-indexed.ℹ️ Additional Info
-source 11), including the test suite.treerows (pointing to fully-deleted content) are now cleaned up by the bulk delete;tree_ordervalues are rebased per save (relative ordering preserved — consumers onlyORDER BY tree_order).Screenshots
N/A — backend-only performance change.
🤖 Generated with Claude Code