feat(tiptap): convert Story Block content to Markdown (#35727) #35728
wezell wants to merge 4 commits into
…35727)

Adds TiptapMarkdown (com.dotcms.tiptap) which converts Story Block / ProseMirror documents to markdown and back. Wires it into the existing renderable surface so Velocity can call:

    $contentlet.storyBlock.toMd
    $markdownTool.blockToMarkdown(json)

Supports paragraph, heading 1-6, blockquote, bullet/ordered lists, codeBlock with language, horizontalRule, hardBreak, image, GFM tables, plus the dotCMS-specific dotImage and youtube extensions. Marks: bold, italic, strike, code, link.

Marks with no markdown equivalent (underline, highlight, sub/superscript, textStyle, color) are dropped silently; truly unknown nodes/marks log once at INFO and are skipped so the converter never throws on user-extended Tiptap schemas.

Markdown -> Tiptap uses commonmark-java with GFM tables and strikethrough extensions (zero transitive runtime deps).

Tests: 56 passing -- 49 synthetic + 7 against a trimmed real-blog fixture (blog-test.json) covering every node and mark plus fixed-point round-trip stability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @wezell's task in 3m 51s

PR Review: feat(tiptap): convert Story Block content to Markdown
The overall design is sound and the test coverage is solid. There are a few real bugs and several things worth fixing before merge.

Bugs

1. Dead code:
private final Deque<ListCtx> listStack = new ArrayDeque<>(); // never used
...
private static final class ListCtx { ... } // never instantiated

The actual list rendering uses

2. Test
// Missing from SUPPORTED_MARK_TYPES:
SUPPORTED_MARK_TYPES.add("highlight");
SUPPORTED_MARK_TYPES.add("subscript");
SUPPORTED_MARK_TYPES.add("superscript");
SUPPORTED_MARK_TYPES.add("textStyle");
SUPPORTED_MARK_TYPES.add("color");

3. Image
sb.append(".append(src); // src unescaped
...
final StringBuilder sb = new StringBuilder("](").append(href); // href unescaped

A Additionally,

4.

5. Paragraphs starting with

Actually — The fix is to detect

Design concerns (non-blocking but worth noting)

6.
final StringBuilder saved = new StringBuilder(out); // copies full output so far
out.setLength(0);
// ... render child ...
return out.toString();
// finally: out.append(saved)

For a document with 200 list items and an accumulated output of 50 KB, this copies ~50 KB per item. A simple fix is to use an index-based approach: record

7. Double-parse in
Clean

The whitespace-lifting logic (trailing/leading whitespace extracted before/after mark open/close delimiters), the dynamic fence width for triple-backtick bodies, and the mark precedence/reopening logic in
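The index-based approach suggested under design concern 6 could look roughly like the sketch below. It is an illustration only: `ChildCapture`, `capture`, `append`, and `result` are hypothetical names, not code from this PR. The idea is to remember the buffer length before rendering a child, let the child append to the shared buffer, then cut only the child's text back out, so the accumulated output is never copied.

```java
import java.util.function.Consumer;

/** Hypothetical sketch; class and method names are illustrative, not taken from the PR. */
final class ChildCapture {
    private final StringBuilder out = new StringBuilder();

    /** Renders a child into the shared buffer, then cuts its text back out. */
    String capture(Consumer<StringBuilder> renderChild) {
        final int start = out.length();      // where the child's output will begin
        try {
            renderChild.accept(out);         // child appends directly to the shared buffer
            return out.substring(start);     // copies only the child's own output
        } finally {
            out.setLength(start);            // truncate back; earlier output is untouched
        }
    }

    /** Appends already-rendered text to the shared buffer. */
    void append(String s) {
        out.append(s);
    }

    /** Returns everything rendered so far. */
    String result() {
        return out.toString();
    }
}
```

With this shape the cost per list item is proportional to the size of that item's rendered text rather than to the size of everything rendered before it, and the `finally` block restores the buffer even if the child renderer throws.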
Mirrors the naming of the existing toHtml() Renderable surface so Velocity callers use `$contentlet.storyBlock.toMarkdown` alongside `$contentlet.storyBlock.toHtml`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QA Findings

Tested the usable methods end-to-end:
JSON → Markdown
Markdown → JSON (toTiptap)
Tested in Editor Mode
Closes #35727
Summary
- `com.dotcms.tiptap.TiptapMarkdown`: bidirectional converter between Tiptap JSON (Story Block / ProseMirror) and Markdown.
- `$contentlet.storyBlock.toMarkdown()` (StoryBlockMap)
- `$markdownTool.blockToMarkdown(json)` (MarkdownTool)
- `org.commonmark:commonmark` + `-ext-gfm-tables` + `-ext-gfm-strikethrough` (0.22.0). Zero transitive runtime deps (~250KB total).

What it handles
Nodes: paragraph, heading 1-6, blockquote, bulletList, orderedList, listItem, codeBlock (with language), horizontalRule, hardBreak, image, table/tableRow/tableHeader/tableCell, plus dotCMS-specific `dotImage` and `youtube`.

Marks: bold, italic, strike, code, link.
Graceful degradation: marks with no markdown equivalent (`underline`, `highlight`, `subscript`, `superscript`, `textStyle`, `color`) are dropped silently. Any other unknown node/mark logs once at INFO via `Logger.info` and is skipped. Tiptap is extensible, so the converter never throws on user-extended schemas.

Notable correctness details
- `*x *` is invalid). The serializer extracts trailing whitespace out of mark spans before emitting closers, and leading whitespace before openers, so output is always well-formed and parses back to the same structure.
- Text inside `code` marks or `codeBlock` nodes is emitted literally; special chars are NOT backslash-escaped.
- A `codeBlock` whose body contains triple backticks gets a longer fence (4+ ticks) so the fence can't collide.

Test plan
- `TiptapMarkdownTest`: 49 synthetic unit tests covering every supported node, every mark, escaping, fence-width, JSON-string overload, round-trip stability per node type.
- `TiptapMarkdownBlogContentTest`: 7 tests against `blog-test.json` (trimmed to 2 real Story Block bodies, 122KB), verifying:
- `./mvnw compile -pl :dotcms-core`) clean.
- `$contentlet.storyBlock.toMarkdown()` in a Velocity template to sanity-check end-to-end wiring.

Out of scope (documented)
- `youtube` renders as a plain link to the video src (markdown has no native embed). Reviewer call: switch to an `<iframe>` HTML block if richer rendering is wanted.
- `underline` etc. are intentionally lossy on the JSON→MD direction since markdown lacks the syntax.

🤖 Generated with Claude Code
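For readers who want a concrete picture of two of the items under "Notable correctness details", here is a minimal sketch of whitespace lifting and dynamic fence width. It is not the code from this PR: `MarkdownEmitSketch`, `wrapMark`, `fenceFor`, and `codeBlock` are hypothetical names, and the real serializer may handle more cases (nested marks, reopening, other whitespace rules).

```java
/** Hypothetical helpers; names and structure are illustrative, not the PR's code. */
final class MarkdownEmitSketch {

    /** Wraps text in a mark delimiter while keeping edge whitespace outside the delimiters. */
    static String wrapMark(String text, String delim) {
        int start = 0;
        int end = text.length();
        while (start < end && Character.isWhitespace(text.charAt(start))) {
            start++;                          // skip leading whitespace
        }
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
            end--;                            // skip trailing whitespace
        }
        if (start == end) {
            return text;                      // whitespace-only span: emit no delimiters
        }
        return text.substring(0, start)       // lifted leading whitespace
                + delim + text.substring(start, end) + delim
                + text.substring(end);        // lifted trailing whitespace
    }

    /** Returns a backtick fence one longer than the longest backtick run in body, minimum 3. */
    static String fenceFor(String body) {
        int longest = 0;
        int run = 0;
        for (int i = 0; i < body.length(); i++) {
            run = body.charAt(i) == '`' ? run + 1 : 0;
            longest = Math.max(longest, run);
        }
        return "`".repeat(Math.max(3, longest + 1));
    }

    /** Emits a fenced code block; the body is written literally, with no escaping. */
    static String codeBlock(String language, String body) {
        String fence = fenceFor(body);
        return fence + language + "\n" + body + "\n" + fence + "\n";
    }
}
```

Under these assumptions, `wrapMark("x ", "*")` yields `*x* ` with the trailing space moved outside the closer, and a code body containing a run of three backticks gets a four-backtick fence. Counting the longest run anywhere in the body is more conservative than strictly necessary, since only runs at the start of a line can close a fence, but it keeps the check simple.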