Skip to content

Redesign migration API around Conversion/Parse result types#30

Draft
gschlager wants to merge 21 commits intomainfrom
claude/refine-local-plan-dMDie
Draft

Redesign migration API around Conversion/Parse result types#30
gschlager wants to merge 21 commits intomainfrom
claude/refine-local-plan-dMDie

Conversation

@gschlager
Copy link
Copy Markdown
Member

@gschlager gschlager commented May 5, 2026

Summary

This PR reshapes the Markbridge public API to return structured result objects (Parse and Conversion) instead of raw strings/ASTs, and consolidates render-side customization into a single renderer: parameter. The changes eliminate the global configuration singleton and per-process default registries in favor of explicit, reusable renderer instances. Two follow-up rounds added AST-mutation hooks (block form on *_to_markdown, polymorphic Markbridge.render) and selective escaper customisation (allow: kwarg + IdentityEscaper) that replace the subclass-and-override pattern from importers like Liferay/Wolfram.

Key Changes

  • New result types: Introduced Parse (for parse-only calls) and Conversion (for render calls) — Data.define value objects wrapping the AST, markdown output, format, unknown tags, diagnostics, emissions, and errors

    • parse_bbcode, parse_html, parse_text_formatter_xml, parse_mediawiki now return Parse objects
    • bbcode_to_markdown, html_to_markdown, text_formatter_xml_to_markdown, mediawiki_to_markdown now return Conversion objects
    • Conversion#to_s delegates to markdown for string-coercion contexts (interpolation, puts)
  • Removed global configuration: Deleted Markbridge::Configuration class and all singleton methods

    • Removed: Markbridge.configuration, Markbridge.configure, Markbridge.reset_defaults!
    • Removed: Markbridge.default_handlers, Markbridge.default_html_handlers, Markbridge.default_text_formatter_handlers, Markbridge.default_tag_library
  • Renderer factory: Added Markbridge.discourse_renderer(...) to build reusable Renderer instances with customizations

    • Accepts: tags:, tag_library:, unregister:, escaper:, escape:, escape_hard_line_breaks:, allow:, postprocessor:
    • Replaces per-call tag_library:, escaper:, escape_hard_line_breaks: parameters
  • Simplified method signatures: All *_to_markdown methods and convert now accept only:

    • handlers: — parser handler registry (optional, uses default)
    • renderer: — pre-built Renderer (optional, creates fresh default)
    • raise_on_error: — boolean (default true)
    • An optional block yielding the parsed AST between parse and render (mutate before rendering — e.g. append orphan attachments)
  • Markbridge.render accepts a Parse or an AST::Node: when given a Parse, the resulting Conversion carries the parser's unknown_tags, diagnostics, and source format forward; when given an AST node those fields default to empty / :discourse (current behavior). Useful for the parse → mutate → render flow.

  • Handler registry enhancements: Added #overlay method to BBCode, HTML, and TextFormatter registries for wrapping existing handlers

    • Yields the previously-bound handler (or nil) so a custom handler can delegate to the default
  • TagLibrary#unregister: New method to remove tag bindings, enabling auto-passthrough for unregistered AST classes

  • Postprocessor: New Markbridge::Renderers::Discourse::Postprocessor class that cleans up rendered markdown (collapses excess newlines, strips whitespace-only lines, trims document edges)

  • Emissions tracking: Renderer now tracks side-channel emissions via interface.emit(:type, data) and exposes them via #emissions and Conversion#emissions

  • MarkdownEscaper#allow: for selective passthrough: replaces the subclass-and-override pattern. Recognised keys: :bullet_list, :ordered_list, :atx_heading, :block_quote. Alias :lists expands to [:bullet_list, :ordered_list]. Unknown keys raise ArgumentError. Marker bytes pass through verbatim while inline content after the marker is still escaped.

  • IdentityEscaper + escape: false sugar: new no-op escaper class for migration paths where source content is already trusted Markdown. Markbridge.discourse_renderer(escape: false) swaps in IdentityEscaper. Mutually exclusive with escape_hard_line_breaks: / allow:; an explicit escaper: always wins.

  • MediaWiki API consistency: Renamed inline_tag_registry: parameter to handlers: for parity with other parsers

  • TextFormatter handler signature: All TextFormatter handlers now accept processor: parameter for consistency

Notable Implementation Details

  • Parse and Conversion use Data.define (frozen, structural equality)
  • Renderer instances are designed to be built once and reused across thousands of posts (zero overhead on no-emit path)
  • The raise_on_error: false mode allows per-row failure isolation in batch migrations
  • Auto-passthrough for unregistered AST classes eliminates boilerplate "passthrough" Tags
  • Emissions are scoped per top-level render call and reset between calls; with_provisional_emissions { } snapshots the buffer for TableTag's throwaway Markdown-vs-HTML pass
  • MarkdownEscaper#allow: storage is a private Array (≤ 4 elements) — #include? is observably identical to Set#include? at this size, no allocation
  • Comprehensive example (examples/forum_migration.rb) demonstrates the migration API end-to-end: overlay, emit, tags:, unregister:, allow: :lists, the AST-mutation block, Conversion#emissions, raise_on_error: false, and the convert(format:) dispatcher
  • UPGRADING.md guide covers every breaking change

Tests

  • Updated all system and unit tests to work with new result types
  • Added new tests for Conversion#emissions, Renderer#emissions, #overlay, #unregister, Postprocessor, MarkdownEscaper#allow:, IdentityEscaper, the polymorphic Markbridge.render(Parse|AST), and the AST-mutation block form
  • Removed tests for deleted Configuration class
  • Mutation: 100% coverage on every subject changed since main (1994+ mutations, 0 alive on the local pass)

https://claude.ai/code/session_01X7TU9zitHfqv6HmszqgfYa

claude added 19 commits May 5, 2026 21:52
Step 1 of the migration-API redesign:

- Add `Markbridge::Parse` (Data.define) carrying ast, format,
  unknown_tags, diagnostics. Returned by `parse_*` methods.
- Add `Markbridge::Conversion` (Data.define) carrying markdown, ast,
  format, unknown_tags, diagnostics, emissions, errors. Returned by
  `*_to_markdown` methods. `to_s` delegates to markdown so puts and
  string interpolation keep working.
- Drop `Markbridge::Configuration`, `.configuration`, `.configure`,
  `.reset_defaults!`, and the four `default_*` memoized accessors.
  Per-call kwargs replace global state; `escape_hard_line_breaks`
  capability returns in step 3 via `Markbridge.discourse_renderer`.
- Rewrite `spec/markbridge_spec.rb` for the new shape.
- Update system specs to use `expect(result.markdown)`.
- Strip stale doc references to the deleted singleton API.
Step 2 of the migration-API redesign. Aligns MediaWiki with the
sibling parser APIs (BBCode, HTML, TextFormatter all use `handlers:`).
The kwarg type is unchanged — still an `InlineTagRegistry` — only
the parameter name moves.

Affects:
- Markbridge.parse_mediawiki / .mediawiki_to_markdown
- Parsers::MediaWiki::Parser#initialize
- Parsers::MediaWiki::InlineParser#initialize
- Specs and docs that referenced the old name.
Step 3 of the migration-API redesign: route render-side
customization (custom Tags, custom escaper) through a one-line
factory plus a single per-call kwarg, instead of multiple
per-call kwargs.

- TagLibrary#unregister(klass): drop a binding so the renderer
  falls through to render_children (existing path at
  renderer.rb:25-33).
- TagLibrary#merge(hash): bulk register, with nil values
  unregistering.
- Markbridge.discourse_renderer(tags:, tag_library:, unregister:,
  escaper:, escape_hard_line_breaks:): one-line factory returning
  a configured Renderer.
- renderer: kwarg on the four *_to_markdown methods. When given,
  the supplied Renderer is used; otherwise a fresh default is
  built per call.
Step 4 of the migration-API redesign. Tags can now record
side-channel records during rendering without mutating
constructor-injected collections, and the records surface on
Conversion#emissions.

- Renderer: @emission_buffer keyed by Symbol; record_emission appends.
  Lifecycle parallels @interface_cache except the buffer is preserved
  after the root call so callers can drain it via #emissions.
- RenderingInterface#emit(key, payload) — no-return-value helper Tags
  call from #render. Tag's render return value is unaffected.
- with_provisional_emissions { |c| ...; c.commit if keep } —
  snapshot/rollback for tags whose first render attempt may be
  discarded. Used by TableTag's Markdown-vs-HTML decision so
  emissions from the throwaway pass don't double-emit.
- TableTag: wraps the Markdown attempt in with_provisional_emissions;
  commits only when the Markdown form survives.
- Conversion#emissions populated from renderer.emissions after render.

Specs cover: emission round-trip, buffer reset on subsequent root
calls, no-op outside render, provisional rollback when not committed,
provisional commit kept, and the TableTag double-emit regression
(both markdown-compatible and HTML-fallback paths).
Step 5 of the migration-API redesign. Custom handlers can now
delegate to defaults instead of reinventing them.

Each registry gains:

  registry.overlay("a") do |default|
    LinkifyingHandler.new(default:)
  end

The block runs at registration time and receives the previously
bound handler (which may be nil — the block must tolerate that).

Also adds TextFormatter::HandlerRegistry#[] for parity with the
sibling registries; previously only #has_handler? and the
internal #process_element exposed the mapping.
Step 6 of the migration-API redesign. Closes the gap that made
examples/custom_text_formatter_mappings.rb un-runnable.

- BaseHandler#process and every TextFormatter handler subclass
  now accept `processor:` (defaulting to nil for backward-friendly
  use in custom subclasses).
- HandlerRegistry#process_element forwards `processor:` and
  dispatches to .process for class-style handlers, .call for
  Proc/lambda handlers (matching how HTML's HandlerRegistry
  already handled lambdas).
- Parser passes self as the processor argument so handlers can
  recurse via `processor.process_children(element, ast_node)`.
- Repaired the example file: lambdas now return either the AST
  node (so the parser auto-processes children) or nil (for leaf
  nodes); they don't return the NodeSet from process_children
  (which fed back into parser.rb:106 and double-processed).

Specs cover lambda dispatch and the regression where a registered
handler returning nil must not be tracked as unknown.
Step 7 of the migration-API redesign. Brings MediaWiki to parity
with BBCode/HTML/TextFormatter parsers — unknown inline tags now
appear in Parse#unknown_tags / Conversion#unknown_tags.

- InlineParser shares a single unknown_tags Hash with depth-recursive
  child instances so nested parses contribute to the same tally.
- Only opening tags whose names aren't in the registry get tracked;
  closing forms and self-closing tags pass through silently
  (they typically pair with an opening already counted).
- Parser exposes attr_reader :unknown_tags and clears it at the
  start of every #parse call.
- Markbridge.parse_mediawiki snapshots parser.unknown_tags into
  the Parse result.
Step 8 of the migration-API redesign.

- Markbridge.convert(input, format:, **kwargs) — thin dispatcher
  over the four *_to_markdown methods. Useful when format is
  data-driven (e.g. iterating posts whose :format column varies).
  Raises ArgumentError on unknown formats.
- Markbridge.render(ast, format: :discourse, renderer:) — renders
  an existing AST. Returns a Conversion with format: :discourse,
  unknown_tags/diagnostics: {} (parser-side data not available
  when starting from an AST).
Step 9 of the migration-API redesign. The cleanup logic that ran
inside Markbridge#cleanup_markdown is now a tiny replaceable
collaborator on the Renderer.

- Renderers::Discourse::Postprocessor#call(text) — collapses 3+
  newlines, removes whitespace-only lines, strips. DEFAULT is a
  module-level singleton instance.
- Renderer accepts postprocessor: kwarg, defaulting to
  Postprocessor::DEFAULT.
- Markbridge.discourse_renderer accepts postprocessor: and
  forwards it to the Renderer.
- Markbridge#build_conversion and #render now go through
  renderer.postprocessor.call(...) instead of a private
  cleanup_markdown method.

Custom postprocessors compose by subclassing Postprocessor and
overriding #call. Spec exercises the end-to-end path.
Step 10 of the migration-API redesign. RawHandler used to call
@element_class.new(language:) unconditionally, which forced any
downstream AST class reused with RawHandler to declare a language:
kwarg even when irrelevant.

The handler now introspects the AST class's #initialize parameters
once and caches the result, only passing language: when the class
actually accepts it. AST::Code keeps its language: attribute;
ImageHandler overrides #create_element so it's unaffected.
Step 11 of the migration-API redesign. Importers can now isolate
per-row failures from the conversion loop instead of crashing on
a single bad post.

- raise_on_error: kwarg on the four *_to_markdown methods and
  on Markbridge.render. Defaults to true (preserves current
  behavior — render-time exceptions propagate).
- When raise_on_error: false, render-time StandardError is
  caught, the Conversion's markdown comes back as "" (whatever
  partial output had been built before the failure), and the
  exception is surfaced via Conversion#errors.
Step 12 of the migration-API redesign.

- examples/forum_migration.rb (NEW) — canonical end-to-end importer
  shape exercising every new path: discourse_renderer factory,
  custom Tag, FontHandler/FontNode, overlay, unregister:, emit,
  Conversion#emissions/#errors/#unknown_tags, raise_on_error: false,
  Markbridge.convert(format:) dispatch, custom escaper subclass.
- UPGRADING.md (NEW) — covers Conversion vs String returns,
  removed singleton API, removed per-call render-side kwargs,
  MediaWiki kwarg rename, TextFormatter processor: addition,
  emit migration shape, RawHandler language: relaxation,
  raise_on_error:.
- examples/basic_usage.rb — rewritten for the new API
  (Conversion result, discourse_renderer factory).
- docs/extending.md — surfaces the auto-passthrough fact for
  unregistered AST classes and the unregister: primitive.
Fixes the lint CI failure on PR #30 — six files had drifted from
syntax_tree's expected format. No semantic changes.
Inline snapshot_emissions/rollback_emissions into the only caller
(with_provisional_emissions), drop the lazy buffer init, and
guarantee @emission_buffer is always present from the constructor.
This removes the dead-code guards mutant correctly identifies as
unkillable, and fixes the unit-spec entry path that built a
Renderer + RenderingInterface manually without going through
#render (those tests crashed once #with_provisional_emissions
stopped tolerating a nil buffer).

Also normalize TagLibrary#merge to an explicit each_pair / if-else
form so mutant's hash-iteration mutations have a single matching
shape to target.

Renderer#emissions now returns transform_values dup of the
always-present buffer; its no-buffer fallback was dead. Updated
the renderer_spec example that asserted "ignores emissions made
outside a render call" — emissions from before the first render
are now retained until the first root call clears them.
Mutant flagged the convenience methods because the tests verified
"this returns a Conversion/Parse" but didn't pin every field of
the Data result. The mutations replaced individual fields with
nil or wrong values; tests now pin them.

- Drop redundant `handlers ||= ...` fallbacks in parse_bbcode /
  parse_html / parse_text_formatter_xml — every parser already
  defaults to its own .default registry when handed nil.
- Drop the .dup on parser.unknown_tags / parser.unclosed_raw_tags
  in the convenience methods. Each convenience call constructs a
  fresh parser, so there's no aliasing risk through the public
  API; the .dup was equivalent through Markbridge.* and mutant
  correctly identified it as such. Direct users of
  Parsers::*::Parser still see the original (un-dup'd) hashes
  which is the documented contract.
- Add tests pinning Conversion#ast, #format, #diagnostics,
  #unknown_tags shape; per-method "raise_on_error defaults to
  true" tests; #ast type via to_s coercion path.
- Make build_conversion's `renderer:` and `raise_on_error:`
  required kwargs — every caller already passes them explicitly,
  so the defaults were dead code.
Covers eight more subjects (Markbridge.convert, .render,
.discourse_renderer; Renderer#emissions, #initialize,
#with_provisional_emissions; RenderingInterface#emit,
#with_provisional_emissions; Postprocessor#call) by adding
focused tests rather than expanding ignore lists.

Notable additions:
- Per-format `convert` kwargs-forwarding tests (one per branch).
- `render` field-shape tests for ast/format/unknown_tags/
  diagnostics/errors and a "raise_on_error defaults to true" test.
- discourse_renderer tests for tag_library: base, postprocessor:
  forwarding.
- Renderer specs: postprocessor default fallback; deep snapshot
  semantics on emissions and with_provisional_emissions; explicit
  return-value test for with_provisional_emissions; postprocessor
  swap end-to-end via Markbridge.bbcode_to_markdown.
- RenderingInterface unit specs for #emit (delegation order,
  always-nil return) and #with_provisional_emissions
  (block-forwarding via and_yield).
- Postprocessor: multi-run gsub vs sub regression.
Final batch of mutation-coverage fixes for the migration-API
redesign. Covers 8 remaining subjects:

- HandlerRegistry#overlay (BBCode and HTML): array-of-names test
  pinning that each name's *previous* handler is yielded
  individually (kills the `Array(x).each` -> `[x].each` mutation).
- RawHandler#accepts_language?: drop the introspection cache
  (mutant correctly identified the cached/uncached branches as
  observably equivalent — the AST class never changes for the
  handler's lifetime, but the cache is unobservable through the
  public API). Add a public-API test using an AST class with
  non-:language kwargs to pin the predicate's contract.
- TextFormatter::HandlerRegistry#process_element: switch
  @mappings[name.upcase] to self[name] — equivalent path through
  the existing #[] (which already upcases), and removes a
  mutation surface.
- MediaWiki::Parser#parse: tests pinning the unknown_tags clear
  on re-parse, line-ending normalization (Unicode separator),
  and handlers: kwarg propagation into InlineParser.
- TagLibrary#merge: regression test that `nil` value triggers
  *deletion*, observable through #each iteration (registering
  `nil` would leave the class in iteration with a nil value).
- TableTag#empty_table?: pin the "rows-less-children" path with
  a Table whose only child is a stray Text node.
- TableTag#render: simplify `next nil` -> `next` (equivalent).
Two new shapes for callers that need to modify, append to, or
replace the AST between parse and render — without dropping to the
parser/renderer classes directly.

- Markbridge.render now accepts a Parse OR an AST::Node. Given a
  Parse, the resulting Conversion preserves the parser's
  unknown_tags, diagnostics, and source format; given an AST node
  the fields default to empty / :discourse (current behavior).
- *_to_markdown methods (bbcode/html/text_formatter_xml/mediawiki)
  and Markbridge.convert accept an optional block. The block is
  yielded the parsed AST between parse and render; mutations
  persist in Conversion#ast.

The block form reads cleanly in migration loops:

  Markbridge.convert(post.body, format: post.format, renderer:) do |ast|
    orphan_attachments.each { |a| ast << AttachmentRef.new(a.id) }
  end

Updated forum_migration.rb to demo the orphan-attachment pattern
and UPGRADING.md to document both shapes.

Mutant: 100% on the touched subjects (Markbridge.render, .convert,
and all four *_to_markdown methods).
gschlager added a commit that referenced this pull request May 6, 2026
Pivot the docs site to the task-oriented "I'm migrating a forum to
Discourse" perspective and bring every page in sync with the migration-
API redesign coming in PR #30.

New top-level "Migrating to Discourse" section with overview,
placeholders, and a full walkthrough that ports examples/forum_migration.rb
end-to-end. The placeholders page makes explicit what was implicit:
Tag return strings are spliced verbatim — only AST::Text goes through
the escaper — so placeholder sigils reach the output untouched.

The old guides/configuration.md is gone; renderer-side knobs move to
a new customization/customizing-renderer.md describing the
Markbridge.discourse_renderer(...) factory. PR30's UPGRADING.md is
ported under reference/upgrading.md.

Existing pages updated for: Conversion return type, removed
Markbridge.configure global, MediaWiki handlers: kwarg rename,
TextFormatter handler processor: arg, interface.emit and html_mode?
on the rendering interface table, the build-once / reuse-many
renderer pattern.

MIGRATION_API_PLAN.md updated to reflect §1-3 implemented in PR30,
§4 (DSL) considered and rejected, and the resolved open questions.

Every code sample in the migration section was verified against PR30's
branch via a sibling worktree before commit.
claude added 2 commits May 6, 2026 14:27
Two new escaper customisation knobs that replace the
subclass-and-override pattern from importers like Liferay/Wolfram.

- MarkdownEscaper.new(allow:) accepts a Symbol or Array of
  Symbols. Recognised keys: :bullet_list, :ordered_list,
  :atx_heading, :block_quote. Alias :lists expands to
  [:bullet_list, :ordered_list]. Unknown keys raise.
  Storage is a private Array (≤4 elements) so #include? is O(N)
  with N ≤ 4 — observably identical to a Set and avoids the
  allocation. Hot-path lookup is on the cold side of each block
  arm, so no measurable bench impact.
- pass_first_char_inline / pass_marker_inline helpers preserve
  the marker verbatim and inline-escape the rest, so e.g.
  `* item` with allow: :bullet_list comes out as `* item` (the
  star is NOT re-escaped by escape_inline).
- IdentityEscaper: a tiny class whose #escape(text) returns
  text || "". Public so callers can point at it.
- discourse_renderer(escape: false) routes to IdentityEscaper.
  Mutually exclusive with escape_hard_line_breaks: / allow: —
  raises ArgumentError if combined. An explicit escaper: still
  wins (mirrors the existing precedence).

forum_migration.rb now uses `allow: :lists` instead of an inline
`ListPermissiveEscaper < MarkdownEscaper` subclass.

Specs cover each key, alias, error path, edge cases (7-hash
non-heading, empty `# `, setext underlines under :bullet_list),
and the IdentityEscaper / mutual-exclusion paths. Mutant: 100%
on the touched subjects (MarkdownEscaper#initialize,
#resolve_allow, #pass_first_char_inline, #pass_marker_inline,
#escape_block_ordered_list, IdentityEscaper#escape,
Markbridge.discourse_renderer, Markbridge.build_escaper).

Bench: 0.014x noise vs main on the no-allow path (within
measurement error).
Adds two unit specs under MarkdownEscaper#initialize that pin the
default `escape_hard_line_breaks: false` (preserves trailing-space
hard breaks) vs. the explicit `true` form (strips them). The
markbridge_spec already exercised both shapes through the
discourse_renderer factory but those tests aren't selected for
the MarkdownEscaper#initialize subject under mutant's
description-matching test selection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants