fix(compiler): preserve frontmatter when closing "---" has no trailing newline #121
Open
jichaowang02-lang wants to merge 1 commit into
…g newline
`frontmatter.split` returns the frontmatter block ending in a bare "\n---"
(not "\n---\n") for a page that ends right at the closing delimiter with no
trailing newline — e.g. a frontmatter-only concept/entity page. Both
`_prepend_source_to_frontmatter` and `_remove_source_from_frontmatter` located
the closing delimiter with `fm_block.rpartition("\n---\n")`, which finds
nothing in that case: `fm_prefix` becomes "" and every existing frontmatter
line (the opening "---", `type`, `description`, and the prior `sources:` list)
is dropped — silently corrupting the page and losing source provenance.
Strip whichever closing form is actually present ("\n---\n" or a bare "\n---")
and re-append the same one. Pages that have a body (the common case) are
byte-for-byte unaffected.
This is reached in practice via `_add_related_link` (run for every related
concept/entity during compile) and the `openkb remove` flow
(`remove_doc_from_{concept,entity}_pages`).
Adds `TestFrontmatterSourceMutation` covering prepend + remove on a
no-trailing-newline page, plus a with-body regression guard.
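The failure mode described above comes down to how `str.rpartition` behaves when the separator is absent. The snippet below is a minimal illustration, not the project's code: the `fm_block` value is a hypothetical frontmatter block made up for the example, and only the `rpartition("\n---\n")` call mirrors what the commit message describes.

```python
# Hypothetical frontmatter block for a frontmatter-only page:
# it ends at the closing "---" with no trailing newline.
fm_block = '---\nsources: ["summaries/p1.md"]\ntype: "Concept"\ndescription: "Focus"\n---'

# When the separator is not found, str.rpartition returns
# ("", "", original_string), so the "prefix" part is empty.
fm_prefix, sep, tail = fm_block.rpartition("\n---\n")

# The same block with a trailing newline partitions as intended:
# everything before the closing delimiter lands in the prefix.
ok_prefix, ok_sep, ok_tail = (fm_block + "\n").rpartition("\n---\n")

print(repr(fm_prefix))  # empty prefix: every frontmatter line would be dropped
print(repr(ok_prefix))  # full frontmatter content before the closing delimiter
```

Because the empty prefix is a valid string, nothing raises; code that rebuilds the block from `fm_prefix` simply starts from nothing.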
Pull request overview
Fixes a compiler corruption bug where mutating a YAML frontmatter sources: list could drop the entire frontmatter when the closing --- delimiter is at EOF with no trailing newline. This makes frontmatter source provenance updates safe for “frontmatter-only” pages.
Changes:
- Update `_prepend_source_to_frontmatter` and `_remove_source_from_frontmatter` to correctly strip/re-append either `\n---\n` or bare `\n---` closings.
- Add regression tests covering prepend/remove behavior for no-trailing-newline pages and a with-body guard.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `openkb/agent/compiler.py` | Preserves existing frontmatter keys/sources by handling both closing delimiter forms when rewriting `sources:`. |
| `tests/test_compiler.py` | Adds targeted tests to prevent regressions for frontmatter-only pages without a trailing newline. |
class TestFrontmatterSourceMutation:
    """``_prepend``/``_remove_source_from_frontmatter`` must preserve existing
Summary
A frontmatter-only concept/entity page that ends right at the closing `---` with no trailing newline gets its frontmatter silently destroyed when a source is added or removed: the opening `---`, the `type`/`description` keys, and the existing `sources:` list are all dropped, corrupting the page and losing source provenance.
Root cause
`frontmatter.split` is lossless and, for a page with no trailing newline after the closing delimiter, returns a block ending in a bare `\n---` (not `\n---\n`):
But both `_prepend_source_to_frontmatter` and `_remove_source_from_frontmatter` located the closing delimiter with `fm_block.rpartition("\n---\n")`, which matches nothing in that case, so `fm_prefix` becomes `""`, every frontmatter line is lost, and the block is rebuilt from nothing:
Concrete trace (`_prepend_source_to_frontmatter`):
This is reached during normal use via `_add_related_link` (run for every related concept/entity at compile time) and the `openkb remove` flow (`remove_doc_from_{concept,entity}_pages`).
Fix
Strip whichever closing form is actually present and re-append the same one:
With the fix the same input yields `---\nsources: ["summaries/p2.md", "summaries/p1.md"]\ntype: "Concept"\ndescription: "Focus"\n---`, with all keys and the prior source preserved. Pages that have a body (the common case) are byte-for-byte unaffected.
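The approach described here (detect which closing form the block ends with, strip it, edit, then re-append the same form) might look roughly like the sketch below. This is not the code from `openkb/agent/compiler.py`: the helper names `split_closing` and `prepend_source`, the single-line JSON-style `sources:` layout, and the sample input are all assumptions chosen so the example is self-contained.

```python
import json

# Hypothetical helper names; the real functions in compiler.py may differ.
# The longer closing form is listed first so it is matched before the bare one.
CLOSINGS = ("\n---\n", "\n---")


def split_closing(fm_block: str) -> tuple[str, str]:
    """Return (prefix, closing), where closing is whichever form is present."""
    for closing in CLOSINGS:
        if fm_block.endswith(closing):
            return fm_block[: -len(closing)], closing
    # No recognizable closing delimiter: leave the block untouched.
    return fm_block, ""


def prepend_source(fm_block: str, source: str) -> str:
    """Put `source` at the front of a single-line `sources: [...]` list."""
    prefix, closing = split_closing(fm_block)
    lines = prefix.split("\n")
    for i, line in enumerate(lines):
        if line.startswith("sources:"):
            # Assumes the list is written as a JSON-compatible inline array.
            existing = json.loads(line[len("sources:"):].strip() or "[]")
            if source not in existing:
                existing.insert(0, source)
            lines[i] = "sources: " + json.dumps(existing)
            break
    else:
        # No sources line yet: add one right after the opening "---".
        lines.insert(1, "sources: " + json.dumps([source]))
    # Re-append exactly the closing form that was stripped.
    return "\n".join(lines) + closing


page = '---\nsources: ["summaries/p1.md"]\ntype: "Concept"\ndescription: "Focus"\n---'
print(prepend_source(page, "summaries/p2.md"))
```

Checking `endswith` for the longer form first matters: a block ending in `\n---\n` does not end with the bare `\n---`, while a block ending in the bare form would otherwise never be matched at all, which is the case the original `rpartition` call missed.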
Testing
Adds `TestFrontmatterSourceMutation` (prepend + remove on a no-trailing-newline page, plus a with-body regression guard). The existing `_add_related_link`/`remove_doc_from_entity_pages` tests only used pages with a body, so this path was uncovered.