Optimize xml_path and xml_find_all for large node sets by astruebi · Pull Request #478 · r-lib/xml2

astruebi · 2026-06-05T19:35:15Z

Closes #477

This PR improves performance for large node sets in two places:

xml_path() now caches reusable ancestor paths while processing an xml_nodeset, avoiding repeated parent walks and repeated construction of shared path prefixes.
xml_find_all.xml_node() avoids an unnecessary duplicate pass for single-context XPath node sets and materializes xml_node results with less per-node overhead.

The intended semantics are unchanged.

Tests added:

batch xml_path() output is identical to calling xml_path() on each node individually
single-context xml_find_all() XPath node sets remain unique even for union expressions

Implementation note:

Some repetition in the path occurrence logic is intentional. I tested a more compact shared abstraction for sibling occurrence counting, but it made the hot loop noticeably slower for large node sets. The current version keeps those cases explicit to preserve the performance gain while matching xmlGetNodePath() semantics.

Local checks:

testthat::test_dir("tests/testthat"): PASS
R CMD build --no-manual --no-build-vignettes .: OK
_R_CHECK_FORCE_SUGGESTS_=false R CMD check --no-manual --no-build-vignettes xml2_1.5.2.tar.gz: OK, with only vignette warnings due to --no-build-vignettes

astruebi added 2 commits June 5, 2026 21:18

Optimize xml_path for node sets

2622afe

Optimize xml_find_all node materialization

8e7f538

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize xml_path and xml_find_all for large node sets#478

Optimize xml_path and xml_find_all for large node sets#478
astruebi wants to merge 2 commits into
r-lib:mainfrom
astruebi:codex/optimize-xml-path-nodesets

astruebi commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

astruebi commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant