Skip to content

T-200: Decouple R-loop search from MorphyLib; add concavity parameter#216

Open
ms609 wants to merge 16 commits intocpp-searchfrom
feature/native-search
Open

T-200: Decouple R-loop search from MorphyLib; add concavity parameter#216
ms609 wants to merge 16 commits intocpp-searchfrom
feature/native-search

Conversation

@ms609
Copy link
Owner

@ms609 ms609 commented Mar 24, 2026

Agent B

Summary

Replaces MorphyLib scoring in TreeSearch(), Ratchet(), Jackknife() with
native C++ engine (ts_fitch_score()), while retaining backward compatibility
for users passing MorphyLib scorers explicitly.

New exports

  • PrepareNativeData() — replaces PhyDat2Morphy()
  • CleanNativeData() — replaces UnloadMorphy()
  • NativeLength() — replaces MorphyLength()
  • NativeBootstrap() — replaces MorphyBootstrap()

Changed defaults

TreeSearch, Ratchet, Jackknife, and EdgeListSearch now default to
native scorers. New concavity parameter (default Inf = equal weights)
makes implied-weights searches trivial:

TreeSearch(tree, dataset, concavity = 10)

Deprecations

PhyDat2Morphy, UnloadMorphy, MorphyLength, MorphyTreeLength,
MorphyBootstrap emit .Deprecated() warnings.

Vignette

custom.Rmd IW section rewritten from ~150 lines of MorphyLib scaffolding
to ~30 lines.

Tests

7 new tests in test-NativeSearch.R. Existing tests updated to use native
defaults. 50 pass / 0 fail.

Note

Native scorer uses simple IW (fit = e / (e + k)), matching
TreeLength(..., extended_iw = FALSE). XPIWE (Extension 3) remains in the
C++ search path.

ms609 added 16 commits March 24, 2026 14:26
- New R/NativeSearch.R: PrepareNativeData, CleanNativeData,
  NativeLength, NativeBootstrap (4 exported functions)
- TreeSearch(), Ratchet(), Jackknife() now default to native scorer
- New 'concavity' parameter on all three functions
- Jackknife() rewritten to support both native and morphy paths
- EdgeListSearch() default scorer changed to NativeLength
- Deprecated: PhyDat2Morphy, UnloadMorphy, MorphyLength,
  MorphyTreeLength, MorphyBootstrap
- Vignette custom.Rmd IW section rewritten (was ~150 lines of
  MorphyLib scaffolding, now 30 lines using concavity parameter)
- test-NativeSearch.R: 7 tests covering all new functions
- test-CustomSearch.R: updated to use native defaults
Adds NativeSearch.Rd and updates Jackknife.Rd, Ratchet.Rd,
TreeSearch.Rd with new concavity parameter and default changes.
…ilder

Build random binary tree that satisfies topological constraints by:
1. Ordering constraint splits smallest-to-largest
2. Assigning each tip to its tightest enclosing split
3. Bottom-up: randomly resolve each split's children into binary subtree
4. Wire root-level items (unconstrained tips + top-level splits)

Replaces Wagner fallback (T-214 workaround) for RANDOM_TREE strategy,
restoring uniform random topology sampling diversity under constraints.

Tests: 916 constraint + 24 random-constrained + 373 simplify/driven pass.
Three issues in impose_one_pass / impose_constraint:

1. Bail-out threshold n_tip/4 was too aggressive. For n_tip=5 the
   threshold was 1, so any split requiring 2+ moves caused an immediate
   bail-out before making any repairs. Raised to n_tip.

2. impose_one_pass returned 0 both for 'no violations' and 'bailed out',
   so impose_constraint couldn't distinguish the two. Now returns -1 on
   bail-out, allowing the caller to know the repair may be incomplete.

3. Documented the root-child limitation: spr_clip() doesn't fully handle
   root children, so impose_one_pass skips them. The fuse path's
   post-repair verification guard (T-214) catches these cases.

Tests: 940 constraint + 308 simplify/driven/sector pass.
spr_clip() can't detach root children (root is its own parent, so the
bypass logic fails). All search callers skip root children, but
impose_constraint needs them for constraint repair.

New topology_spr() helper in ts_constraint.cpp handles the root-child
case by absorbing the sibling into root and repurposing it as the
insertion node. No changes to spr_clip or any search caller.

Also removes the build_postorder call that was missing between
individual moves within impose_one_pass — each topology_spr is now
followed by a postorder rebuild so edge enumeration stays valid.

Tests: 942 constraint (90 impose-constraint, incl 2 new root-child
tests) + 308 simplify/driven/sector pass.
- Clarify MaddisonSlatkin() @examples (show logp intermediate)
- Point FixedDraws overflow error to StepInformation(approx='mc')
- Note MC fallback in MaddisonSlatkin Rd documentation

From uncommitted work on feature/madslatkin-profiling (PR #211).
Cherry-picked from feature/parallel-temper (6dc28a2). Replaces
static extract_divided_steps() copies with shared extract_char_steps()
in TBR, SPR, and drift. NA blocks now use three-pass correction
formula instead of raw local_cost.

ts_temper.cpp already correct (via PR #227 PCSA merge).
- T-196: cherry-picked NA+IW fix from feature/parallel-temper
- T-198-201: closed (PT ruled out; PCSA landed via PR #227)
- Removed PT section from to-do.md
- Deleted feature/parallel-temper and feature/pt-eval (remote+local)
- PT findings preserved in .positai/expertise/pt-evaluation.md
Add perturbStopFactor parameter to SearchControl(): stop after
nTip * K consecutive replicates that fail to improve the best score.
Default 0 (disabled). Small values (1-3) are typical.

Inspired by IQ-TREE's unsuccessful-perturbation stopping rule
(Nguyen et al. 2015), adapted from per-perturbation to per-replicate
granularity.

Implemented in both serial (ts_driven.cpp) and parallel (ts_parallel.cpp)
paths. The serial path tracks consecutive unsuccessful replicates directly;
the parallel path monitors score improvement in the main thread's
polling loop.

Two new tests: verify the rule stops search early when enabled,
and that it's a no-op when disabled (factor=0).
Benchmarked perturbStopFactor across 10 morphobank/inapplicable datasets
(23-213 tips). Key findings:

- PSF=2 gives 2.4-6.9x speedup on converged searches with zero score loss
- Complementary to targetHits: on hard landscapes where few replicates
  hit the best score, PSF fires first; on easy landscapes targetHits
  fires first (PSF is irrelevant)
- PSF=5 provides smaller speedups and is too conservative for large trees

Changed default from 0 (disabled) to 2 in SearchControl() and compat
wrapper. Updated docs and fixed timeout test (needs PSF=0 to test the
timeout path, not convergence).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant