C++ search by ms609 · Pull Request #210 · ms609/TreeSearch

ms609 · 2026-03-19T18:56:59Z

other optimizations + features

Manual testing underway; shiny app in particular has some usability issues.

Replace blunt (k, n) feasibility thresholds with a partition-shape-aware metric: the split_count (coefficient of x^floor(n/2) in the generating polynomial). This allows skewed multi-state characters to be computed exactly even at large n, while still blocking balanced partitions that would blow up. - Add .MSSplitCount() and .MS_SC_THRESHOLD in data_manipulation.R - Replace maxTips gate in PrepareDataProfile and pp_info_extra_step - Shrink 5-state test from n=11 (blowup) to n=10 (0.6s) - Add 18 split_count boundary + base-case tests (55 total, 5.6s)

…ocus 9 wagner_edge_violates_constraint (ts_wagner.cpp) and regraft_violates_constraint (ts_constraint.cpp) both rejected the edge directly above the constraint clade (below == cn) for MUST_OUTSIDE tips/clades. is_ancestor_or_equal(cn, below) returns true when below == cn, but inserting an outside element just above the constraint clade makes it a sibling — monophyly is preserved. Fix: add '&& below != cn' guard to the MUST_OUTSIDE rejection branch in both functions. Search quality improvement; no correctness impact. Tests: +2 constrained Wagner regression tests (43 total). 43 Wagner + 18 constraint + 152 driven-search tests pass.

The infeasible test cases used partitions from the old maxTips gate (n > 25 for k=3). The split_count gate (f738091) replaced maxTips but the test partitions weren't updated — (13,12,10) has sc=118, below the k=3 threshold of 136, causing the exact solver to run indefinitely. Changed infeasible k=3 partition from (13,12,10) n=35 to (13,13,12) n=38 (sc=140 > 136). Also fixed stale comments and suppressed expected fallback warnings.

Add compute_collapsed_flags() (ts_collapsed.h/.cpp) which identifies zero-length edges where clipping provably cannot improve the score. Integrated into TBR, SPR, and drift search. Disabled during MPT enumeration to preserve equal-score topology discovery. Benchmark (4 standard morphological datasets, 3 seeds each): - Skip rate: 0% on all datasets — near-optimal trees have negligible zero-length edges in typical morphological data - Score equivalence: confirmed (enabled vs disabled produce identical best scores) - Overhead: negligible Add test-ts-collapsed.R (Tier 2, 15 assertions): flag structure, score equivalence, sparse-data skip counts, inapplicable characters, implied weighting, driven search integration. Add bench_collapsed.R benchmark script. Update NEWS.md with search optimization entries. Update AGENTS.md: module map, benchmark results, test count.

…r reuse Binary (2-token) characters: O(1) Carter et al. (1990) closed-form shortcut when both states are pure (no ambiguous tips). Eliminates the exponential recursive algorithm entirely for this common case. Benchmark: k=2 n=50 drops from 0.22s to <0.001s (>100x). Multi-state (3-5 token) characters: - FastLSETables: 4096-entry precomputed lookup tables for exp(-d) and log1p(exp(-d)) with linear interpolation. Used in LSEAccumulator::add() to replace std::exp() calls. - lse_update(): shared inline helper for log-sum-exp accumulation, replacing ~6 copies of the same pattern in LogPVec/runAll. - logconv(): changed from allocating+returning std::vector to writing into caller-provided buffer. Eliminates allocation per convolution call. - Reusable noStepVec/yesStepVec/conv_buf buffers in LogPVec, allocated once per call, std::fill'd per draw pair. Multi-state speedup: ~5-19% for k=3 boundary cases (within noise for k=4/k=5 due to recursion depth dominating). Also adds logDoubleFact, logN_carter, logCarter1_cpp helpers for the Carter formula (C++ implementation of the R LogCarter1).

Phase 1: CollapsedRegions struct groups connected collapsed edges into regions via reverse-postorder propagation. Region IDs enable efficient per-region tracking. Phase 3-4: TBR, SPR, and drift regraft loops skip interior collapsed edges (collapsed[below] == 1). Boundary edges are always evaluated. Within a collapsed region, all edges have zero local cost and identical state propagation, so interior positions are redundant. Phase 5: TreePool::add_collapsed() uses compute_collapsed_splits() to hash only non-collapsed edges for dedup. Trees differing only in zero-length resolutions are treated as duplicates, improving pool diversity (Goloboff & Farris 2001). Driven search pipeline updated to compute collapsed flags per replicate and use add_collapsed() for both replicate results and fuse results. All 1875 ts-* tests pass. No score regressions on Vinther2008 (79) or Zhu2013 (661). Literature notes from Goloboff (1996), Goloboff & Farris (2001), and Goloboff & Morales (2023) added to plan file.

The original threshold calibration (136/100/100) used sequential state indices (1,2,3,...) instead of bitmask positions (1,2,4,8,16). This resulted in testing a different (easier) combinatorial problem, yielding dangerously high thresholds that allowed the exact solver to run indefinitely on balanced partitions within the supposed safe range. Recalibrated with correct encoding (states at 2^(i-1) positions): k=3: 136 → 75 (n=27 balanced 0.97s safe, n=31 balanced 1.32s marginal) k=4: 100 → 50 (n=13 balanced 0.36s safe, n=15 balanced 0.94s marginal) k=5: 100 → 35 (n=9 balanced 0.22s safe, n=10 balanced 0.49s) Updated boundary tests and docstrings to match.

Replace CollapsedRegions struct with plain vector<uint8_t> in all search functions. The region_id, n_regions, and n_collapsed fields were computed but never read by any consumer — all callers only check the collapsed flag array. Saves an O(n_node) allocation + preorder traversal per recomputation.

ThreadSafePool::add_collapsed() wraps pool_.add_collapsed() with mutex. Worker threads and fuse_round() now compute collapsed flags and use add_collapsed(), matching the serial driven_search path. extract_into() still uses standard add() since it runs post-search and doesn't need collapsed dedup.

After a minimum number of replicates (default 5), extract the strict consensus splits from the pool and enforce them as topological constraints for subsequent replicates. This focuses search on uncertain parts of the tree. Constraints are cleared whenever a new best score is found. Opt-in via consensusConstrain = TRUE in SearchControl(); disabled by default. Also includes strategy preset tuning: - default preset: wagnerStarts=3, sprFirst=TRUE, adaptiveLevel=TRUE - thorough preset: sprFirst=TRUE New files: - build_constraint_from_bitsets() in ts_constraint - extract_consensus_splits() in ts_pool - test-ts-consensus-constrain.R (8 tests)

… consensus constraints - Collapsed-region regraft merging (Goloboff 1996): skip interior zero-length edges in TBR/SPR/drift regraft loops - Collapsed-topology pool dedup: trees differing only in zero-length resolutions treated as duplicates (serial + parallel paths) - Diversity-aware pool eviction: evict most-similar entry on ties - Cross-replicate consensus constraint tightening (opt-in) - Strategy preset tuning: wagnerStarts=3, sprFirst, adaptiveLevel

Regraft edges were collected in preorder (root-adjacent first), which systematically evaluates high-cost positions first before the bounded indirect scoring has a tight cutoff. This wastes full evaluations. Add partial Fisher-Yates shuffle (first 20 positions) after edge collection in TBR, SPR, and drift search. The shuffled prefix seeds the evaluation with edges drawn from across the tree, establishing a tight cutoff early. Remaining edges are evaluated in collection order. Overhead: ~200ns per clip (20 swaps), negligible vs per-clip evaluation cost (~3us for 75-tip tree). Benefit: tighter bound → more early bailouts in subsequent evaluations. All 1782 ts-* tests pass. Score quality unchanged (778 on Agnarsson2004 confirmed).

Add a 2-second wall-clock budget to SolverT. Every LogB and LogPVec entry checks std::chrono::steady_clock; if the budget is exceeded, budget_exceeded is set and all loops bail out immediately. runAll() returns NA_REAL with an Rcpp::warning so downstream code can detect the fallback. The guard prevents infinite hangs even if a partition slips past the split_count feasibility gate (e.g. through future threshold changes). Chrono overhead is ~20 ns per call — negligible versus per-call work. Verified: - k=3 n=35 balanced (known blowup): returns NA in 2.02 s - 137 existing tests pass (55 MS + 82 pp-multistate, 9.2 s)

Systematic profiling across all 14 benchmark datasets showed the previous 4% perturbation probability was too gentle to escape deep local optima. Key changes: default preset: - ratchetPerturbProb: 0.04 → 0.25 - ratchetPerturbMaxMoves: 0 (auto=20) → 5 - ratchetCycles: 5 → 10 - driftCycles: 2 → 4 - driftAfdLimit: 3 → 5 - driftRfdLimit: 0.1 → 0.15 thorough preset: - ratchetPerturbProb: 0.04 → 0.25 - ratchetPerturbMaxMoves: 0 (auto=20) → 5 (keep 20 cycles) Validated on all 14 datasets: 9 improved, 4 unchanged, 1 marginal at 10s budget (positive at 20s). Median improvement on hard datasets: Zanol2014 +7, Giles2015 +4, Wortley2006 +3. SearchControl() defaults updated to match the default preset. Benchmark framework strategies updated to match.

…profiles Replace binary reduction with a log-quadratic MC approximation that preserves the full multi-state character for infeasible MaddisonSlatkin computations. Key changes: - 'auto' and 'mc' approx modes now both use MC for infeasible chars (no more state reduction to binary) - .ApproxStepInformation() uses log-quadratic interpolation anchored at the exact P(s_min) to bridge the gap between the analytic minimum-steps probability and the MC body - Default n_mc increased from 5000 to 50000 for better tail resolution - PrepareDataProfile() no longer reduces multi-state characters in the data matrix; infeasibility handled entirely in StepInformation() - Log-linear fallback when quadratic sanity checks fail (monotonicity or concavity) Validated: R² = 0.9997 for log-quadratic fit vs exact on (8,8,8) n=24. MC-estimated IC(0) matches exact within 0.5 bits at the feasibility boundary. All 105 tests pass.

Per-phase profiling showed drift contributes ~0 score improvement per replicate (Zhu2013, 10 seeds). Shifting 2 drift cycles to ratchet: ratchetCycles: 10 → 12 driftCycles: 4 → 2 Full-pipeline validation (20s, 5 seeds): Zhu2013: median 644/643 (was 645/644) Giles2015: median 713/712 (was 714/713) Dikow2009: median 1611/1611 (was 1612/1611) More replicates completed in same time budget due to cheaper per-rep cost.

Add mc_fitch_scores() in src/ts_mc_fitch.cpp: generates random trees via random_tree() (pure C, no Morphy dependency) and scores each with an inline bitmask Fitch downpass. No R object allocation per tree. Performance: 357k trees/sec on 38-tip 3-state characters (vs ~100k/s with the old Morphy path). This makes 100k MC samples cost ~0.28s. - Wire mc_fitch_scores into .ApproxStepInformation(), replacing the R-level RandomTree + TreeLength loop - Bump default n_mc from 50k to 100k (now affordable) - Update docstrings to reflect the new implementation - All 288 tests pass (88 multistate + 17 data_manipulation + 183 ts-*)

Replace exp(-K) with tighter binomial bound (1-K/R)^R in SearchConfidenceText(). Falls back to exp(-K) when K==R to avoid overconfident 0%. New optional parameters (wired by T-164): - nTopologies: pool topology count; flags limited independence when T==1 - lastImprovedRep: trajectory info shown in message Diagnostic flags: - Ruggedness: warns when K/R < 0.3 and R >= 5 (only 10-29% of runs found best score) - Small sample: nudges user to increase target hits when K==R and R<=5 Updated tooltip formula text and 58 search module tests pass.

ThreadSafePool::extract_into() rebuilt the output pool entry-by-entry, which reset hits_to_best to the number of distinct topologies (often 1) instead of the actual number of independent replicate hits. Root cause: TreePool::add() tracks hits via its own internal counter, but extract_into() populated a fresh pool from entries, losing the accumulated hit count from the shared pool. Fix: after transferring entries, propagate the real hits_to_best from the internal pool via set_hits_to_best(). Impact: with nThreads >= 2, the Shiny app's convergence confidence display ('K of R runs hit best score') was severely underreported (e.g. showing 1 hit instead of 11). The search algorithm itself was unaffected — only the reported statistics were wrong. Also fixes the user-facing symptom: Agnarsson2004 with k=5.62 (XPIWE) appeared to have 2% hit rate across 230 runs, when the actual rate is ~55-60%. The score 50.1872 is correct (exact k = 10^0.75 = 5.623413).

…orer perf: MaddisonSlatkin optimizations + native C++ MC Fitch scorer

Add perturbStopFactor parameter to SearchControl(): stop after nTip * K consecutive replicates that fail to improve the best score. Default 0 (disabled). Small values (1-3) are typical. Inspired by IQ-TREE's unsuccessful-perturbation stopping rule (Nguyen et al. 2015), adapted from per-perturbation to per-replicate granularity. Implemented in both serial (ts_driven.cpp) and parallel (ts_parallel.cpp) paths. The serial path tracks consecutive unsuccessful replicates directly; the parallel path monitors score improvement in the main thread's polling loop. Two new tests: verify the rule stops search early when enabled, and that it's a no-op when disabled (factor=0).

T-230 (commit a37984d) gated the replicate-count warning behind verbosity > 0, but the test file used verbosity = 0L in all cases. Update the two expect_warning() tests to use verbosity = 1L so the warning is actually emitted.

…check ThreadSafePool::fuse_round() was missing three things after impose_constraint() that the serial driven_search fuse path has: 1. build_postorder() — without this, score_tree() iterates nodes in stale order after topology modification, producing wrong scores. 2. map_constraint_nodes() verification — impose_constraint() is heuristic and can fail; serial path discards the tree on failure. 3. fused_ok guard — serial path only adds to pool and resets hits when the repair succeeded. Impact: with nThreads >= 2 AND topological constraints, fused trees that violate constraints could enter the pool with incorrect scores. Low practical impact (constraints + parallel + fuse violation is rare), but the missing build_postorder() is a real scoring bug. Found by S-RED focus 4 (Parallelism & RNG).

The consensus stability check in the parallel main-thread poll loop called update_consensus_stability() every 200 ms, even when no new replicates had completed. This caused the unchanged counter to increment on idle polls, leading to premature search termination when replicates are slow (e.g. large trees where each replicate takes 30+ seconds). Fix: track replicates_done at last check and only call update_consensus_stability() when new replicates have been added. Found by S-RED focus 4 (Parallelism & RNG).

Found and fixed: parallel consensus stability check incremented unchanged counter on idle 200ms polls instead of per-replicate, causing premature termination with slow replicates (large trees). Also noted: R_CheckUserInterrupt in try/catch is ABI-fragile (longjmp vs C++ exceptions), Rf_error in ts_wagner.cpp reachable from workers (low risk), dead code in poll loop, multiple threads can trigger fuse_round on same done count (correct but wasteful). Reset S-RED to OPEN.

* feat(T-207/T-210): Multi-cycle PCSA perturbation phase Cherry-picked from feature/pt-eval (closed PR #222). - Multi-cycle PCSA: SA -> TBR polish -> keep best, repeated annealCycles times (T-207) - Best-tree tracking in anneal_search() across SA phases (T-210) - Single-phase temperature fix: n_phases=1 runs at t_start (hot) - New SearchControl parameter: annealCycles (default 0 = disabled) - Large preset: annealCycles=3 - Backward compat: compat wrapper auto-sets annealCycles=1 when annealPhases > 0 but cycles not specified - Vignette: PCSA section in search-algorithm.Rmd - Tests updated for annealCycles WIP: needs build verification and GHA check before PR. * fix: restore mc_fitch_scores export + .ts_driven_search_raw naming (cherry-pick artifacts)

…rained_tree

…ilder Build random binary tree that satisfies topological constraints by: 1. Ordering constraint splits smallest-to-largest 2. Assigning each tip to its tightest enclosing split 3. Bottom-up: randomly resolve each split's children into binary subtree 4. Wire root-level items (unconstrained tips + top-level splits) Replaces Wagner fallback (T-214 workaround) for RANDOM_TREE strategy, restoring uniform random topology sampling diversity under constraints. Tests: 916 constraint + 24 random-constrained + 373 simplify/driven pass.

- Clarify MaddisonSlatkin() @examples (show logp intermediate) - Point FixedDraws overflow error to StepInformation(approx='mc') - Note MC fallback in MaddisonSlatkin Rd documentation From uncommitted work on feature/madslatkin-profiling (PR #211).

Cherry-picked from feature/parallel-temper (6dc28a2). Replaces static extract_divided_steps() copies with shared extract_char_steps() in TBR, SPR, and drift. NA blocks now use three-pass correction formula instead of raw local_cost. ts_temper.cpp already correct (via PR #227 PCSA merge).

- T-196: cherry-picked NA+IW fix from feature/parallel-temper - T-198-201: closed (PT ruled out; PCSA landed via PR #227) - Removed PT section from to-do.md - Deleted feature/parallel-temper and feature/pt-eval (remote+local) - PT findings preserved in .positai/expertise/pt-evaluation.md

Three issues in impose_one_pass / impose_constraint: 1. Bail-out threshold n_tip/4 was too aggressive. For n_tip=5 the threshold was 1, so any split requiring 2+ moves caused an immediate bail-out before making any repairs. Raised to n_tip. 2. impose_one_pass returned 0 both for 'no violations' and 'bailed out', so impose_constraint couldn't distinguish the two. Now returns -1 on bail-out, allowing the caller to know the repair may be incomplete. 3. Documented the root-child limitation: spr_clip() doesn't fully handle root children, so impose_one_pass skips them. The fuse path's post-repair verification guard (T-214) catches these cases. Tests: 940 constraint + 308 simplify/driven/sector pass.

spr_clip() can't detach root children (root is its own parent, so the bypass logic fails). All search callers skip root children, but impose_constraint needs them for constraint repair. New topology_spr() helper in ts_constraint.cpp handles the root-child case by absorbing the sibling into root and repurposing it as the insertion node. No changes to spr_clip or any search caller. Also removes the build_postorder call that was missing between individual moves within impose_one_pass — each topology_spr is now followed by a postorder rebuild so edge enumeration stays valid. Tests: 942 constraint (90 impose-constraint, incl 2 new root-child tests) + 308 simplify/driven/sector pass.

T-208/T-211: random_constrained_tree() and impose_constraint() fixes

Benchmarked perturbStopFactor across 10 morphobank/inapplicable datasets (23-213 tips). Key findings: - PSF=2 gives 2.4-6.9x speedup on converged searches with zero score loss - Complementary to targetHits: on hard landscapes where few replicates hit the best score, PSF fires first; on easy landscapes targetHits fires first (PSF is irrelevant) - PSF=5 provides smaller speedups and is too conservative for large trees Changed default from 0 (disabled) to 2 in SearchControl() and compat wrapper. Updated docs and fixed timeout test (needs PSF=0 to test the timeout path, not convergence).

T-187: Perturbation-count stopping rule

ms609 added 30 commits March 20, 2026 12:55

fix(shiny): reset run stats when concavity changes T-165

a98e66c

fix(cpp): cancel checks in MPT enumeration and final TBR polish T-163

425eb78

Treesearch optimizations

7189b02

docs: update AGENTS.md with VTune workflow notes

363f6b4

chore: update coordination notes for polytomy-search merge

8bbc40d

PLan updates

d7e31dd

Rm notes

cda2f17

Merge branch 'cpp-search' into feature/madslatkin-profiling

25adf0f

CI on all PRs

e38f214

Agent G work

fafd5d0

bgs; redundant params; logo

b8e56e2

nonignoreinit

157aebc

ms609 added 30 commits March 25, 2026 14:22

chore: agent C — T-242 fix committed, awaiting GHA

da341b0

chore: T-242 GHA 23545987517 dispatched

5dff909

Merge PR #211: MaddisonSlatkin optimizations + native C++ MC Fitch sc…

8a06ff0

…orer perf: MaddisonSlatkin optimizations + native C++ MC Fitch scorer

chore: T-187 PR #226 open, agent D idle

9e4ae7c

chore: T-187 → PR #226

0f6b72d

chore: claim S-RED (D) — focus 4: Parallelism & RNG

8053a58

chore: S-RED focus 4 findings — T-243 filed and fixed

ec138a5

chore: S-COORD round 20 — clean stale entries, update PR refs

01b3f30

Merge branch 'cpp-search' into feature/perturb-stop

4cc5014

docs: T-212 test comments — reflect Wagner fallback, not random_const…

61fbd03

…rained_tree

docs: update T-212 test comments for random_constrained_tree

8650522

chore: add PCSA, reconverged, reconverges to WORDLIST

8cd3ac7

Merge branch 'cpp-search' into feature/random-constrained-tree

80f59a4

Merge pull request #229 from ms609/feature/random-constrained-tree

e6ba08c

T-208/T-211: random_constrained_tree() and impose_constraint() fixes

Merge pull request #226 from ms609/feature/perturb-stop

6a817b0

T-187: Perturbation-count stopping rule

Coordination

f907de4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++ search#210

C++ search#210
ms609 wants to merge 330 commits intomainfrom
cpp-search

ms609 commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ms609 commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant