Merged
5 changes: 3 additions & 2 deletions .github/workflows/ASan.yml
Original file line number Diff line number Diff line change
@@ -55,8 +55,9 @@ jobs:
run: |
export LD_PRELOAD=$(gcc -print-file-name=libasan.so)

echo "PKG_CFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" > src/Makevars
echo "PKG_CXXFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" >> src/Makevars
# Replace PKG_CXXFLAGS in-place; preserve PKG_CPPFLAGS and PKG_LIBS
sed -i 's/^PKG_CXXFLAGS.*/PKG_CXXFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer/' src/Makevars
echo "PKG_CFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" >> src/Makevars

mkdir ~/.R
echo "LDFLAGS = -g -O0 -fsanitize=address -fno-omit-frame-pointer" >> ~/.R/Makevars
7 changes: 7 additions & 0 deletions .positai/settings.json
@@ -30,8 +30,15 @@
"wc *": "allow"
},
"external_directory": {
"C:/Users/pjjg18/GitHub/TreeSearch-a/*": "allow",
"C:/Users/pjjg18/.positai/skills/r-package-profiling/references/*": "allow",
"../Quartet/*": "allow",
"../TreeTools/*": "allow"
},
"webfetch": {
"https://github.com/*": "allow",
"https://api.github.com/*": "allow",
"https://raw.githubusercontent.com/*": "allow"
}
},
"model": {
46 changes: 46 additions & 0 deletions AGENTS.md
@@ -1177,6 +1177,52 @@ for the cross-pairs case.

---

## LinkingTo Header Exposure (`expose-lapjv` branch)

Extracted LAP and MCI C++ APIs into `inst/include/TreeDist/` headers so that
downstream packages (e.g., TreeSearch) can use `LinkingTo: TreeDist`:

| Header | Content |
|--------|---------|
| `types.h` | `cost`, `lap_dim`, `lap_row`, `lap_col`, constants |
| `cost_matrix.h` | `CostMatrix` class (Rcpp-free) |
| `lap_scratch.h` | `LapScratch` struct |
| `lap.h` | `lap()` declarations |
| `lap_impl.h` | `lap()` implementation (include in exactly one TU) |
| `mutual_clustering.h` | MCI declarations |
| `mutual_clustering_impl.h` | MCI implementation (include in exactly one TU) |

`src/lap.h` is now a thin wrapper that includes `<TreeDist/lap.h>` and
re-exports types to global scope.

### LAPJV codegen regression (diagnosed, partially mitigated)

Including `lap_impl.h` in `lap.cpp` changed the TU context enough for GCC 14's
register allocator to produce ~8% more instructions in the Dijkstra hot loop,
causing a consistent 20–25% regression on standalone LAPJV (n ≥ 400).

**Root cause:** the installed-header version of `CostMatrix` (in
`inst/include/TreeDist/cost_matrix.h`) has a different method set from main's
monolithic `src/lap.h` (extra methods like `setWithTranspose()`, `dim8()`;
missing test variants like `findRowSubminNaive`). This changes GCC's
optimization heuristics for the entire TU, even though `lap()` never calls
the differing methods.

**Fix:** define `lap()` directly in `lap.cpp` (not via `#include <TreeDist/lap_impl.h>`)
with `__attribute__((optimize("align-functions=64", "align-loops=16")))`.
The `lapjv()` wrapper fills the transposed buffer first (matching R's
column-major storage) then untransposes — restoring the cache-friendly pattern.

**Residual:** ~5–9% on LAPJV 1999×1999 vs main (from the CostMatrix class
definition difference). Tree distance metrics are unaffected — they call
`lap()` from `pairwise_distances.cpp`, whose TU context is unchanged.

**Maintenance note:** if the `lap()` algorithm changes, update BOTH `src/lap.cpp`
and `inst/include/TreeDist/lap_impl.h`. The duplication is intentional — it
preserves the TU context that was profiled and tuned.

---

## Remaining Optimization Opportunities

- Sort+merge pre-scan for `rf_info_score`: **DONE** — replaced O(n²) loop with
11 changes: 10 additions & 1 deletion NEWS.md
@@ -1,5 +1,14 @@
<!-- AI-generated branch summary (2026-03-22) -->
# Branch: `transfer-consensus` → merge target: `master`
# Branch: `expose-lapjv` → merge target: `master`

This branch exposes the LAPJV (Jonker–Volgenant) linear assignment solver
via C++ headers so that downstream packages can link to it (`LinkingTo`),
and implements a transfer consensus method (`TransferConsensus()`,
`TransferDist()`). It also tightens error handling by moving all
`Rcpp::stop()` / `Rf_error()` calls out of C++ implementation code
into R-level input validation.

# TreeDist 2.13.0.9000

This branch implements transfer consensus trees (`TransferConsensus()`) and
a corresponding transfer distance metric (`TransferDist()`), providing a
4 changes: 4 additions & 0 deletions R/RcppExports.R
@@ -267,6 +267,10 @@ cpp_transfer_dist_cross_pairs <- function(splits_a, splits_b, n_tip, scale, n_th
.Call(`_TreeDist_cpp_transfer_dist_cross_pairs`, splits_a, splits_b, n_tip, scale, n_threads)
}

cpp_mci_impl_score <- function(x, y, n_tips) {
.Call(`_TreeDist_cpp_mci_impl_score`, x, y, n_tips)
}

cpp_robinson_foulds_distance <- function(x, y, nTip) {
.Call(`_TreeDist_cpp_robinson_foulds_distance`, x, y, nTip)
}
276 changes: 276 additions & 0 deletions inst/include/TreeDist/cost_matrix.h
@@ -0,0 +1,276 @@
#ifndef TREEDIST_COST_MATRIX_H_
#define TREEDIST_COST_MATRIX_H_

// TreeDist CostMatrix — cache-aligned, block-padded cost matrix for the LAP.
//
// Rcpp-free public API. The Rcpp-dependent constructor
// (CostMatrix(Rcpp::NumericMatrix)) lives in TreeDist's own src/lap.h.

#include "types.h"

#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

namespace TreeDist {

class CostMatrix {
private:
std::size_t dim_;
std::size_t dim8_;
alignas(64) std::vector<cost> data_;
alignas(64) std::vector<cost> t_data_;
bool transposed_;

static std::size_t block_containing(const std::size_t x) {
return ((x + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}

public:
// Default constructor for pooled instances (zero-size, no allocation).
CostMatrix() : dim_(0), dim8_(0), transposed_(false) {}

explicit CostMatrix(std::size_t dim)
: dim_(dim),
dim8_(block_containing(dim_)),
data_(std::vector<cost>(dim8_ * dim8_)),
t_data_(std::vector<cost>(dim8_ * dim8_)),
transposed_(false) {}

// Resize for reuse. Only reallocates when the new dimension exceeds the
// current buffer capacity; otherwise just updates dim_/dim8_ and marks
// the transpose as stale.
void resize(std::size_t new_dim) {
dim_ = new_dim;
dim8_ = block_containing(new_dim);
const std::size_t needed = dim8_ * dim8_;
if (data_.size() < needed) {
data_.resize(needed);
t_data_.resize(needed);
}
transposed_ = false;
}

void reset() noexcept { transposed_ = false; }

[[nodiscard]] std::size_t dim() const noexcept { return dim_; }
[[nodiscard]] std::size_t dim8() const noexcept { return dim8_; }

// ---- Element access ----

cost& operator()(lap_row i, lap_col j) {
return data_[static_cast<std::size_t>(i) * dim8_ + j];
}

[[nodiscard]] const cost& operator()(lap_row i, lap_col j) const {
return data_[static_cast<std::size_t>(i) * dim8_ + j];
}

[[nodiscard]] const cost& row0(lap_col j) const {
return data_[j];
}

[[nodiscard]] const cost& entry0(lap_row i) const {
return data_[static_cast<std::size_t>(i) * dim8_];
}

[[nodiscard]] cost* row(lap_row i) {
return &data_[static_cast<std::size_t>(i) * dim8_];
}

[[nodiscard]] const cost* row(lap_row i) const {
return &data_[static_cast<std::size_t>(i) * dim8_];
}

[[nodiscard]] cost* col(lap_col i) {
return &t_data_[static_cast<std::size_t>(i) * dim8_];
}

[[nodiscard]] const cost* col(lap_col i) const {
return &t_data_[static_cast<std::size_t>(i) * dim8_];
}

// Write a value to both the data and transposed-data arrays.
// After filling the entire matrix this way, call markTransposed()
// to skip the lazy transpose in makeTranspose().
void setWithTranspose(lap_row i, lap_col j, cost value) {
data_[static_cast<std::size_t>(i) * dim8_ + j] = value;
t_data_[static_cast<std::size_t>(j) * dim8_ + i] = value;
}

void markTransposed() noexcept { transposed_ = true; }

// ---- Transpose ----

void makeTranspose() noexcept {
if (transposed_) return;

const cost* __restrict__ data_ptr = data_.data();
cost* __restrict__ t_data_ptr = t_data_.data();

#if defined(__GNUC__) || defined(__clang__)
data_ptr =
static_cast<const cost*>(__builtin_assume_aligned(data_ptr, 64));
t_data_ptr =
static_cast<cost*>(__builtin_assume_aligned(t_data_ptr, 64));
#endif
for (std::size_t i = 0; i < dim_; i += BLOCK_SIZE) {
for (std::size_t j = 0; j < dim_; j += BLOCK_SIZE) {
for (std::size_t r = i;
r < std::min(i + BLOCK_SIZE, dim_); ++r) {
for (std::size_t c = j;
c < std::min(j + BLOCK_SIZE, dim_); ++c) {
t_data_ptr[c * dim8_ + r] = data_ptr[r * dim8_ + c];
}
}
}
}
transposed_ = true;
}

void makeUntranspose() noexcept {
const cost* __restrict__ data_ptr = t_data_.data();
cost* __restrict__ t_data_ptr = data_.data();

#if defined(__GNUC__) || defined(__clang__)
data_ptr =
static_cast<const cost*>(__builtin_assume_aligned(data_ptr, 64));
t_data_ptr =
static_cast<cost*>(__builtin_assume_aligned(t_data_ptr, 64));
#endif
for (std::size_t i = 0; i < dim_; i += BLOCK_SIZE) {
for (std::size_t j = 0; j < dim_; j += BLOCK_SIZE) {
for (std::size_t r = i;
r < std::min(i + BLOCK_SIZE, dim_); ++r) {
for (std::size_t c = j;
c < std::min(j + BLOCK_SIZE, dim_); ++c) {
t_data_ptr[c * dim8_ + r] = data_ptr[r * dim8_ + c];
}
}
}
}
}

// ---- Padding ----

void padTrAfterCol(lap_row start_row, cost value) {
std::size_t start_index = static_cast<std::size_t>(start_row) * dim8_;
std::size_t end_index = dim_ * dim8_;
std::fill(t_data_.begin() + start_index,
t_data_.begin() + end_index, value);
}

void padAfterRow(lap_row start_row, cost value) {
std::size_t start_index = static_cast<std::size_t>(start_row) * dim8_;
std::size_t end_index = dim_ * dim8_;
std::fill(data_.begin() + start_index,
data_.begin() + end_index, value);
}

void padTrColAfterRow(const lap_row r, const lap_col start_col,
const cost value) {
std::size_t r_offset = r * dim8_;
std::size_t actual_start_col = static_cast<std::size_t>(start_col);
std::size_t start_index = r_offset + actual_start_col;
std::size_t end_index = start_index + dim_ - actual_start_col;
std::fill(t_data_.begin() + start_index,
t_data_.begin() + end_index, value);
}

void padRowAfterCol(const lap_row r, const lap_col start_col,
const cost value) {
std::size_t r_offset = r * dim8_;
std::size_t actual_start_col = static_cast<std::size_t>(start_col);
std::size_t start_index = r_offset + actual_start_col;
std::size_t end_index = start_index + dim_ - actual_start_col;
std::fill(data_.begin() + start_index,
data_.begin() + end_index, value);
}

// ---- Search ----

std::pair<cost, lap_row> findColMin(lap_col j,
lap_row search_dim = -1) {
makeTranspose();
const cost* col_data = col(j);
const lap_row n =
(search_dim < 0) ? static_cast<lap_row>(dim_) : search_dim;
const auto min_ptr = std::min_element(col_data, col_data + n);
return {*min_ptr,
static_cast<lap_row>(std::distance(col_data, min_ptr))};
}

std::tuple<cost, cost, lap_col, lap_col> findRowSubmin(
const lap_row* i, const std::vector<cost>& v
) const {
assert(dim_ > 1);

const cost* __restrict__ row_i = row(*i);
const lap_col dim = static_cast<lap_col>(dim_);
const cost* __restrict__ v_ptr = v.data();

const cost h0 = row_i[0] - v_ptr[0];
const cost h1 = row_i[1] - v_ptr[1];

cost min_val, submin_val;
lap_col min_idx, submin_idx;

if (h0 > h1) {
min_val = h1; submin_val = h0;
min_idx = 1; submin_idx = 0;
} else {
min_val = h0; submin_val = h1;
min_idx = 0; submin_idx = 1;
}

    // Unrolled-by-4 scan. The tail loop resumes exactly where the unrolled
    // loop stopped: re-examining a column would let a unique minimum be
    // reported as its own submin (submin_idx == min_idx), which LAPJV's
    // tie handling does not expect.
    lap_col j = 2;
    for (; j + 3 < dim; j += 4) {
      assert(BLOCK_SIZE >= 4);
      const cost h0 = row_i[j] - v_ptr[j];
      const cost h1 = row_i[j + 1] - v_ptr[j + 1];
      const cost h2 = row_i[j + 2] - v_ptr[j + 2];
      const cost h3 = row_i[j + 3] - v_ptr[j + 3];
      if (h0 < submin_val) {
        if (h0 < min_val) {
          submin_val = min_val; min_val = h0;
          submin_idx = min_idx; min_idx = j;
        } else { submin_val = h0; submin_idx = j; }
      }
      if (h1 < submin_val) {
        if (h1 < min_val) {
          submin_val = min_val; min_val = h1;
          submin_idx = min_idx; min_idx = j + 1;
        } else { submin_val = h1; submin_idx = j + 1; }
      }
      if (h2 < submin_val) {
        if (h2 < min_val) {
          submin_val = min_val; min_val = h2;
          submin_idx = min_idx; min_idx = j + 2;
        } else { submin_val = h2; submin_idx = j + 2; }
      }
      if (h3 < submin_val) {
        if (h3 < min_val) {
          submin_val = min_val; min_val = h3;
          submin_idx = min_idx; min_idx = j + 3;
        } else { submin_val = h3; submin_idx = j + 3; }
      }
    }
    for (; j < dim; ++j) {
const cost h = row_i[j] - v_ptr[j];
if (h < submin_val) {
if (h < min_val) {
submin_val = min_val; min_val = h;
submin_idx = min_idx; min_idx = j;
} else { submin_val = h; submin_idx = j; }
}
}
return {min_val, submin_val, min_idx, submin_idx};
}
};

} // namespace TreeDist

#endif // TREEDIST_COST_MATRIX_H_