
Race batch PDLP and dual simplex in strong branching / reliability branching#994

Open
Kh4ster wants to merge 59 commits intorelease/26.04from
race_strong_branching_realibility_branching

Conversation

@Kh4ster
Contributor

@Kh4ster Kh4ster commented Mar 25, 2026

This PR enables the following regarding batch PDLP:

  • Enable batch PDLP in reliability branching
  • Add work stealing so that batch PDLP and Dual Simplex can run concurrently and steal LPs from each other when one solves an LP first
  • Use the correct problem representation with cuts for batch PDLP
  • Use a PDLP warm start cache across strong branching at the root and in reliability branching
  • Tighten the tolerance on batch PDLP to obtain higher-quality solutions
  • Increase the iteration limit so that instances that need a high iteration count (with a low cost per iteration) can still come through (only while solving the original LP to get warm start data)
  • Add several heuristics that skip batch PDLP to avoid overhead when Dual Simplex is clearly superior
  • Don't store and copy the primal-dual solution unless it is needed, to save memory
  • Handle batch PDLP errors better, allowing Dual Simplex to continue in strong branching even if BPDLP fails
  • No early exit in BPDLP when the initial warm start PDLP solution is already feasible
  • Correct the BPDLP objective when there is an offset

Currently we still keep BPDLP off by default, both at the root and in reliability branching.
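As a rough illustration of the work-stealing item above, here is a minimal standalone sketch (not the PR's code; `shared_lp_pool`, `try_claim`, and `solve_remaining` are hypothetical names): each LP carries an atomic "solved" flag, whichever solver claims it first wins, and the other solver simply skips it.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical sketch: both solvers walk the same LP list; an atomic
// per-LP flag lets whichever finishes first claim it, so the other
// solver steals past already-solved subproblems.
struct shared_lp_pool {
  std::vector<std::atomic<bool>> solved;
  explicit shared_lp_pool(std::size_t n) : solved(n) {}

  // Returns true if the caller claimed LP `i` (i.e. it was not solved yet).
  bool try_claim(std::size_t i) {
    bool expected = false;
    return solved[i].compare_exchange_strong(expected, true);
  }
};

// Each solver (dual simplex or batch PDLP) would run a loop like this
// on its own thread; returns how many LPs this solver actually claimed.
int solve_remaining(shared_lp_pool& pool, std::size_t n) {
  int claimed = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (pool.try_claim(i)) ++claimed;  // the real code would solve LP i here
  return claimed;
}
```

With two threads running `solve_remaining` concurrently, `compare_exchange_strong` guarantees each LP is claimed exactly once.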

Kh4ster and others added 30 commits February 9, 2026 17:47
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
@Kh4ster
Contributor Author

Kh4ster commented Mar 30, 2026

/ok to test 496c4fd

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)

158-172: ⚠️ Potential issue | 🟡 Minor

Minor: Typo in comment.

Line 160 has "iteartion" instead of "iteration".

-      // We could not mark as solved nodes hitting iteartion limit in DS
+      // We could not mark as solved nodes hitting iteration limit in DS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 158 - 172, Fix the
simple typo in the comment inside the block guarded by sb_view.is_valid() in
pseudo_costs.cpp: change "iteartion" to "iteration" in the comment that reads
"We could not mark as solved nodes hitting iteartion limit in DS"; the
surrounding code uses sb_view.mark_solved(shared_idx) and
ds_is_valid_done(ds_status_down[k]) / ds_is_valid_done(ds_status_up[k]) so
update only the comment text to "iteration" to correct the spelling.
🧹 Nitpick comments (3)
cpp/src/branch_and_bound/pseudo_costs.cpp (3)

502-505: Redundant assert before identical if-check.

The assert on line 502 checks the same condition as the if-check on line 505. Since the code inside the if-block is the only code path anyway, the assert is redundant.

-    assert(!pc.pdlp_warm_cache.populated &&
-           "PDLP warm cache should not be populated at this point");
-
-    if (!pc.pdlp_warm_cache.populated) {
+    if (!pc.pdlp_warm_cache.populated) {
+      // Expected: cache is not populated on first entry
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 502 - 505, Remove the
redundant assert that checks pc.pdlp_warm_cache.populated immediately before the
identical if-check; specifically delete the assert(!pc.pdlp_warm_cache.populated
&& "PDLP warm cache should not be populated at this point") so only the existing
if (!pc.pdlp_warm_cache.populated) { ... } remains (leave the if-block and its
logic unchanged).

1283-1301: Address TODO: Remove or restore commented-out logging code.

This block of commented-out logging code should either be restored or removed before merging. If the statistics are useful, consider re-enabling them; otherwise, clean up the dead code.

Would you like me to help restore this logging block with the current variable names, or should it be removed entirely?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1283 - 1301, Restore
the commented-out RB Dual Simplex logging: re-enable the conditional block
around rb_mode (use rb_mode != 2 check, with the inner rb_mode == 1 branch) and
call log.printf with the current variables ds_elapsed (from toc(ds_start_time)),
num_candidates, ds_optimal.load(), ds_infeasible.load(), ds_failed.load(), and
ds_skipped.load() exactly as in the diff; ensure the two format strings match
their branches (one including ds_skipped and “(PDLP)”, the other without it) and
that values use num_candidates * 2 for the denominators as shown, keeping
correct punctuation and line breaks so the code compiles.

366-373: Consider using epsilon comparison for floating-point sigma values.

Exact floating-point comparison (sigma == -1, sigma == 1) can fail if sigma has any numerical noise from matrix operations. While slack coefficients are typically constructed as exact +1/-1, an epsilon-based comparison would be more robust.

-    if (sigma == -1) {
+    constexpr f_t eps = 1e-9;
+    if (std::abs(sigma - (-1.0)) < eps) {
       constraint_lower[i] = slack_lower + lp.rhs[i];
       constraint_upper[i] = slack_upper + lp.rhs[i];
-    } else if (sigma == 1) {
+    } else if (std::abs(sigma - 1.0) < eps) {
       constraint_lower[i] = -slack_upper + lp.rhs[i];
       constraint_upper[i] = -slack_lower + lp.rhs[i];
     } else {
-      assert(sigma == 1.0 || sigma == -1.0);
+      assert(false && "sigma must be +1 or -1");
     }

As per coding guidelines: "use epsilon comparisons for floating-point equality checks"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 366 - 373, The code
compares floating-point sigma to exact integers; replace direct comparisons in
the block handling sigma with epsilon-based checks (e.g., fabs(sigma - 1.0) <
EPS and fabs(sigma + 1.0) < EPS) when assigning constraint_lower[i] and
constraint_upper[i] using slack_lower/slack_upper and lp.rhs[i]; define or reuse
a small EPS constant and update the final assert to assert(fabs(sigma - 1.0) <
EPS || fabs(sigma + 1.0) < EPS) so the logic in this function (pseudo_costs.cpp
handling sigma, constraint_lower, constraint_upper, slack_lower, slack_upper) is
robust to floating-point noise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9d7be61-6bc3-4400-aa36-b21d1943fd39

📥 Commits

Reviewing files that changed from the base of the PR and between f456c71 and 962c2ea.

📒 Files selected for processing (1)
  • cpp/src/branch_and_bound/pseudo_costs.cpp

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 49-53: The bounds check in subview can overflow when adding signed
i_t values; replace the single assert(offset + count <= solved.size()) with a
safe check that prevents overflow by validating non-negativity (if i_t is
signed) and comparing sizes using the same unsigned type as solved.size():
ensure offset >= 0 and count >= 0 (if applicable), then cast to the span-size
type (e.g. std::size_t) and assert(static_cast<size_type>(offset) <=
solved.size() && static_cast<size_type>(count) <= solved.size() -
static_cast<size_type>(offset)); also use those casts when calling
solved.subspan(offset, count) so subview uses consistent unsigned indices.
- Around line 37-47: The bounds checks in is_solved and mark_solved are unsafe
if i_t is signed because a negative local_idx will convert to a large unsigned
when compared to solved.size() and assert may be disabled in release; update
both methods (is_solved and mark_solved) to first check that local_idx is
non-negative and then compare static_cast<size_t>(local_idx) against
solved.size(), and handle failures with a runtime-safe response (e.g., throw
std::out_of_range or return/abort) instead of relying solely on assert so access
to solved[local_idx] is always safe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4f7a71dc-8774-4f65-90e4-9dd5c98b0e0a

📥 Commits

Reviewing files that changed from the base of the PR and between 962c2ea and 496c4fd.

📒 Files selected for processing (1)
  • cpp/src/branch_and_bound/shared_strong_branching_context.hpp

Contributor

@nguidotti nguidotti left a comment

Thanks for the hard work, Nicolas!

My main concerns are:

  1. Do not create a std::thread on each call to reliability branching. Instead, use an omp task to integrate with the rest of B&B, share CPU resources more efficiently, and avoid oversubscribing the CPU.

  2. Since two or more threads might call reliability branching at the same time, some variables may appear in multiple unreliable lists. Yet a single call to strong branching can be enough to make the variable reliable. If this is the case, then we can propagate the score from one thread to another rather than calling strong branching again. That is why there is a mutex per variable that locks while performing a trial branching. When using batch PDLP, we also need to consider this case and mark the variable accordingly. Multiple calls to strong branching are not necessarily a bad thing (more information about a variable can lead to better decisions), but they may be too expensive.

  3. Considering that there are many simultaneous calls to batch PDLP, are the kernels inserted into the same CUDA stream, or does each call go to a separate stream? The latter may be better for performance, but might break determinism.
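Concern 2 (a per-variable mutex with score propagation) could look roughly like this minimal sketch; `var_state_t` and `get_or_compute_score` are hypothetical names, and the `computed` argument stands in for an actual trial branching:

```cpp
#include <mutex>

// Hypothetical sketch: one mutex and one published score per variable.
// The first thread to finish a trial branching marks the variable
// reliable; later threads reuse that score instead of re-running
// strong branching.
struct var_state_t {
  std::mutex m;
  bool reliable = false;
  double score = 0.0;
};

// Returns either the already-propagated score, or the freshly computed
// one after publishing it under the per-variable lock.
double get_or_compute_score(var_state_t& v, double computed) {
  std::lock_guard<std::mutex> lock(v.m);
  if (v.reliable) return v.score;  // score propagated from another thread
  v.score = computed;              // stand-in for the trial branching result
  v.reliable = true;
  return v.score;
}
```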

template <typename i_t, typename f_t>
void strong_branching(const user_problem_t<i_t, f_t>& original_problem,
const lp_problem_t<i_t, f_t>& original_lp,
static std::pair<f_t, i_t> merge_sb_result(f_t ds_val,
Contributor

Nitpick: use an enum instead of an int to set the source. This avoids setting an incorrect value in the future, and the intent of the return value is clearer.
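A minimal sketch of the suggestion, assuming a setting where the larger bound wins the merge; the merge logic and the `sb_source_t` name are guesses here, only the shape of `merge_sb_result` comes from the diff above:

```cpp
#include <utility>

// Hypothetical enum for the "source" slot that the diff types as i_t:
// which solver produced the winning strong-branching bound.
enum class sb_source_t : int { dual_simplex = 0, batch_pdlp = 1 };

// Illustrative merge: keep the tighter (here, larger) bound and tag it
// with its source, instead of returning a bare int code.
std::pair<double, sb_source_t> merge_sb_result(double ds_val, double pdlp_val) {
  return ds_val >= pdlp_val
             ? std::make_pair(ds_val, sb_source_t::dual_simplex)
             : std::make_pair(pdlp_val, sb_source_t::batch_pdlp);
}
```

Callers then match on `sb_source_t` explicitly, and the compiler can warn on unhandled cases.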

Contributor Author

Agreed

const std::vector<i_t>& new_slacks,
const std::vector<variable_type_t>& var_types,
const std::vector<f_t> root_soln,
const std::vector<f_t>& root_soln,
Contributor

Note that in #979, I am setting the root_soln to lp_solution such that x, y and z are already available.

Contributor Author

Yes if your PR gets merged first I will use it instead

const std::vector<f_t>& edge_norms,
pseudo_costs_t<i_t, f_t>& pc)
pseudo_costs_t<i_t, f_t>& pc,
std::vector<f_t>& ds_obj_down,
Contributor

I would avoid abbreviating dual simplex as ds. We do not use this terminology in other parts of the code base.

Contributor

Similarly, strong_branching vs sb or reliability branching vs rb.

} else if (rb_mode != 0 && unreliable_list.size() < min_num_candidates_for_pdlp) {
log.printf("Not enough candidates to use batch PDLP, using DS only\n");
} else if (rb_mode != 0 && pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root < 5.0) {
log.printf("Pourcent solved by batch PDLP at root is too low, using DS only\n");
Contributor

Typo: Pourcent --> Percent

Contributor

You need to check for other places as well

Contributor Author

Good catch

@Kh4ster
Contributor Author

Kh4ster commented Mar 30, 2026

/ok to test 6e5baa5

@Kh4ster
Contributor Author

Kh4ster commented Mar 31, 2026

/ok to test 86455ef

@Kh4ster
Contributor Author

Kh4ster commented Mar 31, 2026

/ok to test 2f05352

Contributor

@nguidotti nguidotti left a comment

In strong branching, we need to modify where the parallel region starts so that the pdlp task is properly created. I will send you a patch.

I would advise avoiding abbreviated names (ds for dual simplex, or sb for strong branching). We are not using them in the rest of the code, and it can be confusing later.

@Kh4ster
Contributor Author

Kh4ster commented Mar 31, 2026

/ok to test c888495

@Kh4ster Kh4ster modified the milestones: 26.06, 26.04 Mar 31, 2026
@chris-maes chris-maes added the P1 label Mar 31, 2026
@chris-maes
Contributor

@tmckayus and @chris-maes to review

"1 = cooperative work-stealing (DS + batch PDLP), "
"2 = batch PDLP only.",
)
mip_batch_pdlp_reliability_branching: Optional[int] = Field(
Contributor

Sounds like we aren't adding new parameters to the service.

Contributor

@tmckayus to check that it is ok if someone sends the new parameter.

@Kh4ster
Contributor Author

Kh4ster commented Mar 31, 2026

/ok to test 95addf5

@Kh4ster
Contributor Author

Kh4ster commented Apr 1, 2026

/ok to test 7ea4a52


Labels

improvement (Improves an existing functionality), mip, non-breaking (Introduces a non-breaking change), P1, pdlp


5 participants