Race batch PDLP and dual simplex in strong branching / reliability branching #994
Kh4ster wants to merge 59 commits into release/26.04 from race_strong_branching_realibility_branching
Conversation
…+ fail safe if batch pdlp fails
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
/ok to test 496c4fd
♻️ Duplicate comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
158-172: ⚠️ Potential issue | 🟡 Minor: Typo in comment.
Line 160 has "iteartion" instead of "iteration".

```diff
- // We could not mark as solved nodes hitting iteartion limit in DS
+ // We could not mark as solved nodes hitting iteration limit in DS
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 158-172, change "iteartion" to "iteration" in the comment that reads "We could not mark as solved nodes hitting iteartion limit in DS", inside the block guarded by sb_view.is_valid(). The surrounding code uses sb_view.mark_solved(shared_idx) and ds_is_valid_done(ds_status_down[k]) / ds_is_valid_done(ds_status_up[k]); update only the comment text.
🧹 Nitpick comments (3)
cpp/src/branch_and_bound/pseudo_costs.cpp (3)
502-505: Redundant assert before identical if-check. The assert on line 502 checks the same condition as the if-check on line 505. Since the code inside the if-block is the only code path anyway, the assert is redundant.

```diff
- assert(!pc.pdlp_warm_cache.populated &&
-        "PDLP warm cache should not be populated at this point");
-
- if (!pc.pdlp_warm_cache.populated) {
+ if (!pc.pdlp_warm_cache.populated) {
+   // Expected: cache is not populated on first entry
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 502-505, remove the redundant assert(!pc.pdlp_warm_cache.populated && "PDLP warm cache should not be populated at this point") so only the existing if (!pc.pdlp_warm_cache.populated) { ... } remains; leave the if-block and its logic unchanged.
1283-1301: Address TODO: Remove or restore commented-out logging code. This block of commented-out logging code should either be restored or removed before merging. If the statistics are useful, consider re-enabling them; otherwise, clean up the dead code.
Would you like me to help restore this logging block with the current variable names, or should it be removed entirely?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1283 - 1301, Restore the commented-out RB Dual Simplex logging: re-enable the conditional block around rb_mode (use rb_mode != 2 check, with the inner rb_mode == 1 branch) and call log.printf with the current variables ds_elapsed (from toc(ds_start_time)), num_candidates, ds_optimal.load(), ds_infeasible.load(), ds_failed.load(), and ds_skipped.load() exactly as in the diff; ensure the two format strings match their branches (one including ds_skipped and “(PDLP)”, the other without it) and that values use num_candidates * 2 for the denominators as shown, keeping correct punctuation and line breaks so the code compiles.
366-373: Consider using epsilon comparison for floating-point sigma values. Exact floating-point comparison (sigma == -1, sigma == 1) can fail if sigma has any numerical noise from matrix operations. While slack coefficients are typically constructed as exact +1/-1, an epsilon-based comparison would be more robust.

```diff
- if (sigma == -1) {
+ constexpr f_t eps = 1e-9;
+ if (std::abs(sigma - (-1.0)) < eps) {
    constraint_lower[i] = slack_lower + lp.rhs[i];
    constraint_upper[i] = slack_upper + lp.rhs[i];
- } else if (sigma == 1) {
+ } else if (std::abs(sigma - 1.0) < eps) {
    constraint_lower[i] = -slack_upper + lp.rhs[i];
    constraint_upper[i] = -slack_lower + lp.rhs[i];
  } else {
-   assert(sigma == 1.0 || sigma == -1.0);
+   assert(false && "sigma must be +1 or -1");
  }
```

As per coding guidelines: "use epsilon comparisons for floating-point equality checks"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 366 - 373, The code compares floating-point sigma to exact integers; replace direct comparisons in the block handling sigma with epsilon-based checks (e.g., fabs(sigma - 1.0) < EPS and fabs(sigma + 1.0) < EPS) when assigning constraint_lower[i] and constraint_upper[i] using slack_lower/slack_upper and lp.rhs[i]; define or reuse a small EPS constant and update the final assert to assert(fabs(sigma - 1.0) < EPS || fabs(sigma + 1.0) < EPS) so the logic in this function (pseudo_costs.cpp handling sigma, constraint_lower, constraint_upper, slack_lower, slack_upper) is robust to floating-point noise.
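As a self-contained illustration of the epsilon-based comparison suggested above (the f_t alias, the helper names, and the 1e-9 tolerance are assumptions for this sketch; the codebase may already define its own tolerance constant):

```cpp
#include <cmath>

// Sketch of an epsilon-based sign check; f_t, approx_eq, classify_sigma,
// and the 1e-9 tolerance are illustrative assumptions, not codebase names.
using f_t = double;

inline bool approx_eq(f_t a, f_t b, f_t eps = 1e-9) {
  return std::abs(a - b) < eps;
}

// Returns -1 or +1 when sigma is within eps of that value, 0 otherwise
// (the caller should treat 0 as an error, mirroring the assert above).
inline int classify_sigma(f_t sigma) {
  if (approx_eq(sigma, -1.0)) return -1;
  if (approx_eq(sigma, 1.0)) return 1;
  return 0;
}
```

This tolerates the tiny rounding noise that matrix operations can introduce while still rejecting genuinely wrong coefficients.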
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9d7be61-6bc3-4400-aa36-b21d1943fd39
📒 Files selected for processing (1)
cpp/src/branch_and_bound/pseudo_costs.cpp
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 49-53: The bounds check in subview can overflow when adding signed
i_t values; replace the single assert(offset + count <= solved.size()) with a
safe check that prevents overflow by validating non-negativity (if i_t is
signed) and comparing sizes using the same unsigned type as solved.size():
ensure offset >= 0 and count >= 0 (if applicable), then cast to the span-size
type (e.g. std::size_t) and assert(static_cast<size_type>(offset) <=
solved.size() && static_cast<size_type>(count) <= solved.size() -
static_cast<size_type>(offset)); also use those casts when calling
solved.subspan(offset, count) so subview uses consistent unsigned indices.
- Around line 37-47: The bounds checks in is_solved and mark_solved are unsafe
if i_t is signed because a negative local_idx will convert to a large unsigned
when compared to solved.size() and assert may be disabled in release; update
both methods (is_solved and mark_solved) to first check that local_idx is
non-negative and then compare static_cast<size_t>(local_idx) against
solved.size(), and handle failures with a runtime-safe response (e.g., throw
std::out_of_range or return/abort) instead of relying solely on assert so access
to solved[local_idx] is always safe.
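The overflow-safe bounds check described above can be sketched as a standalone helper. The i_t alias and the function name are assumptions modeled on the review comment, not the actual shared_strong_branching_context.hpp code:

```cpp
#include <cstddef>

using i_t = int;  // assumption: a signed index type, as in the review comment

// Returns true iff [offset, offset + count) fits inside a container holding
// `size` elements, without risking signed overflow in `offset + count` or a
// surprising signed-to-unsigned conversion of a negative index.
inline bool in_bounds(i_t offset, i_t count, std::size_t size) {
  // Reject negatives before any signed->unsigned conversion can occur.
  if (offset < 0 || count < 0) return false;
  const std::size_t uoffset = static_cast<std::size_t>(offset);
  const std::size_t ucount  = static_cast<std::size_t>(count);
  // Subtraction form avoids the overflow that `uoffset + ucount` could cause.
  return uoffset <= size && ucount <= size - uoffset;
}
```

A subview method could then call this helper and throw or abort on failure instead of relying solely on assert, which is compiled out in release builds.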
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4f7a71dc-8774-4f65-90e4-9dd5c98b0e0a
📒 Files selected for processing (1)
cpp/src/branch_and_bound/shared_strong_branching_context.hpp
Thanks for the hard work, Nicolas!
My main concerns are:
- Do not create a std::thread at each call to the reliability branching. Instead, use omp task to integrate with the rest of B&B, share the CPU resources more efficiently, and avoid oversubscribing the CPU.
- Since two or more threads might call the reliability branching at the same time, some variables may appear in multiple unreliable lists. Yet a single call to strong branching can be enough to turn the variable into reliable. If this is the case, then we can propagate the score from one thread to another rather than calling strong branching again. That is why there is a mutex per variable that locks when performing a trial branching. When using batch PDLP, we also need to consider this case and mark the variable accordingly. Multiple calls to strong branching are not necessarily a bad thing (more information about a variable can lead to better decisions), but they may be too expensive.
- Considering that there are many simultaneous calls to batch PDLP, are the kernels inserted into the same CUDA stream, or does each call go to a separate stream? The latter may be better for performance, but might interfere with determinism.
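The per-variable locking and score-propagation scheme described in the second point could be modeled roughly as follows. All names here (VariableEntry, score_variable) and the exact locking granularity are illustrative assumptions, not the actual B&B code:

```cpp
#include <mutex>

// Simplified model of a per-variable trial-branching lock: the thread that
// wins the race runs the expensive trial; losers block on the mutex and then
// reuse the propagated score instead of repeating strong branching.
struct VariableEntry {
  std::mutex trial_mutex;   // held while a thread runs trial branching
  bool reliable = false;    // set once the variable's score is trustworthy
  double score = 0.0;       // propagated to threads that skipped the trial
};

template <typename TrialFn>
double score_variable(VariableEntry& var, TrialFn run_trial_branching) {
  std::lock_guard<std::mutex> guard(var.trial_mutex);
  if (!var.reliable) {
    var.score = run_trial_branching();  // strong branching / batch PDLP
    var.reliable = true;                // mark so other threads reuse it
  }
  return var.score;
}
```

Under this model, batch PDLP results would also need to set the reliable flag so concurrent reliability-branching calls can pick up the score.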
```diff
  template <typename i_t, typename f_t>
  void strong_branching(const user_problem_t<i_t, f_t>& original_problem,
                        const lp_problem_t<i_t, f_t>& original_lp,
+ static std::pair<f_t, i_t> merge_sb_result(f_t ds_val,
```
Nitpick: use an enum instead of an int to set the source. This avoids setting an incorrect value in the future and makes the intent of the return clearer.
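A minimal sketch of the enum-based return suggested here. The enumerator names, the helper name pick_result, and the max-based merge logic are illustrative assumptions; the actual merge_sb_result may combine the values differently:

```cpp
#include <utility>

// Tags which solver produced the winning strong-branching value, so callers
// cannot confuse the source with a bare int. Names are illustrative only.
enum class sb_source { dual_simplex, batch_pdlp };

// Hypothetical merge: keep the larger objective and record its origin.
inline std::pair<double, sb_source> pick_result(double ds_val, double pdlp_val) {
  if (ds_val >= pdlp_val) return {ds_val, sb_source::dual_simplex};
  return {pdlp_val, sb_source::batch_pdlp};
}
```

An enum class also prevents accidental implicit conversion from arbitrary integers, which is exactly the failure mode the nitpick is guarding against.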
```diff
  const std::vector<i_t>& new_slacks,
  const std::vector<variable_type_t>& var_types,
- const std::vector<f_t> root_soln,
+ const std::vector<f_t>& root_soln,
```
Note that in #979, I am setting the root_soln to lp_solution such that x, y and z are already available.
Yes, if your PR gets merged first I will use it instead.
```diff
  const std::vector<f_t>& edge_norms,
- pseudo_costs_t<i_t, f_t>& pc)
+ pseudo_costs_t<i_t, f_t>& pc,
+ std::vector<f_t>& ds_obj_down,
```
I would avoid abbreviating dual simplex as ds. We do not use this terminology in other parts of the code base.
Similarly, strong_branching vs sb or reliability branching vs rb.
```diff
  } else if (rb_mode != 0 && unreliable_list.size() < min_num_candidates_for_pdlp) {
    log.printf("Not enough candidates to use batch PDLP, using DS only\n");
+ } else if (rb_mode != 0 && pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root < 5.0) {
+   log.printf("Pourcent solved by batch PDLP at root is too low, using DS only\n");
```
Typo: Pourcent --> Percent
You need to check for other places as well
…ather than thread, put both bpdlp call in functions
…ble, use one stream per BPDLP in RB
/ok to test 6e5baa5
…n path that doesn't support cpp20
/ok to test 86455ef
/ok to test 2f05352
nguidotti left a comment
In the strong branching, we need to modify where the parallel region starts so that the PDLP task is properly created. I will send you a patch.
I would advise avoiding abbreviated names (ds for dual simplex, or sb for strong branching). We are not using them in the rest of the code, and it can be confusing later.
/ok to test c888495
@tmckayus and @chris-maes to review
```python
        "1 = cooperative work-stealing (DS + batch PDLP), "
        "2 = batch PDLP only.",
    )
    mip_batch_pdlp_reliability_branching: Optional[int] = Field(
```
Sounds like we aren't adding new parameters to the service.
@tmckayus to check that it is ok if someone sends the new parameter.
…RB DS task for valid merge update
/ok to test 95addf5
/ok to test 7ea4a52
This PR enables the following regarding batch PDLP:
Currently, we still keep BPDLP off by default, both at the root and in reliability branching.