Race batch PDLP and dual simplex in strong branching / reliability branching #994
Kh4ster wants to merge 59 commits into release/26.04 from race_strong_branching_realibility_branching
Conversation
…+ fail safe if batch pdlp fails
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
/ok to test 496c4fd
♻️ Duplicate comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
158-172: ⚠️ Potential issue | 🟡 Minor: Typo in comment.
Line 160 has "iteartion" instead of "iteration".

```diff
- // We could not mark as solved nodes hitting iteartion limit in DS
+ // We could not mark as solved nodes hitting iteration limit in DS
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 158-172, change "iteartion" to "iteration" in the comment that reads "We could not mark as solved nodes hitting iteartion limit in DS", inside the block guarded by sb_view.is_valid(). The surrounding code uses sb_view.mark_solved(shared_idx) and ds_is_valid_done(ds_status_down[k]) / ds_is_valid_done(ds_status_up[k]); update only the comment text.
🧹 Nitpick comments (3)
cpp/src/branch_and_bound/pseudo_costs.cpp (3)
502-505: Redundant assert before identical if-check. The assert on line 502 checks the same condition as the if-check on line 505. Since the code inside the if-block is the only code path anyway, the assert is redundant.

```diff
- assert(!pc.pdlp_warm_cache.populated &&
-        "PDLP warm cache should not be populated at this point");
-
- if (!pc.pdlp_warm_cache.populated) {
+ if (!pc.pdlp_warm_cache.populated) {
+   // Expected: cache is not populated on first entry
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 502-505, remove the redundant assert(!pc.pdlp_warm_cache.populated && "PDLP warm cache should not be populated at this point") so only the existing if (!pc.pdlp_warm_cache.populated) { ... } remains; leave the if-block and its logic unchanged.
1283-1301: Address TODO: Remove or restore commented-out logging code. This block of commented-out logging code should either be restored or removed before merging. If the statistics are useful, consider re-enabling them; otherwise, clean up the dead code.
Would you like me to help restore this logging block with the current variable names, or should it be removed entirely?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1283 - 1301, Restore the commented-out RB Dual Simplex logging: re-enable the conditional block around rb_mode (use rb_mode != 2 check, with the inner rb_mode == 1 branch) and call log.printf with the current variables ds_elapsed (from toc(ds_start_time)), num_candidates, ds_optimal.load(), ds_infeasible.load(), ds_failed.load(), and ds_skipped.load() exactly as in the diff; ensure the two format strings match their branches (one including ds_skipped and “(PDLP)”, the other without it) and that values use num_candidates * 2 for the denominators as shown, keeping correct punctuation and line breaks so the code compiles.
366-373: Consider using epsilon comparison for floating-point sigma values. Exact floating-point comparison (sigma == -1, sigma == 1) can fail if sigma has any numerical noise from matrix operations. While slack coefficients are typically constructed as exact +1/-1, an epsilon-based comparison would be more robust.

```diff
- if (sigma == -1) {
+ constexpr f_t eps = 1e-9;
+ if (std::abs(sigma - (-1.0)) < eps) {
    constraint_lower[i] = slack_lower + lp.rhs[i];
    constraint_upper[i] = slack_upper + lp.rhs[i];
- } else if (sigma == 1) {
+ } else if (std::abs(sigma - 1.0) < eps) {
    constraint_lower[i] = -slack_upper + lp.rhs[i];
    constraint_upper[i] = -slack_lower + lp.rhs[i];
  } else {
-   assert(sigma == 1.0 || sigma == -1.0);
+   assert(false && "sigma must be +1 or -1");
  }
```

As per coding guidelines: "use epsilon comparisons for floating-point equality checks"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 366 - 373, The code compares floating-point sigma to exact integers; replace direct comparisons in the block handling sigma with epsilon-based checks (e.g., fabs(sigma - 1.0) < EPS and fabs(sigma + 1.0) < EPS) when assigning constraint_lower[i] and constraint_upper[i] using slack_lower/slack_upper and lp.rhs[i]; define or reuse a small EPS constant and update the final assert to assert(fabs(sigma - 1.0) < EPS || fabs(sigma + 1.0) < EPS) so the logic in this function (pseudo_costs.cpp handling sigma, constraint_lower, constraint_upper, slack_lower, slack_upper) is robust to floating-point noise.
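As a self-contained illustration of the epsilon-based comparison suggested above (the f_t alias, the helper names, and the 1e-9 tolerance are assumptions for this sketch; the codebase may already define its own tolerance constant):

```cpp
#include <cmath>

// Sketch of an epsilon-based sign check; f_t, approx_eq, classify_sigma,
// and the 1e-9 tolerance are illustrative assumptions, not codebase names.
using f_t = double;

inline bool approx_eq(f_t a, f_t b, f_t eps = 1e-9) {
  return std::abs(a - b) < eps;
}

// Returns -1 or +1 when sigma is within eps of that value, 0 otherwise
// (the caller should treat 0 as an error, mirroring the assert above).
inline int classify_sigma(f_t sigma) {
  if (approx_eq(sigma, -1.0)) return -1;
  if (approx_eq(sigma, 1.0)) return 1;
  return 0;
}
```

This tolerates the tiny rounding noise that matrix operations can introduce while still rejecting genuinely wrong coefficients.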
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9d7be61-6bc3-4400-aa36-b21d1943fd39
📒 Files selected for processing (1)
cpp/src/branch_and_bound/pseudo_costs.cpp
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 49-53: The bounds check in subview can overflow when adding signed
i_t values; replace the single assert(offset + count <= solved.size()) with a
safe check that prevents overflow by validating non-negativity (if i_t is
signed) and comparing sizes using the same unsigned type as solved.size():
ensure offset >= 0 and count >= 0 (if applicable), then cast to the span-size
type (e.g. std::size_t) and assert(static_cast<size_type>(offset) <=
solved.size() && static_cast<size_type>(count) <= solved.size() -
static_cast<size_type>(offset)); also use those casts when calling
solved.subspan(offset, count) so subview uses consistent unsigned indices.
- Around line 37-47: The bounds checks in is_solved and mark_solved are unsafe
if i_t is signed because a negative local_idx will convert to a large unsigned
when compared to solved.size() and assert may be disabled in release; update
both methods (is_solved and mark_solved) to first check that local_idx is
non-negative and then compare static_cast<size_t>(local_idx) against
solved.size(), and handle failures with a runtime-safe response (e.g., throw
std::out_of_range or return/abort) instead of relying solely on assert so access
to solved[local_idx] is always safe.
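The overflow-safe bounds check described above can be sketched as a standalone helper. The i_t alias and the function name are assumptions modeled on the review comment, not the actual shared_strong_branching_context.hpp code:

```cpp
#include <cstddef>

using i_t = int;  // assumption: a signed index type, as in the review comment

// Returns true iff [offset, offset + count) fits inside a container holding
// `size` elements, without risking signed overflow in `offset + count` or a
// surprising signed-to-unsigned conversion of a negative index.
inline bool in_bounds(i_t offset, i_t count, std::size_t size) {
  // Reject negatives before any signed->unsigned conversion can occur.
  if (offset < 0 || count < 0) return false;
  const std::size_t uoffset = static_cast<std::size_t>(offset);
  const std::size_t ucount  = static_cast<std::size_t>(count);
  // Subtraction form avoids the overflow that `uoffset + ucount` could cause.
  return uoffset <= size && ucount <= size - uoffset;
}
```

A subview method could then call this helper and throw or abort on failure instead of relying solely on assert, which is compiled out in release builds.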
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4f7a71dc-8774-4f65-90e4-9dd5c98b0e0a
📒 Files selected for processing (1)
cpp/src/branch_and_bound/shared_strong_branching_context.hpp
Thanks for the hard work, Nicolas!
My main concerns are:
- Do not create a std::thread at each call to the reliability branching. Instead, use omp task to integrate with the rest of B&B, share the CPU resources more efficiently, and avoid oversubscribing the CPU.
- Since two or more threads might call the reliability branching at the same time, some variables may appear in multiple unreliable lists. Yet a single call to strong branching can be enough to turn the variable into reliable. If this is the case, then we can propagate the score from one thread to another rather than calling strong branching again. That is why there is a mutex per variable that locks when performing a trial branching. When using batch PDLP, we also need to consider this case and mark the variable accordingly. Multiple calls to strong branching are not necessarily a bad thing (more information about a variable can lead to better decisions), but they may be too expensive.
- Considering that there are many simultaneous calls to batch PDLP, are the kernels inserted into the same CUDA stream, or does each call go to a separate stream? The latter may be better for performance, but might interfere with determinism.
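The per-variable locking and score-propagation scheme described in the second point could be modeled roughly as follows. All names here (VariableEntry, score_variable) and the exact locking granularity are illustrative assumptions, not the actual B&B code:

```cpp
#include <mutex>

// Simplified model of a per-variable trial-branching lock: the thread that
// wins the race runs the expensive trial; losers block on the mutex and then
// reuse the propagated score instead of repeating strong branching.
struct VariableEntry {
  std::mutex trial_mutex;   // held while a thread runs trial branching
  bool reliable = false;    // set once the variable's score is trustworthy
  double score = 0.0;       // propagated to threads that skipped the trial
};

template <typename TrialFn>
double score_variable(VariableEntry& var, TrialFn run_trial_branching) {
  std::lock_guard<std::mutex> guard(var.trial_mutex);
  if (!var.reliable) {
    var.score = run_trial_branching();  // strong branching / batch PDLP
    var.reliable = true;                // mark so other threads reuse it
  }
  return var.score;
}
```

Under this model, batch PDLP results would also need to set the reliable flag so concurrent reliability-branching calls can pick up the score.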
```diff
  template <typename i_t, typename f_t>
  void strong_branching(const user_problem_t<i_t, f_t>& original_problem,
                        const lp_problem_t<i_t, f_t>& original_lp,
+ static std::pair<f_t, i_t> merge_sb_result(f_t ds_val,
```
Nitpick: use an enum instead of an int to set the source. This avoids setting an incorrect value in the future and makes the intent of the return clearer.
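A minimal sketch of the enum-based return suggested here. The enumerator names, the helper name pick_result, and the max-based merge logic are illustrative assumptions; the actual merge_sb_result may combine the values differently:

```cpp
#include <utility>

// Tags which solver produced the winning strong-branching value, so callers
// cannot confuse the source with a bare int. Names are illustrative only.
enum class sb_source { dual_simplex, batch_pdlp };

// Hypothetical merge: keep the larger objective and record its origin.
inline std::pair<double, sb_source> pick_result(double ds_val, double pdlp_val) {
  if (ds_val >= pdlp_val) return {ds_val, sb_source::dual_simplex};
  return {pdlp_val, sb_source::batch_pdlp};
}
```

An enum class also prevents accidental implicit conversion from arbitrary integers, which is exactly the failure mode the nitpick is guarding against.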
```diff
  const std::vector<i_t>& new_slacks,
  const std::vector<variable_type_t>& var_types,
- const std::vector<f_t> root_soln,
+ const std::vector<f_t>& root_soln,
```
Note that in #979, I am setting the root_soln to lp_solution such that x, y and z are already available.
Yes, if your PR gets merged first I will use it instead.
```diff
  const std::vector<f_t>& edge_norms,
- pseudo_costs_t<i_t, f_t>& pc)
+ pseudo_costs_t<i_t, f_t>& pc,
+ std::vector<f_t>& ds_obj_down,
```
I would avoid abbreviating dual simplex as ds. We do not use this terminology in other parts of the code base.
Similarly, strong_branching vs sb or reliability branching vs rb.
```diff
  } else if (rb_mode != 0 && unreliable_list.size() < min_num_candidates_for_pdlp) {
    log.printf("Not enough candidates to use batch PDLP, using DS only\n");
+ } else if (rb_mode != 0 && pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root < 5.0) {
+   log.printf("Pourcent solved by batch PDLP at root is too low, using DS only\n");
```
Typo: Pourcent --> Percent
You need to check for other places as well
…ather than thread, put both bpdlp call in functions
…ble, use one stream per BPDLP in RB
/ok to test 6e5baa5
…n path that doesn't support cpp20
/ok to test 86455ef
/ok to test 2f05352
nguidotti left a comment
In the strong branching, we need to modify where the parallel region starts so that the PDLP task is properly created. I will send you a patch.
I would advise avoiding abbreviated names (ds for dual simplex, or sb for strong branching). We are not using them in the rest of the code, and it can be confusing later.
/ok to test c888495
@tmckayus and @chris-maes to review
```python
        "1 = cooperative work-stealing (DS + batch PDLP), "
        "2 = batch PDLP only.",
    )
    mip_batch_pdlp_reliability_branching: Optional[int] = Field(
```
Sounds like we aren't adding new parameters to the service.
@tmckayus to check that it is ok if someone sends the new parameter.
…RB DS task for valid merge update
/ok to test 95addf5
/ok to test 7ea4a52
This PR enables the following regarding batch PDLP:
Currently, we still keep BPDLP off by default, both at the root and in reliability branching.