
Fix issue with infinite lower bounds and try to bound free variables in barrier #1001

Open
chris-maes wants to merge 5 commits into NVIDIA:release/26.04 from chris-maes:barrier_presolve_fixes

Conversation

@chris-maes
Contributor

  • Fixes a long-standing issue where variables with bounds of the form -inf <= x_j <= u_j were not handled correctly by barrier.
  • We now try to bound free variables (-inf <= x_j <= inf) in barrier presolve. This helps because PSLP can sometimes introduce many free variables.

@chris-maes chris-maes requested a review from a team as a code owner March 30, 2026 23:55
@chris-maes chris-maes requested review from aliceb-nv and mlubin March 30, 2026 23:55
@copy-pr-bot

copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@chris-maes chris-maes added bug Something isn't working P0 labels Mar 30, 2026
@chris-maes chris-maes added this to the 26.04 milestone Mar 30, 2026
@chris-maes chris-maes changed the base branch from main to release/26.04 March 30, 2026 23:56
@chris-maes chris-maes added the non-breaking Introduces a non-breaking change label Mar 30, 2026
@coderabbitai

coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a barrier presolve pass that tightens/removes certain free variables and negates variables with only an upper bound (x' = -x), updates objective/A/Q accordingly, records negated indices and undoes them in solution uncrushing, introduces CLI file-read timing/logging, and adjusts barrier residual/objective logging and normalization.

Changes

  • Presolve declarations (cpp/src/dual_simplex/presolve.hpp): Added presolve_info_t::negated_variables to record indices of variables negated during presolve.
  • Presolve implementation & solution mapping (cpp/src/dual_simplex/presolve.cpp): Added initial free-variable counting and a constraint-based tightening pass that can remove certain free variables. Implemented negation substitution for variables with only an upper bound (x' = -x) by flipping problem.lower/problem.upper, negating problem.objective entries, multiplying affected problem.A column coefficients by -1, and updating quadratic problem.Q.x terms when exactly one column is negated. Removed a duplicate free-variable check and updated uncrush_solution to undo negations for recorded indices.
  • CLI timing/logging (cpp/cuopt_cli.cpp): Included utilities/timer.hpp, instantiated a timer in run_single_file, initialized the logger earlier, and logged MPS file read time via the configured logger.
  • Barrier solver diagnostics (cpp/src/barrier/barrier.cu): Switched complementarity residual norm computation to use GPU residual vectors, logged lp.obj_constant, normalized complementarity residuals using compute_user_objective(lp, primal_objective), and updated per-iteration logging to print user objectives and relative residuals while keeping the existing convergence checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title directly addresses the main changes: fixing infinite lower bounds and bounding free variables in barrier presolve.
  • Description check: ✅ Passed. The description explains the two primary fixes: handling -inf bounds and bounding free variables in barrier presolve.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 926-944: Guard the divisions by a_ij in the bounding logic by
skipping or avoiding division when |a_ij| is near zero: introduce a small
tolerance (e.g. tol) and before any 1.0 / a_ij operations check if
std::abs(a_ij) > tol; if not, do not update problem.upper[j] or problem.lower[j]
(and do not set bounded) or handle via a safe fallback; apply this check around
the four locations using the same tol and keep the existing conditions involving
lower_inf_i, upper_inf_i, lower_activity_i, upper_activity_i, rhs and bounded so
only numerically safe divisions occur.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de8c026d-5d75-460d-a9a8-fb066f9cc6f1

📥 Commits

Reviewing files that changed from the base of the PR and between d03e197 and b86145c.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/dual_simplex/presolve.hpp

Comment on lines +926 to +944
if (a_ij > 0) {
  if (lower_inf_i == 1) {
    problem.upper[j] = 1.0 / a_ij * (rhs - lower_activity_i);
    bounded          = true;
  }
  if (upper_inf_i == 1) {
    problem.lower[j] = 1.0 / a_ij * (rhs - upper_activity_i);
    bounded          = true;
  }
} else if (a_ij < 0) {
  if (lower_inf_i == 1) {
    problem.lower[j] = 1.0 / a_ij * (rhs - lower_activity_i);
    bounded          = true;
  }
  if (upper_inf_i == 1) {
    problem.upper[j] = 1.0 / a_ij * (rhs - upper_activity_i);
    bounded          = true;
  }
}

@coderabbitai coderabbitai bot Mar 31, 2026


⚠️ Potential issue | 🟠 Major

Guard against near-zero coefficients before division.

The divisions by a_ij at lines 928, 932, 937, and 941 lack a tolerance check. A near-zero coefficient can produce numerically unstable or excessively large bounds, degrading solver performance or causing downstream issues.

🛡️ Proposed fix to add tolerance check
+        constexpr f_t coeff_tol = 1e-10;
+        if (std::abs(a_ij) < coeff_tol) { continue; }
+
         if (a_ij > 0) {
           if (lower_inf_i == 1) {
             problem.upper[j] = 1.0 / a_ij * (rhs - lower_activity_i);

Based on learnings: "Guard numerical stability by guarding near-zero norms in cut-related computations. ... functions ... should also handle near-zero norms to avoid instability."


Contributor Author


This is tough. A large bound is ok here, because we are trying to put bounds on variables that are currently infinite. I guess if the bound is so large it might be better to keep it infinite for numerical purposes. But what is considered too large? 1e20? 1e15? 1e16?



@chris-maes, this is a great engineering question. Here's my analysis of the tradeoffs:

Checking the resulting bound vs. checking the coefficient

The two approaches are complementary:

  1. Coefficient tolerance (|a_ij| < tol): Simple, but may be overly conservative — if both a_ij and (rhs - activity) are tiny but in proportion, the computed bound might still be well-scaled.

  2. Bound magnitude threshold (|computed_bound| > BIG_M): More directly addresses the concern — if the derived bound is astronomically large, it's useless to the barrier method and potentially harmful.

For a barrier (interior-point) method the relevant consideration is: using coefficients that span many orders of magnitude is a primary source of numerical challenge in LP solvers. A double-precision floating-point number can only represent 15 precise digits, so round-off errors become inevitable when values span 16 or more orders of magnitude.

Practical threshold

The purpose of this presolve step is to give the barrier a useful finite bound. If the computed bound is, say, 1e20, the logarithmic barrier term -log(u - x) is completely negligible for any x of order 1 — so the variable is still effectively unconstrained from the solver's perspective, providing no benefit.

A practical choice, consistent with how commercial solvers treat "practical infinity":

  • 1e20 is a widely used "big-M" sentinel in optimization. It is large enough that a well-scaled problem would essentially never hit it, but small enough to avoid the worst floating-point pathologies.
  • 1e15 is tighter — closer to the double-precision representational limit (~10^15.6) — and would keep the barrier terms meaningful.

Recommended approach

Rather than (or in addition to) the coefficient check, you could guard with:

constexpr f_t big_bound = 1e20;

if (a_ij > 0) {
  if (lower_inf_i == 1) {
    const f_t new_upper = 1.0 / a_ij * (rhs - lower_activity_i);
    if (std::abs(new_upper) < big_bound) {
      problem.upper[j] = new_upper;
      bounded = true;
    }
  }
  if (upper_inf_i == 1) {
    const f_t new_lower = 1.0 / a_ij * (rhs - upper_activity_i);
    if (std::abs(new_lower) < big_bound) {
      problem.lower[j] = new_lower;
      bounded = true;
    }
  }
} else if (a_ij < 0) {
  if (lower_inf_i == 1) {
    const f_t new_lower = 1.0 / a_ij * (rhs - lower_activity_i);
    if (std::abs(new_lower) < big_bound) {
      problem.lower[j] = new_lower;
      bounded = true;
    }
  }
  if (upper_inf_i == 1) {
    const f_t new_upper = 1.0 / a_ij * (rhs - upper_activity_i);
    if (std::abs(new_upper) < big_bound) {
      problem.upper[j] = new_upper;
      bounded = true;
    }
  }
}

This way you preserve the semantics of "keep the bound infinite if the derived bound is too large to be useful," which is exactly the intuition you described. The 1e20 value is a reasonable starting point but can be tuned based on the typical scaling of your problems. If you want tighter numerical safety, 1e15 is the more conservative choice given double-precision limits.


🧠 Learnings used
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 599
File: cpp/src/dual_simplex/cuts.cpp:79-99
Timestamp: 2026-01-22T22:55:36.280Z
Learning: Guard numerical stability by guarding near-zero norms in cut-related computations. Specifically, ensure that while cuts with zero norm are prevented from entering the cut pool by existing validation in cpp/src/dual_simplex/cuts.cpp, functions like cut_orthogonality (and related routines in the same directory) should also handle near-zero norms to avoid instability. This guideline applies to all cpp files under cpp/src/dual_simplex (e.g., the dual simplex cuts module) to prevent numerical issues during cut generation and orthogonality calculations.

Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 814
File: cpp/src/dual_simplex/branch_and_bound.hpp:154-159
Timestamp: 2026-01-29T04:24:57.790Z
Learning: In files under cpp/src/dual_simplex, document and enforce that mutex_original_lp_ protects original_lp_ specifically during the root cut pass phase. The public API set_new_solution() may be called from external threads during this phase while the main thread adds cuts/slacks to original_lp_. Functions like best_first_thread, exploration_ramp_up, and plunge_from execute after the cut passes complete (after the OpenMP barrier) when original_lp_ is fixed, so reads do not require mutex protection. Apply this guidance to all relevant files (hpp and cpp) in the dual_simplex directory and ensure code reviews verify correct mutex usage timing and safe access patterns around original_lp_.

Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 814
File: cpp/src/dual_simplex/cuts.cpp:1278-1278
Timestamp: 2026-01-30T23:34:22.711Z
Learning: In the cuOpt dual simplex solver (cpp/src/dual_simplex/ and related files), slack variable coefficients in the constraint matrix are exactly 1.0 or -1.0. Therefore, exact equality checks like assert(std::abs(lp.A.x[col_start]) == 1.0) are appropriate and should not be flagged as requiring epsilon tolerances. Apply this guidance to all files under cpp/src/dual_simplex that involve slack coefficient validation.

Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 986
File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8
Timestamp: 2026-03-23T11:33:23.998Z
Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.


folding_info_t<i_t, f_t> folding_info;

std::vector<i_t> new_slacks;
Contributor Author


Whoops. This line should be removed.

@chris-maes
Contributor Author

@rg20 to review


// The constraints are in the form:
// sum_j a_j x_j = beta
for (i_t i : constraints_to_check) {
Contributor


I think we can repurpose the bound strengthening code here.

@rg20
Contributor

rg20 commented Apr 2, 2026

/ok to test 4432c27

…ig. Use user object for relative complementarity gap. Initialize logger so we get feedback when reading large mps files

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 3493-3497: The complementarity normalization uses
compute_user_objective() which includes lp.obj_constant; remove that constant
from the denominator so an arbitrary objective offset cannot shrink the
relative_complementarity_residual. Change the normalization to use the user
objective without lp.obj_constant (either call a new helper like
compute_user_objective_without_constant or subtract lp.obj_constant from
compute_user_objective(lp, primal_objective) before taking std::abs) when
computing relative_complementarity_residual (and the other occurrence noted),
leaving compute_user_objective() behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 44e4dc72-db04-4343-ab6d-560f43805934

📥 Commits

Reviewing files that changed from the base of the PR and between 4432c27 and d8054d1.

📒 Files selected for processing (3)
  • cpp/cuopt_cli.cpp
  • cpp/src/barrier/barrier.cu
  • cpp/src/dual_simplex/presolve.cpp
✅ Files skipped from review due to trivial changes (1)
  • cpp/src/dual_simplex/presolve.cpp

Comment on lines 3493 to +3497
  f_t relative_primal_residual = primal_residual_norm / (1.0 + norm_b);
  f_t relative_dual_residual = dual_residual_norm / (1.0 + norm_c);
  f_t relative_complementarity_residual =
-   complementarity_residual_norm / (1.0 + std::abs(primal_objective));
+   complementarity_residual_norm /
+   (1.0 + std::abs(compute_user_objective(lp, primal_objective)));


⚠️ Potential issue | 🔴 Critical

Keep the complementarity normalization out of compute_user_objective().

compute_user_objective() in cpp/src/dual_simplex/solve.cpp:101-105 adds lp.obj_constant, but that constant does not change the KKT system. A large objective offset can therefore drive relative_complementarity_residual arbitrarily low and make the optimal/suboptimal exit checks accept an iterate with a still-large complementarity gap.

🛠️ Suggested fix
-    f_t relative_complementarity_residual =
-      complementarity_residual_norm /
-      (1.0 + std::abs(compute_user_objective(lp, primal_objective)));
+    const f_t complementarity_scale = 1.0 + std::abs(primal_objective);
+    f_t relative_complementarity_residual =
+      complementarity_residual_norm / complementarity_scale;
...
-      relative_complementarity_residual =
-        complementarity_residual_norm /
-        (1.0 + std::abs(compute_user_objective(lp, primal_objective)));
+      const f_t complementarity_scale = 1.0 + std::abs(primal_objective);
+      relative_complementarity_residual =
+        complementarity_residual_norm / complementarity_scale;

As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results".

Also applies to: 3658-3660



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
cpp/src/dual_simplex/presolve.cpp (1)

861-904: Verify iterative bound propagation handles partial bounds correctly.

The activity computation uses current bounds, which may have been modified by previous constraint iterations. This implements iterative implied bound propagation. However, if a variable receives only one bound (e.g., only upper set at line 931, not lower), it transitions from counting as infinite to contributing finite activity for subsequent constraints.

This is mathematically correct behavior for bound propagation, but consider whether a single-pass approach might miss tightening opportunities compared to iterating until fixpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/presolve.cpp` around lines 861 - 904, The current
single-pass loop over constraints_to_check (inside the block using Arow,
problem.lower/problem.upper and variables like lower_activity_i,
upper_activity_i, lower_inf_i, upper_inf_i, last_free_i) can miss tightening
opportunities because bounds updated earlier in the pass change contribution
status for later rows; change the algorithm to iterate until a fixpoint: repeat
scanning the relevant constraints (or use a worklist/queue of affected
constraints) and recompute lower_activity_i/upper_activity_i for each row
whenever a variable bound changes, marking any tightened bound and re-enqueueing
its incident constraints, and stop when no bound is tightened in a full pass so
all implied bounds are propagated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d604ec33-375d-4234-9125-cefe3d8439a7

📥 Commits

Reviewing files that changed from the base of the PR and between d8054d1 and c7816fa.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/presolve.cpp

@anandhkb
Contributor

anandhkb commented Apr 3, 2026

/ok to test c7816fa


Labels

bug (Something isn't working), non-breaking (Introduces a non-breaking change), P0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants