Skip to content

fix(sos): SOS constraints on masked variables (#688)#692

Draft
FBumann wants to merge 6 commits into
refactor/sos-reformulation-methodsfrom
fix/sos-masked-direct-api
Draft

fix(sos): SOS constraints on masked variables (#688)#692
FBumann wants to merge 6 commits into
refactor/sos-reformulation-methodsfrom
fix/sos-masked-direct-api

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 18, 2026

Stacked on #690. Fixes #688.

Closes the SOS-on-masked-variables bug at its root rather than guarding against it. The defensive _check_sos_unmasked workaround introduced by #689 is removed; the real fix covers both the direct-API path (Gurobi, Xpress) and the LP file writer.

Root cause

Two related failure modes, both rooted in conflating linopy labels with solver-side positions / names:

  1. Direct API: gm.addSOS(...) / problem.addSOS(...) were given linopy labels as if they were 0-based column positions in the active-variable array. Labels equal positions only when no variable in the model is masked anywhere.
  2. LP writer: sos_to_file emitted variable names directly from the raw label array. A masked entry (label = -1) became x-1, which LP parsers reject or (gurobi LP reader) silently corrupt. The set identifier labels.isel(sos_dim=0) was -1 whenever the first slot was masked.

Fix

  • Direct API (Gurobi, Xpress): shared _sos_set_positions(labels, weights, label_to_pos) helper drops masked entries and resolves linopy labels to active-variable positions via model.variables.label_index.label_to_pos. Both direct builders route through it; empty sets are skipped.
  • LP writer (sos_to_file): polars pipeline gets a labels != -1 filter before group_by; per-set id is the max of non-masked labels (always a valid label; fully-masked sets get id -1 and are filtered out). Keeps polars vectorization.

The two paths stay separate by design — LP polars vs direct numpy is the right idiom split for each path's throughput needs; unification would force LP to Python-level iteration with no maintenance win for a ~10-line shared concept.

Coverage

  • test/test_sos_masked.py (new) — 11 fixture configs (1D/2D × all mask placements × with/without unrelated-var mask) × 2 SOS types × every SOS-capable solver × {direct, lp} io_apis ≈ 110 cases. Three-layer oracle per test: status == "ok" + objective == expected + element-wise solution == expected. Asymmetric objective coefficients break permutation symmetry; analytical optimizer in the fixture provides exact oracles using list-position adjacency for SOS2.
  • Companion sos2 piecewise NaN-padding test converted to a positive assertion (same y_b = 12.5 oracle as the lp variant).
  • Existing reformulation-with-masked test moved to test_sos_reformulation.py, reworded — reformulation isn't a "workaround" anymore.

TDD discipline

5 commits, each fix preceded by its failing test:

  1. test(sos): regression matrix for masked SOS variables — drops defensive guard so tests surface the real bug. Without fix: 28 LP-path masked cases fail with Name 'x-1' does not exist.
  2. fix(sos): resolve linopy labels to solver column positions in direct builds — direct-path passes.
  3. fix(sos): filter masked SOS members + robust set id in LP writer — LP-path passes.
  4. test(pwl): convert masked-sos2 NaN-padding test to positive assertion.
  5. test(sos): move masked-reformulation test to test_sos_reformulation.

Test plan

  • pytest test/test_sos_masked.py — 66 active cases pass, 44 skipped.
  • Full suite passes.
  • ruff check, ruff format, mypy linopy/{model,solvers,io}.py clean.

🤖 Generated with Claude Code

FBumann and others added 5 commits May 18, 2026 21:57
Add test/test_sos_masked.py covering SOS-with-masked-variables across:

- 1D and 2D SOS variables
- All mask placements (none, sos_dim, non_sos_dim, both_dims) plus an
  optional mask on an unrelated variable (the angle that exposes the
  label-vs-position bug elsewhere in the model)
- SOS1 and SOS2
- All SOS-capable solvers × {direct, lp} io_apis

Asymmetric objective coefficients [-1,-2,-3,-4] along the SOS dim break
permutation symmetry. The fixture's analytical optimizer (matching
list-position adjacency for SOS2) supplies an exact (objective,
per-slot solution) oracle so wrong indexing surfaces as a visible
divergence, not a permutation-equivalent silent pass.

Three-layer oracle per test:
  1. status == "ok" (catches OOB raises and LP parser rejections)
  2. objective.value == expected (catches silent-wrong-answer when the
     bug constrains the wrong columns)
  3. element-wise solution.values == expected (catches the rare
     permutation-equivalent case where objective happens to match)

Also drop the defensive Model._check_sos_unmasked workaround from
master (#689) along with its call sites in Solver._build and to_file.
With this commit alone the tests run and surface the real bug;
subsequent commits in this PR apply the actual fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…builds

Both Gurobi and Xpress direct API builds were passing linopy variable
labels straight to vendor `addSOS` as if they were 0-based column
positions in the solver's variable array. Labels equal positions only
when no variable in the model is masked anywhere — with any mask, the
SOS members get attached to the wrong (or out-of-range) columns.

Introduce a shared `_sos_set_positions(labels, weights, label_to_pos)`
helper that drops masked entries (label = -1) along with their weights
and maps the survivors to active-variable positions via
`model.variables.label_index.label_to_pos`. Both `Gurobi._build_solver_model`
and `Xpress._build_solver_model` route through it.

Empty SOS sets (all members masked, e.g. when a non-SOS dim mask
removes an entire set) are now skipped instead of producing an empty
vendor call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LP writer was emitting variable names from the raw label array. For
a masked SOS member (label = -1) this produced `x-1`, which LP parsers
either reject outright (cplex, xpress, mosek) or silently corrupt into
a wrong SOS set (gurobi LP reader). The set identifier was computed as
`labels.isel(sos_dim=0)`, which is -1 whenever the first slot of the
set is masked — both an invalid LP identifier and one that collides
across distinct sets that all happen to have a masked first slot.

Both are fixed in `sos_to_file`:

- Per-set id is now the `max` of non-masked labels along the SOS dim,
  which is always a valid label (any of the set's surviving members
  would do; max is the simplest stable choice). Fully-masked sets get
  id -1 and are filtered out before emission.
- Masked member rows are dropped from the polars dataframe before the
  `group_by` aggregation, so masked members never reach the file.

The fix preserves the LP path's polars vectorization (no switch to
Python-level iteration) and stays separate from the direct-API helper
— the two paths have identical-shape fixes implemented in each path's
natural idiom.

Also drop the now-obsolete `test_direct_api_raises_on_masked_sos` /
`test_lp_writer_raises_on_masked_sos` tests in `test_sos_constraints`:
they were guarding the defensive `_check_sos_unmasked` workaround that
this PR replaces with the real fix. The same coverage (and much more)
is now in `test_sos_masked.py`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The companion sos2 test was guarding the defensive _check_sos_unmasked
workaround — asserting that piecewise-with-sos2 NaN-padded breakpoints
raised NotImplementedError. With the real #688 fix landed, masked SOS
lambdas now flow through both direct-API and LP-writer paths
correctly, so the sos2 piecewise solve produces the same answer as the
lp variant (y_b = 12.5 on the chord (5,10)→(15,15)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "reformulation + masked SOS" integration test lived in
test_sos_constraints alongside SOS-validation tests but its concern is
the reformulation pipeline (the apply/undo path on a masked input),
not the SOS constraint itself. Move it to TestSolveWithReformulation
in test_sos_reformulation.py where the other reformulation tests live.

Also drop the "documented workaround" framing in the docstring — with
#688 fixed, reformulation isn't a workaround for masked SOS anymore,
just one of two ways to solve such models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FBumann FBumann force-pushed the fix/sos-masked-direct-api branch from 03d4d85 to 70c51ef Compare May 18, 2026 20:16
The SOS handling in Gurobi._build_solver_model and
Xpress._build_solver_model was ~22 lines each, ~95% identical: same
iteration over model.variables.sos, same 1D vs multi-dim stack/groupby
branching, same masked-entry filter, same label→position resolution.
Only the vendor addSOS call differed.

Extract _iter_sos_sets(model) yielding (sos_type, positions, weights)
per active SOS set. Each subclass collapses to a 2-line loop with its
vendor call. The shared bug-prone parts (multi-dim groupby, mask
filtering, label resolution) live in one place; a future SOS-capable
direct-API solver (#683, OETC) plugs in with one addSOS line.

Also replace `# type: ignore[assignment]` on sos_type/sos_dim with
runtime int()/str() casts. The casts both document the actual runtime
types and guard against future changes to what gets stored in
var.attrs, with no measurable cost. The remaining type: ignore is on
the int() call itself (mypy is overly conservative about xarray's
Hashable-typed attrs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant