fix(sos): SOS constraints on masked variables (#688)#692
Draft
FBumann wants to merge 6 commits into
Draft
Conversation
Add test/test_sos_masked.py covering SOS-with-masked-variables across:
- 1D and 2D SOS variables
- All mask placements (none, sos_dim, non_sos_dim, both_dims) plus an
optional mask on an unrelated variable (the angle that exposes the
label-vs-position bug elsewhere in the model)
- SOS1 and SOS2
- All SOS-capable solvers × {direct, lp} io_apis
Asymmetric objective coefficients [-1,-2,-3,-4] along the SOS dim break
permutation symmetry. The fixture's analytical optimizer (matching
list-position adjacency for SOS2) supplies an exact (objective,
per-slot solution) oracle so wrong indexing surfaces as a visible
divergence, not a permutation-equivalent silent pass.
Three-layer oracle per test:
1. status == "ok" (catches OOB raises and LP parser rejections)
2. objective.value == expected (catches silent-wrong-answer when the
bug constrains the wrong columns)
3. element-wise solution.values == expected (catches the rare
permutation-equivalent case where objective happens to match)
Also drop the defensive Model._check_sos_unmasked workaround from
master (#689) along with its call sites in Solver._build and to_file.
With this commit alone the tests run and surface the real bug;
subsequent commits in this PR apply the actual fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…builds Both Gurobi and Xpress direct API builds were passing linopy variable labels straight to vendor `addSOS` as if they were 0-based column positions in the solver's variable array. Labels equal positions only when no variable in the model is masked anywhere — with any mask, the SOS members get attached to the wrong (or out-of-range) columns. Introduce a shared `_sos_set_positions(labels, weights, label_to_pos)` helper that drops masked entries (label = -1) along with their weights and maps the survivors to active-variable positions via `model.variables.label_index.label_to_pos`. Both `Gurobi._build_solver_model` and `Xpress._build_solver_model` route through it. Empty SOS sets (all members masked, e.g. when a non-SOS dim mask removes an entire set) are now skipped instead of producing an empty vendor call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LP writer was emitting variable names from the raw label array. For a masked SOS member (label = -1) this produced `x-1`, which LP parsers either reject outright (cplex, xpress, mosek) or silently corrupt into a wrong SOS set (gurobi LP reader). The set identifier was computed as `labels.isel(sos_dim=0)`, which is -1 whenever the first slot of the set is masked — both an invalid LP identifier and one that collides across distinct sets that all happen to have a masked first slot. Both are fixed in `sos_to_file`: - Per-set id is now the `max` of non-masked labels along the SOS dim, which is always a valid label (any of the set's surviving members would do; max is the simplest stable choice). Fully-masked sets get id -1 and are filtered out before emission. - Masked member rows are dropped from the polars dataframe before the `group_by` aggregation, so masked members never reach the file. The fix preserves the LP path's polars vectorization (no switch to Python-level iteration) and stays separate from the direct-API helper — the two paths have identical-shape fixes implemented in each path's natural idiom. Also drop the now-obsolete `test_direct_api_raises_on_masked_sos` / `test_lp_writer_raises_on_masked_sos` tests in `test_sos_constraints`: they were guarding the defensive `_check_sos_unmasked` workaround that this PR replaces with the real fix. The same coverage (and much more) is now in `test_sos_masked.py`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The companion sos2 test was guarding the defensive _check_sos_unmasked workaround — asserting that piecewise-with-sos2 NaN-padded breakpoints raised NotImplementedError. With the real #688 fix landed, masked SOS lambdas now flow through both direct-API and LP-writer paths correctly, so the sos2 piecewise solve produces the same answer as the lp variant (y_b = 12.5 on the chord (5,10)→(15,15)). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "reformulation + masked SOS" integration test lived in test_sos_constraints alongside SOS-validation tests but its concern is the reformulation pipeline (the apply/undo path on a masked input), not the SOS constraint itself. Move it to TestSolveWithReformulation in test_sos_reformulation.py where the other reformulation tests live. Also drop the "documented workaround" framing in the docstring — with #688 fixed, reformulation isn't a workaround for masked SOS anymore, just one of two ways to solve such models. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
03d4d85 to
70c51ef
Compare
The SOS handling in Gurobi._build_solver_model and Xpress._build_solver_model was ~22 lines each, ~95% identical: same iteration over model.variables.sos, same 1D vs multi-dim stack/groupby branching, same masked-entry filter, same label→position resolution. Only the vendor addSOS call differed. Extract _iter_sos_sets(model) yielding (sos_type, positions, weights) per active SOS set. Each subclass collapses to a 2-line loop with its vendor call. The shared bug-prone parts (multi-dim groupby, mask filtering, label resolution) live in one place; a future SOS-capable direct-API solver (#683, OETC) plugs in with one addSOS line. Also replace `# type: ignore[assignment]` on sos_type/sos_dim with runtime int()/str() casts. The casts both document the actual runtime types and guard against future changes to what gets stored in var.attrs, with no measurable cost. The remaining type: ignore is on the int() call itself (mypy is overly conservative about xarray's Hashable-typed attrs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #690. Fixes #688.
Closes the SOS-on-masked-variables bug at its root rather than guarding against it. The defensive
_check_sos_unmaskedworkaround introduced by #689 is removed; the real fix covers both the direct-API path (Gurobi, Xpress) and the LP file writer.Root cause
Two related failure modes, both rooted in conflating linopy labels with solver-side positions / names:
gm.addSOS(...)/problem.addSOS(...)were given linopy labels as if they were 0-based column positions in the active-variable array. Labels equal positions only when no variable in the model is masked anywhere.sos_to_fileemitted variable names directly from the raw label array. A masked entry (label = -1) becamex-1, which LP parsers reject or (gurobi LP reader) silently corrupt. The set identifierlabels.isel(sos_dim=0)was -1 whenever the first slot was masked.Fix
_sos_set_positions(labels, weights, label_to_pos)helper drops masked entries and resolves linopy labels to active-variable positions viamodel.variables.label_index.label_to_pos. Both direct builders route through it; empty sets are skipped.sos_to_file): polars pipeline gets alabels != -1filter beforegroup_by; per-set id is themaxof non-masked labels (always a valid label; fully-masked sets get id -1 and are filtered out). Keeps polars vectorization.The two paths stay separate by design — LP polars vs direct numpy is the right idiom split for each path's throughput needs; unification would force LP to Python-level iteration with no maintenance win for a ~10-line shared concept.
Coverage
test/test_sos_masked.py(new) — 11 fixture configs (1D/2D × all mask placements × with/without unrelated-var mask) × 2 SOS types × every SOS-capable solver × {direct, lp} io_apis ≈ 110 cases. Three-layer oracle per test:status == "ok"+objective == expected+ element-wisesolution == expected. Asymmetric objective coefficients break permutation symmetry; analytical optimizer in the fixture provides exact oracles using list-position adjacency for SOS2.y_b = 12.5oracle as the lp variant).test_sos_reformulation.py, reworded — reformulation isn't a "workaround" anymore.TDD discipline
5 commits, each fix preceded by its failing test:
test(sos): regression matrix for masked SOS variables— drops defensive guard so tests surface the real bug. Without fix: 28 LP-path masked cases fail withName 'x-1' does not exist.fix(sos): resolve linopy labels to solver column positions in direct builds— direct-path passes.fix(sos): filter masked SOS members + robust set id in LP writer— LP-path passes.test(pwl): convert masked-sos2 NaN-padding test to positive assertion.test(sos): move masked-reformulation test to test_sos_reformulation.Test plan
pytest test/test_sos_masked.py— 66 active cases pass, 44 skipped.ruff check,ruff format,mypy linopy/{model,solvers,io}.pyclean.🤖 Generated with Claude Code