diff --git a/AGENTS.md b/AGENTS.md index 370f8a15c..49fcc43eb 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -8,7 +8,7 @@ AI agent skills for NVIDIA cuOpt optimization engine. Skills live in **`skills/` ### Rules - `skills/cuopt-user-rules/` — Base rules for end users calling cuOpt (routing, LP, MILP, QP, install, server). Not for cuOpt internals — see `skills/cuopt-developer/`. Read first for user-facing tasks; choose skills from the index below by task and interface. -- `skills/cuopt-developer/` — Modify, build, test, debug, and contribute to cuOpt internals (C++/CUDA, Python, server, CI). Use for solver internals, PRs, DCO, and code conventions. +- `skills/cuopt-developer/` — Modify, build, test, debug, and contribute to cuOpt internals (C++/CUDA, Python, server, CI). Use for solver internals, PRs, DCO, and code conventions. For **VRP dimension** work (combine invariants, fwd/bwd propagation, new constraints/objectives in the routing engine), read **`skills/cuopt-developer/vrp_skills.md`** in addition to `SKILL.md`. - `skills/skill-evolution/` — Skill evolution: after solving a non-trivial problem, propose skill updates to capture generalizable learnings. ### Common (concepts only; no API code) diff --git a/skills/cuopt-developer/SKILL.md b/skills/cuopt-developer/SKILL.md index 34a2cf050..dc4707018 100644 --- a/skills/cuopt-developer/SKILL.md +++ b/skills/cuopt-developer/SKILL.md @@ -232,3 +232,11 @@ For build/test pitfalls (Cython rebuild, OOM, CUDA driver mismatch, missing `nvc - **Python binding architecture**: [resources/python_bindings.md](resources/python_bindings.md) _Shell-execution, install, sudo, and outside-workspace policies are covered by [Refusal Rules — Read First](#refusal-rules--read-first) at the top of this skill._ + +## VRP dimension internals (routing engine) + +When implementing or debugging **VRP dimensions** (constraints, objectives, forward/backward propagation, `combine`, local-search deltas), read: + +- **`skills/cuopt-developer/vrp_skills.md`** — architecture contracts, required interfaces, and implementation checklist. + +Read it **before** adding a new dimension or changing combine semantics. diff --git a/skills/cuopt-developer/vrp_skills.md b/skills/cuopt-developer/vrp_skills.md new file mode 100644 index 000000000..a3468de05 --- /dev/null +++ b/skills/cuopt-developer/vrp_skills.md @@ -0,0 +1,200 @@ +# cuOpt VRP Dimension Developer Skills + +--- + +## Skill 1: `cuopt-dimension-architecture` + +**When to use**: Before implementing any new constraint or objective in cuOpt. + +### The forward/backward propagation model +Each node stores accumulated state (`fwd_X`, `bwd_X`) so that combining any two adjacent fragments is O(1). This is the core design contract that makes cuOpt fast: +- `fwd_X[k]` = contribution of the prefix `[0..k]` +- `bwd_X[k]` = contribution of the suffix `[k..n]` +- No recomputation is needed when a move splits a route at any point + +### The combine invariant +`combine(node[k], node[k+1])` must return the **same value for every split point `k`** in a route (within floating-point tolerance — small differences from order of operations are acceptable; large gaps indicate a bug). This is the fundamental correctness contract. Violating it breaks local search delta evaluation (the solver computes `cost_after - cost_before` using combine; if combine is materially inconsistent, deltas are wrong). + +### Why boundaries double-count +`fwd_excess[k]` accumulates violations from `[0..k]`. `bwd_excess[k+1]` accumulates violations from `[k+1..n]`. At the join point `k → k+1`, both sides have already "seen" the in-transit state at that boundary — so their sum overcounts the boundary contribution once. The correction term `excess(fwd_state[k])` subtracts the double-counted boundary: +``` +combine(k, k+1) = fwd_excess[k] + bwd_excess[k+1] - excess(fwd_state[k]) +``` + +### Required interface for every dimension +| Method | Description | +|--------|-------------| +| `calculate_forward(next)` | Propagate fwd state from `this` to `next`; update `next.fwd_excess` | +| `calculate_backward(prev)` | Propagate bwd state from `this` to `prev`; update `prev.bwd_excess` | +| `combine(prev, next)` | O(1) total cost for joining two fragments; must satisfy the invariant | +| `get_cost(prev, this)` | Same formula as `combine`, called from `next`'s perspective | +| `compute_cost(n_nodes)` | Full-route cost; must equal `combine(last_node, return_depot)` | +| `forward_excess` | Returns `fwd_excess` as double | +| `backward_excess` | Returns `bwd_excess` as double | +| `forward_feasible` | True if `fwd_excess <= excess_limit` | +| `backward_feasible` | True if `bwd_excess <= excess_limit` | + +--- + +## Skill 2: `cuopt-implement-dimension` + +**When to use**: When given a constraint/objective description to implement as a new cuOpt dimension. + +### Step-by-step recipe + +**Step 1 — Define per-node state** +Identify the minimal set of scalars needed for O(1) propagation: +- What is "in transit" at each route position? (e.g. load, type counts, time) +- What accumulated violation measure can be updated incrementally? (e.g. excess load, incompatibility excess) +- Separate: *fixed data* (set once from problem input), *forward data*, *backward data* + +**Step 2 — Write `calculate_forward(next)`** +``` +propagate accumulated fwd_state from this → next +apply next node's demand to fwd_state +compute positional_excess = f(fwd_state_at_next) +next.fwd_excess = this.fwd_excess + positional_excess // depot nodes: no positional contribution +``` + +**Step 3 — Write `calculate_backward(prev)`** +Mirror of forward, applied in reverse direction. Backward demand direction is opposite to forward (e.g. a pickup that adds +1 forward subtracts -1 backward). + +**Step 4 — Derive `combine(prev, next)`** + +`combine` is the **core cost computation for every local search move**: operators evaluate candidate edits by differencing combined fragment costs (`cost_after - cost_before`). It is called extremely often, so **keep it as fast as possible**. + +- **Typical dimensions** (capacity, distance, simple time windows, etc.): `combine` is **O(1)** — only prefix/suffix scalars and a boundary correction. This is what all current VRP operators assume. +- **Richer dimensions** can be **much more expensive** — e.g. **O(log n)** in route size `n` when the join cost needs a non-trivial lookup (time-dependent travel times, multiple time windows, profile queries). Prefer precomputed tables or cached state so `combine` stays hot-path friendly; if it must be superlinear, document it and expect fewer applicable operators or higher move-evaluation cost. + +Write out the invariant formula and verify it equals the total route cost for a complete route: +``` +total = prev.fwd_excess + next.bwd_excess - boundary_correction(prev.fwd_state) +``` +where `boundary_correction` removes the double-counted overlap at the join point. + +**Step 5 — Derive `get_cost(prev, this)` from combine** + +`get_cost` is on the **same hot path as `combine`**: local search operators call it constantly when scoring edges and fragments. It must stay **as fast as `combine`** — same **O(1)** target for typical dimensions, same risk of **O(log n)** or worse for time-dependent travel, multiple time windows, etc. **Do not** put a separate heavy computation here. + +`get_cost` is called on the `next` node with `prev` passed in. It must be identical to `combine` — substitute `this` for `next`: +``` +get_cost(prev, this) == combine(prev, this) +``` +Implement by **delegating to `combine`** (or inlining the same formula). Do **not** derive an independent formula; any deviation breaks coherence assertions and can hide a slower code path. + +**Step 6 — Write `compute_cost(n_nodes)`** +Must equal `combine(last_service_node, fresh_return_depot)` within the same floating-point tolerance: +``` +compute_cost = fwd_excess[n_nodes] - boundary_correction(fwd_state[n_nodes]) +``` +(For a balanced route, `bwd_excess` at the return depot is 0 and `bwd_state` is 0, so the depot term drops out.) + +**Step 7 — Create the node class** +File: `cpp/src/routing/node/your_node.cuh` +- Fixed data fields (problem input) +- `fwd_state[]`, `fwd_excess`, `bwd_state[]`, `bwd_excess` +- All 9 interface methods listed in Skill 1 + +**Step 8 — Create the route class** +File: `cpp/src/routing/route/your_route.cuh` +- Host-side: `rmm::device_uvector` for each array (fixed, fwd, bwd) +- Device-side `view_t`: `raft::device_span` members, `get_node`, `set_node`, `set_forward_data`, `set_backward_data`, `copy_forward_data`, `copy_backward_data`, `copy_fixed_route_data`, `compute_cost`, `create_shared_route`, `get_shared_size` +- Stride layout: all arrays use `stride = n_nodes_route + 1`; multi-type arrays are row-major `[n_types * stride]` + +--- + +## Skill 3: `cuopt-dimension-wiring-checklist` + +**When to use**: After writing node/route logic, to ensure the dimension is fully integrated into the framework. + +### Files to create +- [ ] `cpp/src/routing/node/your_node.cuh` +- [ ] `cpp/src/routing/route/your_route.cuh` + +### Files to modify + +**`cpp/src/routing/routing_helpers.cuh`** (or `dimensions_info`) +- [ ] Add new `dim_t` enum value +- [ ] `enabled_dimensions_t::has_dimension` covers it +- [ ] `enabled_dimensions_t::get_dimension` covers it +- [ ] `loop_over_dimensions` range covers it (check `Start`/`End` bounds) + +**`cpp/src/routing/route/dimensions_route.cuh`** +- [ ] Add to `route_from_dim` type alias chain +- [ ] Add member `your_route_t your_dim` to `dimensions_route_t` +- [ ] Initialize in constructor: `your_dim(sol_handle_, dimensions_info_.get_dimension())` +- [ ] Copy constructor copies `your_dim` +- [ ] `view_t` has `typename your_route_t::view_t your_dim` member +- [ ] `view()` calls `get_dimension_of(v) = get_dimension_of(*this).view()` via loop — automatic if wired into enum + +**`cpp/src/routing/node/node.cuh`** +- [ ] `get_dimension()` returns `your_dim` member — add to the accessor chain + +**`cpp/src/routing/problem/problem.cuh`** +- [ ] Add storage for input data (e.g. `std::vector order_incompatible_types`) +- [ ] Add setter method + +**`cpp/src/routing/problem/problem.cu`** +- [ ] `populate_dimensions_info()`: enable dimension when input data is non-empty + +**`cpp/src/routing/util_kernels/set_nodes_data.cuh`** +- [ ] Depot boundary initialization in `set_route_data`: set `fwd_state[0] = 0`, `fwd_excess[0] = 0`, `bwd_state[n_nodes] = 0`, `bwd_excess[n_nodes] = 0` + +**`cpp/src/routing/fleet_info.hpp`** (if dimension has vehicle-level parameters) +- [ ] Add vehicle-level constraint data + +**Python/C API** +- [ ] Expose setter in C API header +- [ ] Python binding in the routing data class + +--- + +## Skill 4: `cuopt-dimension-testing` + +**When to use**: After implementing a new dimension, to write tests that validate correctness end-to-end. + +### C++ unit tests (`cpp/tests/routing/`) + +**Propagation correctness** +- Hand-craft a sequence of nodes with known demands +- Run `calculate_forward` step by step; assert `fwd_excess` at each position matches manually computed values +- Run `calculate_backward` step by step; assert `bwd_excess` matches + +**Combine invariant test** +- Build a full route (depot + service nodes + return depot) with known structure +- For every split point `k` from 0 to `n_nodes-1`, assert `combine(node[k], node[k+1])` returns the same value +- This is the single most important correctness test + +**`get_cost` / `combine` consistency** +- For several adjacent node pairs, assert `get_cost(prev=k, next=k+1) == combine(k, k+1)` + +**`compute_cost` / `combine` consistency** +- After running a full forward/backward pass, assert `compute_cost(route)` equals `combine(last_node, return_depot)` + +**Feasible and infeasible cases** +- Feasible: route with no violations → excess = 0 at all positions +- Infeasible: route with a known violation → excess > 0 at exactly the expected positions + +### Python integration tests (`python/cuopt/cuopt/tests/routing/`) + +**Solver finds feasible solution** +- Small problem where the dimension constraint is the only hard constraint +- Assert `is_feasible()` and infeasibility cost for the dimension is 0 + +**Solver restructures when unconstrained optimal is infeasible** +- Construct a problem where the greedy/shortest solution violates the dimension +- Assert the solver either finds a restructured feasible solution or reports infeasibility correctly + +**Mixed vehicles** +- Some vehicles subject to the constraint, some not (via `has_dimension` gating) +- Assert constrained vehicles comply; unconstrained vehicles are unaffected + +**No-regression test** +- Problem without setting the new dimension's input data +- Assert solver result matches pre-feature baseline (the dimension being absent should be invisible) + +### What every test should verify +- `is_feasible()` for the final solution when feasibility is expected +- Infeasibility cost for the new dimension is 0 in a feasible solution +- `check_device_host_coherence()` passes (GPU and host agree) +- Edge cases: empty route, single-node route, all nodes same type/value