392 changes: 0 additions & 392 deletions GRPC_INTERFACE.md

This file was deleted.

12 changes: 6 additions & 6 deletions GRPC_QUICK_START.md
@@ -18,13 +18,13 @@ same as a local solve — no API changes required.
### Basic (no TLS)

```bash
-cuopt_grpc_server --port 8765 --workers 1
+cuopt_grpc_server --port 5001 --workers 1
```

### TLS (server authentication)

```bash
-cuopt_grpc_server --port 8765 \
+cuopt_grpc_server --port 5001 \
--tls \
--tls-cert server.crt \
--tls-key server.key
@@ -33,7 +33,7 @@ cuopt_grpc_server --port 8765 \
### mTLS (mutual authentication)

```bash
-cuopt_grpc_server --port 8765 \
+cuopt_grpc_server --port 5001 \
--tls \
--tls-cert server.crt \
--tls-key server.key \
@@ -123,7 +123,7 @@ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
**4. Start the server with your CA:**

```bash
-cuopt_grpc_server --port 8765 \
+cuopt_grpc_server --port 5001 \
--tls \
--tls-cert server.crt \
--tls-key server.key \
@@ -135,7 +135,7 @@ cuopt_grpc_server --port 8765 \

```bash
export CUOPT_REMOTE_HOST=server.example.com
-export CUOPT_REMOTE_PORT=8765
+export CUOPT_REMOTE_PORT=5001
export CUOPT_TLS_ENABLED=1
export CUOPT_TLS_ROOT_CERT=ca.crt # verifies the server
export CUOPT_TLS_CLIENT_CERT=client.crt # proves client identity
@@ -157,7 +157,7 @@ They apply identically to the Python API, `cuopt_cli`, and the C API.

```bash
export CUOPT_REMOTE_HOST=<server-hostname>
-export CUOPT_REMOTE_PORT=8765
+export CUOPT_REMOTE_PORT=5001
```

When both `CUOPT_REMOTE_HOST` and `CUOPT_REMOTE_PORT` are set, every
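Per the sentence above, the remote path depends on both `CUOPT_REMOTE_HOST` and `CUOPT_REMOTE_PORT` being set. A wrapper script that wants to report which mode the current environment selects could test the two variables the same way. This is only a sketch: `cuopt_remote_enabled` is a hypothetical helper, not part of cuOpt, and its 1-65535 port-range check is an extra assumption beyond the documented rule.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of cuOpt): succeeds only when both
# CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT are non-empty. The 1-65535 range
# check on the port is this sketch's own assumption.
cuopt_remote_enabled() {
    [[ -n "${CUOPT_REMOTE_HOST:-}" ]] || return 1
    [[ "${CUOPT_REMOTE_PORT:-}" =~ ^[0-9]{1,5}$ ]] || return 1
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    (( 10#${CUOPT_REMOTE_PORT} >= 1 && 10#${CUOPT_REMOTE_PORT} <= 65535 ))
}

export CUOPT_REMOTE_HOST=server.example.com
export CUOPT_REMOTE_PORT=5001

if cuopt_remote_enabled; then
    echo "remote mode: ${CUOPT_REMOTE_HOST}:${CUOPT_REMOTE_PORT}"
else
    echo "local mode"
fi
```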
62 changes: 32 additions & 30 deletions GRPC_SERVER_ARCHITECTURE.md
@@ -13,29 +13,29 @@ For gRPC protocol and client API, see `GRPC_INTERFACE.md`. Server source files l
## Process Model

```text
┌────────────────────────────────────────────────────────────────────┐
│ Main Server Process
┌────────────────────────────────────────────────────────────────────
Main Server Process │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────────┐ │
│ │ gRPC │ │ Job │ │ Background Threads │ │
│ │ Service │ │ Tracker │ │ - Result retrieval │ │
│ │ Handler │ │ (job status,│ │ - Incumbent retrieval │ │
│ │ │ │ results) │ │ - Worker monitor │ │
│ └─────────────┘ └──────────────┘ └─────────────────────────────┘ │
│ │
│ │ shared memory │ pipes
│ ▼
│ ┌─────────────────────────────────────────────────────────────────┐
│ │ Shared Memory Queues
│ │ ┌─────────────────┐ ┌─────────────────────┐
│ │ │ Job Queue │ │ Result Queue │
│ │ │ (MAX_JOBS=100) │ │ (MAX_RESULTS=100) │
│ │ └─────────────────┘ └─────────────────────┘
│ └─────────────────────────────────────────────────────────────────┘
└────────────────────────────────────────────────────────────────────┘
│ fork() │
│ │
│ │ shared memory │ pipes
│ ▼
│ ┌────────────────────────────────────────────────────────────────
│ │ Shared Memory Queues
│ │ ┌─────────────────┐ ┌─────────────────────┐
│ │ │ Job Queue │ │ Result Queue │
│ │ │ (MAX_JOBS=100) │ │ (MAX_RESULTS=100) │
│ │ └─────────────────┘ └─────────────────────┘
│ └────────────────────────────────────────────────────────────────
└────────────────────────────────────────────────────────────────────
│ fork()
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Worker 0 │ │ Worker 1 │ │ Worker N │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │
@@ -86,7 +86,9 @@ All paths below are under `cpp/src/grpc/server/`.
| `grpc_server_types.hpp` | Shared structs (e.g. `JobQueueEntry`, `ResultQueueEntry`, `ServerConfig`, `JobInfo`), enums, globals (atomics, mutexes, condition variables), and forward declarations used across server .cpp files. |
| `grpc_field_element_size.hpp` | Maps `cuopt::remote::ArrayFieldId` to element byte size; used by pipe deserialization and chunked logic. |
| `grpc_pipe_serialization.hpp` | Streaming pipe I/O: write/read individual length-prefixed protobuf messages (ChunkedProblemHeader, ChunkedResultHeader, ArrayChunk) directly to/from pipe fds. Avoids large intermediate buffers. Also serializes SubmitJobRequest for unary pipe transfer. |
| `grpc_pipe_io.cpp` | Low-level pipe read/write helpers: length-prefixed protobuf serialization, raw byte transfer with retry-on-EINTR, and pipe buffer sizing. |
| `grpc_incumbent_proto.hpp` | Build `Incumbent` proto from (job_id, objective, assignment) and parse it back; used by worker when pushing incumbents and by main when reading from the incumbent pipe. |
| `grpc_server_logger.{hpp,cpp}` | Server-side logging utilities: log file management, console echo, and log message formatting for worker processes. |
| `grpc_worker.cpp` | `worker_process(worker_index)`: loop over job queue, receive job data via pipe (unary or chunked), call solver, send result (and optionally incumbents) back. Contains `IncumbentPipeCallback` and `store_simple_result`. |
| `grpc_worker_infra.cpp` | Pipe creation/teardown, `spawn_worker` / `spawn_workers`, `wait_for_workers`, `mark_worker_jobs_failed`, `cleanup_shared_memory`. |
| `grpc_server_threads.cpp` | `worker_monitor_thread`, `result_retrieval_thread`, `incumbent_retrieval_thread`, `session_reaper_thread`. |
@@ -115,7 +117,7 @@ Client Server Worker
│─── SubmitJob ──────────►│ │
│ │ Create job entry │
│ │ Store problem data │
│ │ job_queue[slot].ready=true│
│ │ job_queue[slot].ready=true
│◄── job_id ──────────────│ │
```

@@ -132,8 +134,8 @@ Client Server Worker
│ │ │ solve_lp/solve_mip
│ │ │ Convert GPU→CPU
│ │ │
│ │ result_queue[slot].ready │◄──────────────────
│ │◄── result data via pipe ─│
│ │ result_queue[slot].ready
│ │◄── result data via pipe ─
```

### 3. Result Retrieval
@@ -213,23 +215,23 @@ The `StreamLogs` RPC:
## Job States

```text
┌─────────┐ submit ┌───────────┐ claim ┌────────────┐
│ QUEUED │──────────►│ PROCESSING│─────────►│ COMPLETED │
└─────────┘ └───────────┘ └────────────┘
│ │
│ cancel │ error
▼ ▼
┌───────────┐ ┌─────────┐
│ CANCELLED │ │ FAILED │
└───────────┘ └─────────┘
┌─────────┐ submit ┌───────────┐ claim ┌────────────┐
│ QUEUED │──────────►│ PROCESSING │─────────►│ COMPLETED │
└─────────┘ └───────────┘ └────────────┘
│ cancel │ error
┌───────────┐ ┌─────────┐
│ CANCELLED │ │ FAILED │
└───────────┘ └─────────┘
```

## Configuration Options

```bash
cuopt_grpc_server [options]

- -p, --port PORT gRPC listen port (default: 8765)
+ -p, --port PORT gRPC listen port (default: 5001)
-w, --workers NUM Number of worker processes (default: 1)
--max-message-mb N Max gRPC message size in MiB (default: 256; clamped to [4 KiB, ~2 GiB])
--max-message-bytes N Max gRPC message size in bytes (exact; min 4096)
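The `--max-message-mb` clamp described above amounts to a MiB-to-bytes conversion followed by a range check. The sketch below assumes the "~2 GiB" ceiling is the signed 32-bit limit of 2147483647 bytes (plausible if the value is handed to gRPC's `int`-typed C++ message-size setters), which the help text does not state; `effective_max_message_bytes` is a hypothetical helper, not a server option.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Bounds: the 4 KiB floor comes from the help text; using INT32_MAX as the
# "~2 GiB" ceiling is an assumption of this sketch.
readonly MIN_BYTES=4096
readonly MAX_BYTES=2147483647

# Hypothetical helper: convert a --max-message-mb value (MiB) to bytes and
# clamp the result into [MIN_BYTES, MAX_BYTES].
effective_max_message_bytes() {
    local mib="$1"
    local bytes=$(( mib * 1024 * 1024 ))
    if (( bytes < MIN_BYTES )); then
        bytes=${MIN_BYTES}
    elif (( bytes > MAX_BYTES )); then
        bytes=${MAX_BYTES}
    fi
    echo "${bytes}"
}

effective_max_message_bytes 256    # default: prints 268435456
effective_max_message_bytes 0      # raised to the floor: prints 4096
effective_max_message_bytes 4096   # lowered to the ceiling: prints 2147483647
```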
15 changes: 14 additions & 1 deletion build.sh
@@ -15,10 +15,11 @@ REPODIR=$(cd "$(dirname "$0")"; pwd)
LIBCUOPT_BUILD_DIR=${LIBCUOPT_BUILD_DIR:=${REPODIR}/cpp/build}
LIBMPS_PARSER_BUILD_DIR=${LIBMPS_PARSER_BUILD_DIR:=${REPODIR}/cpp/libmps_parser/build}

-VALIDARGS="clean libcuopt cuopt_grpc_server libmps_parser cuopt_mps_parser cuopt cuopt_server cuopt_sh_client docs deb -a -b -g -fsanitize -tsan -msan -v -l= --verbose-pdlp --build-lp-only --no-fetch-rapids --skip-c-python-adapters --skip-tests-build --skip-routing-build --skip-fatbin-write --host-lineinfo [--cmake-args=\\\"<args>\\\"] [--cache-tool=<tool>] -n --allgpuarch --ci-only-arch --show_depr_warn -h --help"
+VALIDARGS="clean codegen libcuopt cuopt_grpc_server libmps_parser cuopt_mps_parser cuopt cuopt_server cuopt_sh_client docs deb -a -b -g -fsanitize -tsan -msan -v -l= --verbose-pdlp --build-lp-only --no-fetch-rapids --skip-c-python-adapters --skip-tests-build --skip-routing-build --skip-fatbin-write --host-lineinfo [--cmake-args=\\\"<args>\\\"] [--cache-tool=<tool>] -n --allgpuarch --ci-only-arch --show_depr_warn -h --help"
HELP="$0 [<target> ...] [<flag> ...]
where <target> is:
clean - remove all existing build artifacts and configuration (start over)
+codegen - regenerate gRPC .inc files and proto from field_registry.yaml (requires pyyaml)
libcuopt - build the cuopt C++ code
cuopt_grpc_server - build only the gRPC server binary (configures + builds libcuopt as needed)
libmps_parser - build the libmps_parser C++ code
@@ -358,6 +359,18 @@ if buildAll || hasArg libmps_parser; then
fi
fi

+################################################################################
+# Regenerate gRPC codegen .inc files from the field registry (explicit target only)
+if hasArg codegen; then
+    echo "Regenerating codegen .inc files from field_registry.yaml..."
+    python "${REPODIR}"/cpp/codegen/generate_conversions.py \
+        --registry "${REPODIR}"/cpp/codegen/field_registry.yaml \
+        --output-dir "${REPODIR}"/cpp/codegen/generated
+    cp "${REPODIR}"/cpp/codegen/generated/cuopt_remote_data.proto \
+        "${REPODIR}"/cpp/src/grpc/cuopt_remote_data.proto
+    echo "Done. Remember to commit the generated files."
+fi
+
################################################################################
# Configure and build libcuopt (and optionally just the gRPC server)
if buildAll || hasArg libcuopt || hasArg cuopt_grpc_server; then
5 changes: 4 additions & 1 deletion ci/test_cpp.sh
@@ -1,6 +1,6 @@
#!/bin/bash

-# SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

set -euo pipefail
@@ -31,6 +31,9 @@ mkdir -p "${RAPIDS_TESTS_DIR}"

rapids-print-env

+rapids-logger "Verify codegen output matches committed files"
+./ci/verify_codegen.sh
+
rapids-logger "Check GPU usage"
nvidia-smi

57 changes: 57 additions & 0 deletions ci/verify_codegen.sh
@@ -0,0 +1,57 @@
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Verify that committed codegen output matches what generate_conversions.py produces.
# Fails if a developer edited field_registry.yaml without re-running ./build.sh codegen.

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
CODEGEN_DIR="${REPO_DIR}/cpp/codegen"
Contributor comment: `codegen` is a generic name; you could call it `grpc_code_gen` or `grpc_gen`.

GENERATED_DIR="${CODEGEN_DIR}/generated"
PROTO_DEST="${REPO_DIR}/cpp/src/grpc/cuopt_remote_data.proto"

TMPDIR=$(mktemp -d)
trap 'rm -rf ${TMPDIR}' EXIT

echo "Running code generator into temp directory..."
python "${CODEGEN_DIR}/generate_conversions.py" \
--registry "${CODEGEN_DIR}/field_registry.yaml" \
--output-dir "${TMPDIR}"

echo "Comparing generated output with committed files..."

FAILED=0

for f in "${TMPDIR}"/*; do
    fname=$(basename "$f")
    committed="${GENERATED_DIR}/${fname}"
    if [ ! -f "${committed}" ]; then
        echo "MISSING: ${committed} (new generated file not committed)"
        FAILED=1
        continue
    fi
    if ! diff -q "$f" "${committed}" > /dev/null 2>&1; then
        echo "MISMATCH: cpp/codegen/generated/${fname}"
        diff -u "${committed}" "$f" | head -30
        FAILED=1
    fi
done
Comment on lines +28 to +41
⚠️ Potential issue | 🟠 Major

Make the directory comparison exhaustive under pipefail.

`diff -u "${committed}" "$f" | head -30` returns non-zero on a real diff when `set -o pipefail` is enabled, and because the script also runs with `set -e`, it exits on the first mismatch before `FAILED=1` is recorded or the rest of the generated files are checked. The loop also only walks `${TMPDIR}`, so stale files that still exist in `cpp/codegen/generated/` are never reported.
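The pipefail behaviour this finding relies on can be reproduced in isolation. The sketch below runs the same `diff -u ... | head -30` pipeline in two child shells that both use `set -euo pipefail`, once as written and once with `|| true`; only the guarded child reaches its final `echo`. Separate `bash -c` processes are used because errexit is suppressed inside `if` conditions and `||` lists of the calling shell, which would hide the effect. The file names are throwaway placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Two throwaway files that differ, so diff exits with status 1.
workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
printf 'a\n' > "${workdir}/old.txt"
printf 'b\n' > "${workdir}/new.txt"
export workdir

# Child 1: the pipeline as written in the loop. With pipefail its status is
# diff's 1, so errexit ends the child before it prints "reached".
# The outer "|| true" only keeps this parent script alive.
unguarded=$(bash -c 'set -euo pipefail
diff -u "${workdir}/old.txt" "${workdir}/new.txt" | head -30 > /dev/null
echo reached' || true)

# Child 2: the same pipeline with "|| true", as in the suggested fix.
guarded=$(bash -c 'set -euo pipefail
diff -u "${workdir}/old.txt" "${workdir}/new.txt" | head -30 > /dev/null || true
echo reached' || true)

echo "unguarded child printed: '${unguarded}'"
echo "guarded child printed:   '${guarded}'"
```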

Suggested fix
 for f in "${TMPDIR}"/*; do
     fname=$(basename "$f")
     committed="${GENERATED_DIR}/${fname}"
     if [ ! -f "${committed}" ]; then
         echo "MISSING: ${committed} (new generated file not committed)"
         FAILED=1
         continue
     fi
     if ! diff -q "$f" "${committed}" > /dev/null 2>&1; then
         echo "MISMATCH: cpp/codegen/generated/${fname}"
-        diff -u "${committed}" "$f" | head -30
+        diff -u "${committed}" "$f" | head -30 || true
         FAILED=1
     fi
 done
+
+for committed in "${GENERATED_DIR}"/*; do
+    fname=$(basename "${committed}")
+    if [ ! -f "${TMPDIR}/${fname}" ]; then
+        echo "STALE: ${committed} (committed file is no longer generated)"
+        FAILED=1
+    fi
+done


if [ -f "${TMPDIR}/cuopt_remote_data.proto" ] && [ -f "${PROTO_DEST}" ]; then
    if ! diff -q "${TMPDIR}/cuopt_remote_data.proto" "${PROTO_DEST}" > /dev/null 2>&1; then
        echo "MISMATCH: cpp/src/grpc/cuopt_remote_data.proto (not copied from codegen)"
        FAILED=1
    fi
fi
Comment on lines +43 to +48

⚠️ Potential issue | 🟠 Major

Fail when the copied proto is missing.

`generate_conversions.py` always writes `cuopt_remote_data.proto` into `${TMPDIR}`, so the only time this block skips is when `cpp/src/grpc/cuopt_remote_data.proto` is missing. That should fail CI like any other missing generated artifact; otherwise deleting the copied proto passes verification silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/verify_codegen.sh` around lines 43 - 48, the verification currently skips when the generated tmp proto exists but `PROTO_DEST` (`cpp/src/grpc/cuopt_remote_data.proto`) is missing; change `verify_codegen.sh` so that if `"${TMPDIR}/cuopt_remote_data.proto"` exists but `"${PROTO_DEST}"` does not, the script prints a missing-file error and sets `FAILED=1`. Specifically, in the block referencing `"${TMPDIR}/cuopt_remote_data.proto"` and `"${PROTO_DEST}"`, add an explicit check for the absence of `PROTO_DEST` (or invert the logic) to emit "MISMATCH: cpp/src/grpc/cuopt_remote_data.proto (not copied from codegen)" and set `FAILED=1` instead of silently skipping.
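One hedged way to act on this finding is to treat a missing copy as a failure in its own right. The sketch below is not the fix adopted in the PR: `check_proto_copy` is a hypothetical helper, it takes the two paths as arguments instead of reading `TMPDIR` and `PROTO_DEST`, and it reports a missing copy as `MISSING` (matching the script's existing message style) rather than `MISMATCH`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns non-zero (and prints why) when the generated
# proto is absent, when the copy is absent, or when the two differ.
check_proto_copy() {
    local generated="$1" copied="$2"
    if [ ! -f "${generated}" ]; then
        echo "MISSING: ${generated} (generator did not write the proto)"
        return 1
    fi
    if [ ! -f "${copied}" ]; then
        echo "MISSING: ${copied} (not copied from codegen)"
        return 1
    fi
    if ! diff -q "${generated}" "${copied}" > /dev/null 2>&1; then
        echo "MISMATCH: ${copied} (not copied from codegen)"
        return 1
    fi
}

# Usage with throwaway files: the copy does not exist yet, so this fails.
workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
echo 'syntax = "proto3";' > "${workdir}/generated.proto"

FAILED=0
check_proto_copy "${workdir}/generated.proto" "${workdir}/copied.proto" || FAILED=1
echo "FAILED=${FAILED}"
```

Inside `verify_codegen.sh` this would correspond to calling the helper with `"${TMPDIR}/cuopt_remote_data.proto"` and `"${PROTO_DEST}"` and setting `FAILED=1` on a non-zero return.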


if [ ${FAILED} -ne 0 ]; then
    echo ""
    echo "ERROR: Committed generated files are out of sync with field_registry.yaml."
    echo "Run './build.sh codegen' and commit the results."
    exit 1
fi

echo "OK: All generated files match field_registry.yaml."
2 changes: 2 additions & 0 deletions conda/environments/all_cuda-129_arch-aarch64.yaml
@@ -60,12 +60,14 @@ dependencies:
- pytest-cov
- pytest<9.0
- python>=3.11,<3.15
- pyyaml
+- pyyaml>=6.0.0
- rapids-build-backend>=0.4.0,<0.5.0
- rapids-logger==0.2.*,>=0.0.0a0
- re2
- requests
- rmm==26.4.*,>=0.0.0a0
+- ruamel.yaml>=0.18
- scikit-build-core>=0.11.0
- scipy>=1.14.1
- sphinx
2 changes: 2 additions & 0 deletions conda/environments/all_cuda-129_arch-x86_64.yaml
@@ -60,12 +60,14 @@ dependencies:
- pytest-cov
- pytest<9.0
- python>=3.11,<3.15
- pyyaml
+- pyyaml>=6.0.0
- rapids-build-backend>=0.4.0,<0.5.0
- rapids-logger==0.2.*,>=0.0.0a0
- re2
- requests
- rmm==26.4.*,>=0.0.0a0
+- ruamel.yaml>=0.18
- scikit-build-core>=0.11.0
- scipy>=1.14.1
- sphinx
2 changes: 2 additions & 0 deletions conda/environments/all_cuda-131_arch-aarch64.yaml
@@ -60,12 +60,14 @@ dependencies:
- pytest-cov
- pytest<9.0
- python>=3.11,<3.15
- pyyaml
+- pyyaml>=6.0.0
- rapids-build-backend>=0.4.0,<0.5.0
- rapids-logger==0.2.*,>=0.0.0a0
- re2
- requests
- rmm==26.4.*,>=0.0.0a0
+- ruamel.yaml>=0.18
- scikit-build-core>=0.11.0
- scipy>=1.14.1
- sphinx
2 changes: 2 additions & 0 deletions conda/environments/all_cuda-131_arch-x86_64.yaml
@@ -60,12 +60,14 @@ dependencies:
- pytest-cov
- pytest<9.0
- python>=3.11,<3.15
- pyyaml
+- pyyaml>=6.0.0
- rapids-build-backend>=0.4.0,<0.5.0
- rapids-logger==0.2.*,>=0.0.0a0
- re2
- requests
- rmm==26.4.*,>=0.0.0a0
+- ruamel.yaml>=0.18
- scikit-build-core>=0.11.0
- scipy>=1.14.1
- sphinx
2 changes: 2 additions & 0 deletions conda/recipes/libcuopt/recipe.yaml
@@ -74,6 +74,8 @@ cache:
- make
- ninja
- git
+- python
+- pyyaml
- tbb-devel
- zlib
- bzip2