diff --git a/.github/workflows/build_images.yaml b/.github/workflows/build_images.yaml index aaf97fc88c..78a965efd0 100644 --- a/.github/workflows/build_images.yaml +++ b/.github/workflows/build_images.yaml @@ -55,6 +55,7 @@ jobs: cp ./LICENSE ./ci/docker/context/LICENSE cp ./VERSION ./ci/docker/context/VERSION cp ./thirdparty/THIRD_PARTY_LICENSES ./ci/docker/context/THIRD_PARTY_LICENSES + cp ./ci/docker/entrypoint.sh ./ci/docker/context/entrypoint.sh - name: Copy Commit SHA and commit time run: | git rev-parse HEAD > ./ci/docker/context/COMMIT_SHA diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2835786ae4..8d03641fde 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -212,6 +212,17 @@ export RAPIDS_DATASET_ROOT_DIR=$CUOPT_HOME/datasets/ cd $CUOPT_HOME/python pytest -v ${CUOPT_HOME}/python/cuopt/cuopt/tests ``` +## gRPC Remote Execution + +NVIDIA cuOpt includes a gRPC-based remote execution system for running solves on a +GPU server from a program using the API locally. User documentation lives under `docs/cuopt/source/cuopt-grpc/` (Sphinx **gRPC remote execution** section): + +- `quick-start.rst` — Install/Docker/selector, how remote execution works, minimal LP and CLI examples (default C bundle). +- `advanced.rst` — TLS, tuning, limitations, troubleshooting. +- `examples.rst`, `api.rst` — Sample patterns and RPC overview. +- `docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md` — Short **gRPC server behavior** page in user docs. +- `cpp/docs/grpc-server-architecture.md` — Full contributor reference (IPC, C++ source map, streaming). + ## Debugging cuOpt ### Building in debug mode from source diff --git a/GRPC_INTERFACE.md b/GRPC_INTERFACE.md deleted file mode 100644 index cdcffead97..0000000000 --- a/GRPC_INTERFACE.md +++ /dev/null @@ -1,392 +0,0 @@ -# gRPC Interface Architecture - -## Overview - -The cuOpt remote execution system uses gRPC for client-server communication. 
The interface -supports arbitrarily large optimization problems (multi-GB) through a chunked array transfer -protocol that uses only unary (request-response) RPCs — no bidirectional streaming. - -All client-server serialization uses protocol buffers generated by `protoc` and -`grpc_cpp_plugin`. The internal server-to-worker pipe uses protobuf for metadata -headers and raw byte transfer for bulk array data (see Security Notes). - -## Directory Layout - -All gRPC-related C++ source lives under a single tree: - -``` -cpp/src/grpc/ -├── cuopt_remote.proto # Base protobuf messages (job status, settings, etc.) -├── cuopt_remote_service.proto # Service definition + messages (SubmitJob, ChunkedUpload, Incumbent, etc.) -├── grpc_problem_mapper.{hpp,cpp} # CPU problem ↔ proto (incl. chunked header) -├── grpc_solution_mapper.{hpp,cpp} # LP/MIP solution ↔ proto (unary + chunked) -├── grpc_settings_mapper.{hpp,cpp} # PDLP/MIP settings ↔ proto -├── grpc_service_mapper.{hpp,cpp} # Request/response builders (status, cancel, stream logs, etc.) 
-├── client/ -│ ├── grpc_client.{hpp,cpp} # High-level client: connect, submit, poll, get result -│ └── solve_remote.cpp # solve_lp_remote / solve_mip_remote (uses grpc_client) -└── server/ - ├── grpc_server_main.cpp # main(), argument parsing, gRPC server setup - ├── grpc_service_impl.cpp # CuOptRemoteServiceImpl — all RPC handlers - ├── grpc_server_types.hpp # Shared types, globals, forward declarations - ├── grpc_field_element_size.hpp # ArrayFieldId → element byte size (codegen target) - ├── grpc_pipe_serialization.hpp # Pipe I/O: protobuf headers + raw byte arrays (request/result) - ├── grpc_incumbent_proto.hpp # Incumbent proto build/parse (codegen target) - ├── grpc_worker.cpp # worker_process(), incumbent callback, store_simple_result - ├── grpc_worker_infra.cpp # Pipes, spawn, wait_for_workers, mark_worker_jobs_failed - ├── grpc_server_threads.cpp # result_retrieval, incumbent_retrieval, session_reaper - └── grpc_job_management.cpp # Pipe I/O, submit_job_async, check_status, cancel, etc. -``` - -- **Protos**: Live in `cpp/src/grpc/`. CMake generates C++ in the build dir (`cuopt_remote.pb.h`, `cuopt_remote_service.pb.h`, `cuopt_remote_service.grpc.pb.h`). -- **Mappers**: Shared by client and server; convert between host C++ types and protobuf. Used for unary and chunked paths. -- **Client**: Solver-level utility (not public API). Used by `solve_lp_remote`/`solve_mip_remote` and tests. -- **Server**: Standalone executable `cuopt_grpc_server`. See `GRPC_SERVER_ARCHITECTURE.md` for process model and file roles. - -## Protocol Files - -| File | Purpose | -|------|---------| -| `cpp/src/grpc/cuopt_remote.proto` | Message definitions (problems, settings, solutions, field IDs) | -| `cpp/src/grpc/cuopt_remote_service.proto` | gRPC service definition (RPCs) | - -Generated code is placed in the CMake build directory (not checked into source). 
- -## Service Interface - -```protobuf -service CuOptRemoteService { - // Job submission (small problems, single message) - rpc SubmitJob(SubmitJobRequest) returns (SubmitJobResponse); - - // Chunked upload (large problems, multiple unary RPCs) - rpc StartChunkedUpload(StartChunkedUploadRequest) returns (StartChunkedUploadResponse); - rpc SendArrayChunk(SendArrayChunkRequest) returns (SendArrayChunkResponse); - rpc FinishChunkedUpload(FinishChunkedUploadRequest) returns (SubmitJobResponse); - - // Job management - rpc CheckStatus(StatusRequest) returns (StatusResponse); - rpc CancelJob(CancelRequest) returns (CancelResponse); - rpc DeleteResult(DeleteRequest) returns (DeleteResponse); - - // Result retrieval (small results, single message) - rpc GetResult(GetResultRequest) returns (ResultResponse); - - // Chunked download (large results, multiple unary RPCs) - rpc StartChunkedDownload(StartChunkedDownloadRequest) returns (StartChunkedDownloadResponse); - rpc GetResultChunk(GetResultChunkRequest) returns (GetResultChunkResponse); - rpc FinishChunkedDownload(FinishChunkedDownloadRequest) returns (FinishChunkedDownloadResponse); - - // Blocking wait (returns status only, use GetResult afterward) - rpc WaitForCompletion(WaitRequest) returns (WaitResponse); - - // Real-time streaming - rpc StreamLogs(StreamLogsRequest) returns (stream LogMessage); - rpc GetIncumbents(IncumbentRequest) returns (IncumbentResponse); -} -``` - -## Chunked Array Transfer Protocol - -### Why Chunking? - -gRPC has per-message size limits (configurable, default set to 256 MiB in cuOpt), and -protobuf has a hard 2 GB serialization limit. Optimization problems and their solutions -can exceed several gigabytes, so a chunked transfer mechanism is needed. - -The protocol uses only **unary RPCs** (no bidirectional streaming), which simplifies -error handling, load balancing, and proxy compatibility. 
- -### Upload Protocol (Large Problems) - -When the estimated serialized problem size exceeds 75% of `max_message_bytes`, the client -splits large arrays into chunks and sends them via multiple unary RPCs: - -``` -Client Server - | | - |-- StartChunkedUpload(header, settings) -----> | - |<-- upload_id, max_message_bytes -------------- | - | | - |-- SendArrayChunk(upload_id, field, data) ----> | - |<-- ok ---------------------------------------- | - | | - |-- SendArrayChunk(upload_id, field, data) ----> | - |<-- ok ---------------------------------------- | - | ... | - | | - |-- FinishChunkedUpload(upload_id) ------------> | - |<-- job_id ------------------------------------ | -``` - -**Key features:** -- `StartChunkedUpload` sends a `ChunkedProblemHeader` with all scalar fields and - array metadata (`ArrayDescriptor` for each large array: field ID, total elements, - element size) -- Each `SendArrayChunk` carries one chunk of one array, identified by `ArrayFieldId` - and `element_offset` -- The server reports `max_message_bytes` so the client can adapt chunk sizing -- `FinishChunkedUpload` triggers server-side reassembly and job submission - -### Download Protocol (Large Results) - -When the result exceeds the gRPC max message size, the client fetches it via -chunked unary RPCs (mirrors the upload pattern): - -``` -Client Server - | | - |-- StartChunkedDownload(job_id) --------------> | - |<-- download_id, ChunkedResultHeader ---------- | - | | - |-- GetResultChunk(download_id, field, off) ----> | - |<-- data bytes --------------------------------- | - | | - |-- GetResultChunk(download_id, field, off) ----> | - |<-- data bytes --------------------------------- | - | ... 
| - | | - |-- FinishChunkedDownload(download_id) ---------> | - |<-- ok ----------------------------------------- | -``` - -**Key features:** -- `ChunkedResultHeader` carries all scalar fields (termination status, objectives, - residuals, solve time, warm start scalars) plus `ResultArrayDescriptor` entries - for each array (solution vectors, warm start arrays) -- Each `GetResultChunk` fetches a slice of one array, identified by `ResultFieldId` - and `element_offset` -- `FinishChunkedDownload` releases the server-side download session state -- LP results include PDLP warm start data (9 arrays + 8 scalars) for subsequent - warm-started solves - -### Automatic Routing - -The client handles size-based routing transparently: - -1. **Upload**: Estimate serialized problem size - - Below 75% of `max_message_bytes` → unary `SubmitJob` - - Above threshold → `StartChunkedUpload` + `SendArrayChunk` + `FinishChunkedUpload` -2. **Download**: Check `result_size_bytes` from `CheckStatus` - - Below `max_message_bytes` → unary `GetResult` - - Above limit (or `RESOURCE_EXHAUSTED`) → chunked download RPCs - -## Error Handling - -### gRPC Status Codes - -| Code | Meaning | Client Action | -|------|---------|---------------| -| `OK` | Success | Process result | -| `NOT_FOUND` | Job ID not found | Check job ID | -| `RESOURCE_EXHAUSTED` | Message too large | Use chunked transfer | -| `CANCELLED` | Job was cancelled | Handle gracefully | -| `DEADLINE_EXCEEDED` | Timeout | Retry or increase timeout | -| `UNAVAILABLE` | Server not reachable | Retry with backoff | -| `INTERNAL` | Server error | Report to user | -| `INVALID_ARGUMENT` | Bad request | Fix request | - -### Connection Handling - -- Client detects `context->IsCancelled()` for graceful disconnect -- Server cleans up job state on client disconnect during upload -- Automatic reconnection is NOT built-in (caller should retry) - -## Completion Strategy - -The `solve_lp` and `solve_mip` methods poll `CheckStatus` every `poll_interval_ms` 
-until the job reaches a terminal state (COMPLETED/FAILED/CANCELLED) or `timeout_seconds` -is exceeded. During polling, MIP incumbent callbacks are invoked on the main thread. - -The `WaitForCompletion` RPC is available as a public async API primitive for callers -managing jobs directly, but it is not used by the convenience `solve_*` methods because -polling provides timeout protection and enables incumbent callbacks. - -## Client API (`grpc_client_t`) - -### Configuration - -```cpp -struct grpc_client_config_t { - std::string server_address = "localhost:8765"; - int poll_interval_ms = 1000; - int timeout_seconds = 3600; // Max wait for job completion (1 hour) - bool stream_logs = false; // Stream solver logs from server - - // Callbacks - std::function log_callback; - std::function debug_log_callback; // Internal client debug messages - std::function&)> incumbent_callback; - int incumbent_poll_interval_ms = 1000; - - // TLS configuration - bool enable_tls = false; - std::string tls_root_certs; // CA certificate (PEM) - std::string tls_client_cert; // Client certificate (mTLS) - std::string tls_client_key; // Client private key (mTLS) - - // Transfer configuration - int64_t max_message_bytes = 256 * 1024 * 1024; // 256 MiB - int64_t chunk_size_bytes = 16 * 1024 * 1024; // 16 MiB per chunk - // Chunked upload threshold is computed as 75% of max_message_bytes. 
- bool enable_transfer_hash = false; // FNV-1a hash logging -}; -``` - -### Synchronous Operations - -```cpp -// Blocking solve — handles chunked transfer automatically -auto result = client.solve_lp(problem, settings); -auto result = client.solve_mip(problem, settings, enable_incumbents); -``` - -### Asynchronous Operations - -```cpp -// Submit and get job ID -auto submit = client.submit_lp(problem, settings); -std::string job_id = submit.job_id; - -// Poll for status -auto status = client.check_status(job_id); - -// Get result when ready -auto result = client.get_lp_result(job_id); - -// Cancel or delete -client.cancel_job(job_id); -client.delete_job(job_id); -``` - -### Real-Time Streaming - -```cpp -// Log streaming (callback-based) -client.stream_logs(job_id, 0, [](const std::string& line, bool done) { - std::cout << line; - return true; // continue streaming -}); - -// Incumbent polling (during MIP solve) -config.incumbent_callback = [](int64_t idx, double obj, const auto& sol) { - std::cout << "Incumbent " << idx << ": " << obj << "\n"; - return true; // return false to cancel solve -}; -``` - -## Environment Variables - -| Variable | Default | Description | -|----------|---------|-------------| -| `CUOPT_REMOTE_HOST` | `localhost` | Server hostname for remote solves | -| `CUOPT_REMOTE_PORT` | `8765` | Server port for remote solves | -| `CUOPT_CHUNK_SIZE` | 16 MiB | Override `chunk_size_bytes` | -| `CUOPT_MAX_MESSAGE_BYTES` | 256 MiB | Override `max_message_bytes` | -| `CUOPT_GRPC_DEBUG` | `0` | Enable client debug/throughput logging (`0` or `1`) | -| `CUOPT_TLS_ENABLED` | `0` | Enable TLS for client connections (`0` or `1`) | -| `CUOPT_TLS_ROOT_CERT` | *(none)* | Path to PEM root CA file (server verification) | -| `CUOPT_TLS_CLIENT_CERT` | *(none)* | Path to PEM client certificate file (for mTLS) | -| `CUOPT_TLS_CLIENT_KEY` | *(none)* | Path to PEM client private key file (for mTLS) | - -## TLS Configuration - -### Server-Side TLS - -```bash 
-./cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key -``` - -### Mutual TLS (mTLS) - -Server requires client certificate: - -```bash -./cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -Client provides certificate via environment variables (applies to Python, `cuopt_cli`, and C API): - -```bash -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt -export CUOPT_TLS_CLIENT_CERT=client.crt -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -Or programmatically via `grpc_client_config_t`: - -```cpp -config.enable_tls = true; -config.tls_root_certs = read_file("ca.crt"); -config.tls_client_cert = read_file("client.crt"); -config.tls_client_key = read_file("client.key"); -``` - -## Message Size Limits - -| Configuration | Default | Notes | -|---------------|---------|-------| -| Server `--max-message-mb` | 256 MiB | Per-message limit (also `--max-message-bytes` for exact byte values) | -| Server clamping | [4 KiB, ~2 GiB] | Enforced at startup to stay within protobuf's serialization limit | -| Client `max_message_bytes` | 256 MiB | Clamped to [4 MiB, ~2 GiB] at construction | -| Chunk size | 16 MiB | Payload per `SendArrayChunk`/`GetResultChunk` | -| Chunked threshold | 75% of max_message_bytes | Problems above this use chunked upload (e.g. 192 MiB when max is 256 MiB) | - -Chunked transfer allows unlimited total payload size; only individual -chunks must fit within the per-message limit. Neither client nor server -allows "unlimited" message size — both clamp to the protobuf 2 GiB ceiling. - -## Security Notes - -1. **gRPC Layer**: All client-server message parsing uses protobuf-generated code -2. **Internal Pipe**: The server-to-worker pipe uses protobuf for metadata headers - and length-prefixed raw `read()`/`write()` for bulk array data. 
This pipe is - internal to the server process (main → forked worker) and not exposed to clients. -3. **Standard gRPC Security**: HTTP/2 framing, flow control, standard status codes -4. **TLS Support**: Optional encryption with mutual authentication -5. **Input Validation**: Server validates all incoming gRPC messages before processing - -## Data Flow Summary - -``` -┌─────────┐ ┌─────────────┐ -│ Client │ │ Server │ -│ │ SubmitJob (small) │ │ -│ problem ├───────────────────────────────────►│ deserialize │ -│ │ -or- Chunked Upload (large) │ ↓ │ -│ │ │ worker │ -│ │ │ process │ -│ │ GetResult (small) │ ↓ │ -│ solution│◄───────────────────────────────────┤ serialize │ -│ │ -or- Chunked Download (large) │ │ -└─────────┘ └─────────────┘ -``` - -See `GRPC_SERVER_ARCHITECTURE.md` for details on internal server architecture. - -## Code Generation - -The `cpp/codegen` directory (optional) generates conversion snippets from `field_registry.yaml`. Targets include: - -- **Settings**: PDLP/MIP settings ↔ proto (replacing hand-written blocks in the settings mapper). -- **Result header/scalars/arrays**: ChunkedResultHeader and array field handling. -- **Field element size**: `grpc_field_element_size.hpp` (ArrayFieldId → byte size). -- **Incumbent**: `grpc_incumbent_proto.hpp` (build/parse `Incumbent` messages). - -Adding or changing a proto field can be done via YAML and regenerate instead of editing mapper code by hand. - -## Build - -- **libcuopt**: Includes the mapper `.cpp` files, `grpc_client.cpp`, and `solve_remote.cpp`. Requires `CUOPT_ENABLE_GRPC`, gRPC, and protobuf. Proto generation is done by CMake custom commands that depend on the `.proto` files in `cpp/src/grpc/`. -- **cuopt_grpc_server**: Executable built from `cpp/src/grpc/server/*.cpp`; links libcuopt, gRPC, protobuf. - -Tests that use the client (e.g. `grpc_client_test.cpp`, `grpc_integration_test.cpp`) get `cpp/src/grpc` and `cpp/src/grpc/client` in their include path. 
diff --git a/GRPC_QUICK_START.md b/GRPC_QUICK_START.md deleted file mode 100644 index a3864c101e..0000000000 --- a/GRPC_QUICK_START.md +++ /dev/null @@ -1,248 +0,0 @@ -# cuOpt gRPC Remote Execution Quick Start - -This guide shows how to start the cuOpt gRPC server and solve -optimization problems remotely from Python, `cuopt_cli`, or the C API. - -All three interfaces use the same environment variables for remote -configuration. Once the env vars are set, your code works exactly the -same as a local solve — no API changes required. - -## Prerequisites - -- A host with an NVIDIA GPU and cuOpt installed (server side). -- cuOpt client libraries installed on the client host (can be CPU-only). -- `cuopt_grpc_server` binary available (ships with the cuOpt package). - -## 1. Start the Server - -### Basic (no TLS) - -```bash -cuopt_grpc_server --port 8765 --workers 1 -``` - -### TLS (server authentication) - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key -``` - -### mTLS (mutual authentication) - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -See `GRPC_SERVER_ARCHITECTURE.md` for the full set of server flags. - -### How mTLS Works - -With mTLS the server verifies every client, and the client verifies the -server. The trust model is based on Certificate Authorities (CAs), not -individual certificates: - -- **`--tls-root ca.crt`** tells the server which CA to trust. Any client - presenting a certificate signed by this CA is accepted. The server - never sees or stores individual client certificates. -- **`--require-client-cert`** makes client verification mandatory. Without - it the server requests a client cert but still allows unauthenticated - connections. -- On the client side, `CUOPT_TLS_ROOT_CERT` is the CA that signed the - *server* certificate, so the client can verify the server's identity. 
- -### Restricting Access with a Custom CA - -To limit which clients can reach your server, create a private CA and -only issue client certificates to authorized users. Anyone without a -certificate signed by your CA is rejected at the TLS handshake before -any solver traffic is exchanged. - -**1. Create a private CA (one-time setup):** - -```bash -# Generate CA private key -openssl genrsa -out ca.key 4096 - -# Generate self-signed CA certificate (valid 10 years) -openssl req -new -x509 -key ca.key -sha256 -days 3650 \ - -subj "/CN=cuopt-internal-ca" -out ca.crt -``` - -**2. Issue a client certificate:** - -```bash -# Generate client key -openssl genrsa -out client.key 2048 - -# Create a certificate signing request -openssl req -new -key client.key \ - -subj "/CN=team-member-alice" -out client.csr - -# Sign with your CA -openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \ - -CAcreateserial -days 365 -sha256 -out client.crt -``` - -Repeat step 2 for each authorized client. Keep `ca.key` private; -distribute only `ca.crt` (to the server) and the per-client -`client.crt` + `client.key` pairs. - -**3. Issue a server certificate (signed by the same CA):** - -```bash -# Generate server key -openssl genrsa -out server.key 2048 - -# Create CSR with subjectAltName matching the hostname clients will use -openssl req -new -key server.key \ - -subj "/CN=server.example.com" -out server.csr - -# Write a SAN extension file (DNS and/or IP must match client's target) -cat > server.ext < **Note:** `server.crt` must be signed by the same CA distributed to -> clients, and its `subjectAltName` must match the hostname or IP that -> clients connect to. gRPC (BoringSSL) requires SAN — `CN` alone is -> not sufficient for hostname verification. - -**4. Start the server with your CA:** - -```bash -cuopt_grpc_server --port 8765 \ - --tls \ - --tls-cert server.crt \ - --tls-key server.key \ - --tls-root ca.crt \ - --require-client-cert -``` - -**5. 
Configure an authorized client:** - -```bash -export CUOPT_REMOTE_HOST=server.example.com -export CUOPT_REMOTE_PORT=8765 -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt # verifies the server -export CUOPT_TLS_CLIENT_CERT=client.crt # proves client identity -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -**Revoking access:** gRPC's built-in TLS does not support Certificate -Revocation Lists (CRL) or OCSP. To revoke a client, either stop issuing -new certs from the compromised CA and rotate to a new one, or deploy a -reverse proxy (e.g., Envoy) in front of the server that supports CRL -checking. - -## 2. Configure the Client (All Interfaces) - -Set these environment variables before running any cuOpt client. -They apply identically to the Python API, `cuopt_cli`, and the C API. - -### Required - -```bash -export CUOPT_REMOTE_HOST= -export CUOPT_REMOTE_PORT=8765 -``` - -When both `CUOPT_REMOTE_HOST` and `CUOPT_REMOTE_PORT` are set, every -call to `solve_lp` / `solve_mip` is transparently forwarded to the -remote server. No code changes are needed. - -### TLS (optional) - -```bash -export CUOPT_TLS_ENABLED=1 -export CUOPT_TLS_ROOT_CERT=ca.crt # verify server certificate -``` - -For mTLS, also provide the client identity: - -```bash -export CUOPT_TLS_CLIENT_CERT=client.crt -export CUOPT_TLS_CLIENT_KEY=client.key -``` - -### Tuning (optional) - -| Variable | Default | Description | -|----------|---------|-------------| -| `CUOPT_CHUNK_SIZE` | 16 MiB | Bytes per chunk for large problem transfer | -| `CUOPT_MAX_MESSAGE_BYTES` | 256 MiB | Client-side gRPC max message size | -| `CUOPT_GRPC_DEBUG` | `0` | Enable debug / throughput logging (`1` to enable) | - -## 3. Usage Examples - -Once the env vars are set, write your solver code exactly as you would -for a local solve. The remote transport is handled automatically. 
- -### Python - -```python -import cuopt_mps_parser -from cuopt import linear_programming - -# Parse an MPS file -dm = cuopt_mps_parser.ParseMps("model.mps") - -# Solve (routed to remote server via env vars) -solution = linear_programming.Solve(dm, linear_programming.SolverSettings()) - -print("Objective:", solution.get_primal_objective()) -print("Primal: ", solution.get_primal_solution()[:5], "...") -``` - -### cuopt_cli - -```bash -cuopt_cli model.mps -``` - -With solver options: - -```bash -cuopt_cli model.mps --time-limit 30 --relaxation -``` - -### C++ API - -```cpp -#include -#include - -// Build problem using cpu_optimization_problem_t ... -auto solution = cuopt::linear_programming::solve_lp(cpu_problem, settings); -``` - -The same `solve_lp` / `solve_mip` functions automatically detect the -`CUOPT_REMOTE_HOST` / `CUOPT_REMOTE_PORT` env vars and forward to the -gRPC server when they are set. - -## Troubleshooting - -| Symptom | Check | -|---------|-------| -| Connection refused | Verify the server is running and the host/port are correct. | -| TLS handshake failure | Ensure `CUOPT_TLS_ENABLED=1` is set and certificate paths are correct. | -| `Cannot open TLS file: ...` | The path in the TLS env var does not exist or is not readable. | -| Timeout on large problems | Increase the solver `time_limit` or the client `timeout_seconds`. | - -## Further Reading - -- `GRPC_INTERFACE.md` — Protocol details, chunked transfer, client config, message sizes. -- `GRPC_SERVER_ARCHITECTURE.md` — Server process model, IPC, threads, job lifecycle. 
diff --git a/ci/docker/Dockerfile b/ci/docker/Dockerfile index 1d49a4c04a..6df4159d81 100644 --- a/ci/docker/Dockerfile +++ b/ci/docker/Dockerfile @@ -45,6 +45,7 @@ RUN ln -sf /usr/bin/python${PYTHON_SHORT_VER} /usr/bin/python FROM python-env AS install-env +ARG CUDA_VER ARG CUOPT_VER ARG PYTHON_SHORT_VER @@ -68,36 +69,18 @@ FROM install-env AS cuopt-final ARG PYTHON_SHORT_VER -# Consolidate all directory creation, permissions, and file operations into a single layer +# Make cuopt_grpc_server, cuopt_cli, and shared libraries available to all processes +# (profile.d scripts are only sourced by login shells; ENV works for all containers) +ENV PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:${PATH}" +ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}" + +# Directory creation, permissions RUN mkdir -p /opt/cuopt && \ chmod 777 /opt/cuopt && \ - # Create profile.d script for universal access - echo '#!/bin/bash' > /etc/profile.d/cuopt.sh && \ - echo 'export PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:$PATH"' >> /etc/profile.d/cuopt.sh && \ - echo 'export 
LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}"' >> /etc/profile.d/cuopt.sh && \ - chmod +x /etc/profile.d/cuopt.sh && \ - # Set in /etc/environment for system-wide access - echo 'PATH="/usr/local/cuda/bin:/usr/bin:/usr/local/bin:/usr/local/nvidia/bin/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/bin:$PATH"' >> /etc/environment && \ - echo 'LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/wsl/lib:/usr/lib/wsl/lib/libnvidia-container:/usr/lib/nvidia:/usr/lib/nvidia-current:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt/lib/:/usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/rapids_logger/lib64:${LD_LIBRARY_PATH}"' >> /etc/environment && \ - # Set proper permissions for cuOpt installation chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/cuopt* && \ chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/libcuopt* && \ chmod -R 755 /usr/local/lib/python${PYTHON_SHORT_VER}/dist-packages/cuopt_* && \ - chmod -R 755 /usr/local/bin/* && \ - # Create entrypoint script in a single operation - echo '#!/bin/bash' > /opt/cuopt/entrypoint.sh && \ - echo 'set -e' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Get current user info from Docker environment variables' >> /opt/cuopt/entrypoint.sh && \ - echo 'CURRENT_UID=${UID:-1000}' >> /opt/cuopt/entrypoint.sh && \ - echo 'CURRENT_GID=${GID:-1000}' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Set environment variables for the current user' >> 
/opt/cuopt/entrypoint.sh && \ - echo 'export HOME="/opt/cuopt"' >> /opt/cuopt/entrypoint.sh && \ - echo '' >> /opt/cuopt/entrypoint.sh && \ - echo '# Execute the command' >> /opt/cuopt/entrypoint.sh && \ - echo 'exec "$@"' >> /opt/cuopt/entrypoint.sh && \ - chmod +x /opt/cuopt/entrypoint.sh + chmod -R 755 /usr/local/bin/* # Set the default working directory to the cuopt folder WORKDIR /opt/cuopt @@ -112,6 +95,10 @@ COPY --from=cuda-libs /usr/local/cuda/lib64/libnvJitLink* /usr/local/cuda/lib64/ # Copy CUDA headers needed for runtime compilation (e.g., CuPy NVRTC). COPY --from=cuda-headers /usr/local/cuda/include/ /usr/local/cuda/include/ -# Use the flexible entrypoint +# Entrypoint supports server selection: +# Default: Python REST server +# CUOPT_SERVER_TYPE=grpc: gRPC server (uses CUOPT_SERVER_PORT, CUOPT_GPU_COUNT) +# Explicit command: docker run cuopt_grpc_server [args...] +COPY ./entrypoint.sh /opt/cuopt/entrypoint.sh ENTRYPOINT ["/opt/cuopt/entrypoint.sh"] CMD ["python", "-m", "cuopt_server.cuopt_service"] diff --git a/ci/docker/entrypoint.sh b/ci/docker/entrypoint.sh new file mode 100755 index 0000000000..3ee22dd086 --- /dev/null +++ b/ci/docker/entrypoint.sh @@ -0,0 +1,44 @@ +#!/bin/bash +# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Entrypoint for the cuOpt container image. +# +# Server selection (in order of precedence): +# 1. Explicit command: docker run cuopt_grpc_server [args...] +# 2. Environment variable: CUOPT_SERVER_TYPE=grpc +# 3. Default: Python REST server (cuopt_server.cuopt_service) +# +# When CUOPT_SERVER_TYPE=grpc, the following env vars configure the gRPC server: +# CUOPT_SERVER_PORT — listen port (default: 5001) +# CUOPT_GPU_COUNT — worker processes (default: 1) +# CUOPT_GRPC_ARGS — additional CLI flags passed verbatim +# (e.g. 
"--tls --tls-cert server.crt --log-to-console") +# See docs/cuopt/source/cuopt-grpc/advanced.rst (flags/env) and +# cpp/docs/grpc-server-architecture.md for contributor IPC details. +# Run `cuopt_grpc_server --help` for all available flags. + +set -e + +export HOME="/opt/cuopt" + +# If CUOPT_SERVER_TYPE=grpc, build a command line from env vars and launch. +if [ "${CUOPT_SERVER_TYPE}" = "grpc" ]; then + GRPC_CMD=(cuopt_grpc_server) + + GRPC_CMD+=(--port "${CUOPT_SERVER_PORT:-5001}") + + if [ -n "${CUOPT_GPU_COUNT}" ]; then + GRPC_CMD+=(--workers "${CUOPT_GPU_COUNT}") + fi + + # Allow arbitrary extra flags (e.g. --tls, --log-to-console) + if [ -n "${CUOPT_GRPC_ARGS}" ]; then + read -ra EXTRA <<< "${CUOPT_GRPC_ARGS}" + GRPC_CMD+=("${EXTRA[@]}") + fi + + exec "${GRPC_CMD[@]}" +fi + +exec "$@" diff --git a/cpp/docs/DEVELOPER_GUIDE.md b/cpp/docs/DEVELOPER_GUIDE.md index 716248b245..ba074b0e88 100644 --- a/cpp/docs/DEVELOPER_GUIDE.md +++ b/cpp/docs/DEVELOPER_GUIDE.md @@ -3,6 +3,7 @@ This document serves as a guide for contributors to cuOpt C++ code. Developers should also refer to these additional files for further documentation of cuOpt best practices. +* [gRPC server architecture](grpc-server-architecture.md) — full `cuopt_grpc_server` IPC, source file map, and streaming internals (end-user summary lives under `docs/cuopt/source/cuopt-grpc/`). * [Documentation Guide](TODO) for guidelines on documenting cuOpt code. * [Testing Guide](TODO) for guidelines on writing unit tests. * [Benchmarking Guide](TODO) for guidelines on writing unit benchmarks. 
diff --git a/GRPC_SERVER_ARCHITECTURE.md b/cpp/docs/grpc-server-architecture.md similarity index 89% rename from GRPC_SERVER_ARCHITECTURE.md rename to cpp/docs/grpc-server-architecture.md index 2d6c2c324b..9d19a9b2ef 100644 --- a/GRPC_SERVER_ARCHITECTURE.md +++ b/cpp/docs/grpc-server-architecture.md @@ -1,14 +1,21 @@ -# Server Architecture +# NVIDIA cuOpt gRPC server architecture -## Overview + -The cuOpt gRPC server (`cuopt_grpc_server`) is a multi-process architecture designed for: +> **Audience:** cuOpt contributors and advanced integrators debugging the server. +> +> End users should start with the cuOpt documentation **gRPC remote execution** section — Quick start, **Advanced configuration** (flags, TLS, Docker, client env vars), and the short **gRPC server behavior** overview (`docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md` in this repository). Those pages intentionally omit the C++-level detail below. + +The NVIDIA cuOpt gRPC server (`cuopt_grpc_server`) is a multi-process architecture designed for: - **Isolation**: Each solve runs in a separate worker process for fault tolerance - **Parallelism**: Multiple workers can process jobs concurrently - **Large Payloads**: Handles multi-GB problems and solutions - **Real-Time Feedback**: Log streaming and incumbent callbacks during solve -For gRPC protocol and client API, see `GRPC_INTERFACE.md`. Server source files live under `cpp/src/grpc/server/`. +Server source files live under `cpp/src/grpc/server/`. 
## Process Model @@ -229,7 +236,7 @@ The `StreamLogs` RPC: ```bash cuopt_grpc_server [options] - -p, --port PORT gRPC listen port (default: 8765) + -p, --port PORT gRPC listen port (default: 5001) -w, --workers NUM Number of worker processes (default: 1) --max-message-mb N Max gRPC message size in MiB (default: 256; clamped to [4 KiB, ~2 GiB]) --max-message-bytes N Max gRPC message size in bytes (exact; min 4096) @@ -245,6 +252,20 @@ TLS Options: --require-client-cert Require client certificate (mTLS) ``` +### NVIDIA cuOpt container image + +When you use the official NVIDIA cuOpt container **without** an explicit command, the entrypoint chooses between the Python REST server and `cuopt_grpc_server`. User-facing Docker and client configuration is documented in `docs/cuopt/source/cuopt-grpc/advanced.rst` in this repository (the published **Advanced configuration** page). + +When **`CUOPT_SERVER_TYPE=grpc`**, the entrypoint maps: + +| Variable | Role | +|----------|------| +| `CUOPT_SERVER_PORT` | Passed as `--port` (default `5001`). | +| `CUOPT_GPU_COUNT` | When set, passed as `--workers`. When unset, `--workers` is omitted and the server uses its default worker count. | +| `CUOPT_GRPC_ARGS` | Optional whitespace-separated **extra** `cuopt_grpc_server` flags (TLS, message limits, logging, and so on). Each token becomes one argv word; embedded spaces inside a single flag value are not supported through this variable—invoke `cuopt_grpc_server` directly if you need complex quoting. | + +Any flag listed in *Configuration options* above can be supplied on the host CLI or inside `CUOPT_GRPC_ARGS`. 
+ ## Fault Tolerance ### Worker Crashes diff --git a/cpp/src/grpc/client/grpc_client.hpp b/cpp/src/grpc/client/grpc_client.hpp index f8579b3271..58a40f5ebe 100644 --- a/cpp/src/grpc/client/grpc_client.hpp +++ b/cpp/src/grpc/client/grpc_client.hpp @@ -52,7 +52,7 @@ void grpc_test_mark_as_connected(class grpc_client_t& client); * - Result retrieval uses chunked download for results exceeding max_message_bytes. */ struct grpc_client_config_t { - std::string server_address = "localhost:8765"; + std::string server_address = "localhost:5001"; int poll_interval_ms = 1000; // How often to poll for job status int timeout_seconds = 0; // Max time to wait for job completion (0 = no limit) bool stream_logs = false; // Whether to stream logs from server @@ -204,7 +204,7 @@ struct remote_mip_result_t { * * Usage: * @code - * grpc_client_t client("localhost:8765"); + * grpc_client_t client("localhost:5001"); * if (!client.connect()) { ... handle error ... } * * auto result = client.solve_lp(problem, settings); diff --git a/cpp/src/grpc/server/grpc_server_main.cpp b/cpp/src/grpc/server/grpc_server_main.cpp index 5cc947a81a..cb73469cc6 100644 --- a/cpp/src/grpc/server/grpc_server_main.cpp +++ b/cpp/src/grpc/server/grpc_server_main.cpp @@ -65,7 +65,7 @@ int main(int argc, char** argv) argparse::ArgumentParser program("cuopt_grpc_server", version_string); - program.add_argument("-p", "--port").help("Listen port").default_value(8765).scan<'i', int>(); + program.add_argument("-p", "--port").help("Listen port").default_value(5001).scan<'i', int>(); program.add_argument("-w", "--workers") .help("Number of worker processes") diff --git a/cpp/src/grpc/server/grpc_server_types.hpp b/cpp/src/grpc/server/grpc_server_types.hpp index 7afc668fb9..04fbce4a93 100644 --- a/cpp/src/grpc/server/grpc_server_types.hpp +++ b/cpp/src/grpc/server/grpc_server_types.hpp @@ -156,7 +156,7 @@ struct JobWaiter { // ============================================================================= struct ServerConfig 
{ - int port = 8765; + int port = 5001; int num_workers = 1; bool verbose = true; bool log_to_console = false; diff --git a/docs/cuopt/source/_static/install-selector.js b/docs/cuopt/source/_static/install-selector.js index d0d309b897..39da616f8e 100644 --- a/docs/cuopt/source/_static/install-selector.js +++ b/docs/cuopt/source/_static/install-selector.js @@ -36,6 +36,30 @@ var V_CONDA_NEXT = nextMajor + "." + (nextMinor < 10 ? "0" : "") + nextMinor; var V_NEXT = nextMajor + "." + nextMinor; + /* Shared Docker image lines: same tags are typically published to Docker Hub and NGC */ + var CONTAINER_CUOPT_LIB = { + stable: { + cu12: { + default: "docker pull nvidia/cuopt:latest-cuda12.9-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda12.9-py3.13 /bin/bash", + }, + cu13: { + default: "docker pull nvidia/cuopt:latest-cuda13.0-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda13.0-py3.13 /bin/bash", + }, + }, + nightly: { + cu12: { + default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13 /bin/bash", + }, + cu13: { + default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13", + run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13 /bin/bash", + }, + }, + }; + var COMMANDS = { python: { pip: { @@ -82,28 +106,7 @@ ".* cuda-version=13.0", }, }, - container: { - stable: { - cu12: { - default: "docker pull nvidia/cuopt:latest-cuda12.9-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda12.9-py3.13 /bin/bash", - }, - cu13: { - default: "docker pull nvidia/cuopt:latest-cuda13.0-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:latest-cuda13.0-py3.13 /bin/bash", - }, - }, - nightly: { - cu12: { - default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda12.9-py3.13 /bin/bash", - 
}, - cu13: { - default: "docker pull nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13", - run: "docker run --gpus all -it --rm nvidia/cuopt:" + V_NEXT + ".0a-cuda13.0-py3.13 /bin/bash", - }, - }, - }, + container: CONTAINER_CUOPT_LIB, }, c: { pip: { @@ -150,7 +153,7 @@ ".* cuda-version=13.0", }, }, - container: null, + container: CONTAINER_CUOPT_LIB, }, server: { pip: { @@ -228,9 +231,9 @@ var SUPPORTED_METHODS = { python: ["pip", "conda", "container"], - c: ["pip", "conda"], + c: ["pip", "conda", "container"], server: ["pip", "conda", "container"], - cli: ["pip", "conda"], + cli: ["pip", "conda", "container"], }; function getSelectedValue(name) { @@ -264,7 +267,41 @@ if (method === "container") { var cudaKey = cuda || "cu12"; var c = data[release][cudaKey] || data[release].cu12; - cmd = c.default + "\n\n# Run the container:\n" + c.run; + var hubPull = c.default; + var tag = "latest-cuda12.9-py3.13"; + var tm = hubPull.match(/docker pull nvidia\/cuopt:(\S+)/); + if (tm) tag = tm[1]; + var registry = getSelectedValue("cuopt-registry") || "hub"; + var runLine = c.run; + if (registry === "ngc" && release === "nightly") { + cmd = + "# Nightly cuOpt container images are not published to NVIDIA NGC; use Docker Hub for nightly builds.\n" + + "# (Select \"Docker Hub\" above for the same commands without this note.)\n\n" + + "# Docker Hub (docker.io) — no registry login required for public pulls\n" + + hubPull + + "\n\n" + + "# Run the container:\n" + + runLine; + } else if (registry === "ngc") { + runLine = runLine.replace(/nvidia\/cuopt:/g, "nvcr.io/nvidia/cuopt/cuopt:"); + cmd = + "# NVIDIA NGC (nvcr.io) — authenticate once per session, then pull:\n" + + "docker login nvcr.io\n" + + "# Username: $oauthtoken\n" + + "# Password: \n\n" + + "docker pull nvcr.io/nvidia/cuopt/cuopt:" + + tag + + "\n\n" + + "# Run the container:\n" + + runLine; + } else { + cmd = + "# Docker Hub (docker.io) — no registry login required for public pulls\n" + + hubPull + + "\n\n" + + "# Run the 
container:\n" + + runLine; + } } else { var key = data[release].cu12 && data[release].cu13 ? cuda : "default"; cmd = data[release][key] || data[release].cu12 || data[release].cu13 || data[release].default || ""; @@ -302,9 +339,17 @@ var cudaRow = document.getElementById("cuopt-cuda-row"); var releaseRow = document.getElementById("cuopt-release-row"); var releaseVisible = iface !== "cli"; - var showCuda = releaseVisible && (method === "pip" || method === "conda" || method === "container") && hasCudaVariants(iface, method); + var ifaceForVariants = iface === "cli" ? "c" : iface; + var showCuda = + releaseVisible && + (method === "pip" || method === "conda" || method === "container") && + hasCudaVariants(ifaceForVariants, method); cudaRow.style.display = showCuda ? "table-row" : "none"; releaseRow.style.display = releaseVisible ? "table-row" : "none"; + var registryRow = document.getElementById("cuopt-registry-row"); + if (registryRow) { + registryRow.style.display = method === "container" ? "table-row" : "none"; + } updateOutput(); } @@ -350,13 +395,17 @@ '' + '' + '' + + 'Registry' + + '' + + '' + + '' + "" + '
' + '' + '
' + "
"; - ["cuopt-iface", "cuopt-method", "cuopt-release", "cuopt-cuda"].forEach( + ["cuopt-iface", "cuopt-method", "cuopt-release", "cuopt-cuda", "cuopt-registry"].forEach( function (name) { var inputs = document.querySelectorAll('input[name="' + name + '"]'); inputs.forEach(function (input) { diff --git a/docs/cuopt/source/cuopt-grpc/advanced.rst b/docs/cuopt/source/cuopt-grpc/advanced.rst new file mode 100644 index 0000000000..320beca122 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/advanced.rst @@ -0,0 +1,312 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +======================= +Advanced configuration +======================= + +This page lists **configuration parameters** first, then **usage** walkthroughs (TLS, Docker, private CA). Complete :doc:`quick-start` first (install, plain TCP server, and minimal example). + +For RPC summaries and server behavior, see :doc:`api` and :doc:`grpc-server-architecture`. Example entry points with ``CUOPT_REMOTE_*``: :doc:`examples`. Contributor-only internals: ``cpp/docs/grpc-server-architecture.md`` in the repository. + +Configuration parameters +======================== + +``cuopt_grpc_server`` (host or explicit container command) +------------------------------------------------------------ + +Run ``cuopt_grpc_server --help`` for the full list. Typical flags (also passable inside ``CUOPT_GRPC_ARGS`` when using the container entrypoint): + +.. 
code-block:: text + + cuopt_grpc_server [options] + + -p, --port PORT gRPC listen port (default: 5001) + -w, --workers NUM Number of worker processes (default: 1) + --max-message-mb N Max gRPC message size in MiB (default: 256; clamped to [4 KiB, ~2 GiB]) + --max-message-bytes N Max gRPC message size in bytes (exact; min 4096) + --enable-transfer-hash Log data hashes for streaming transfers (for testing) + --log-to-console Echo solver logs to server console + -q, --quiet Reduce verbosity (verbose is the default) + + TLS Options: + --tls Enable TLS encryption + --tls-cert PATH Server certificate (PEM) + --tls-key PATH Server private key (PEM) + --tls-root PATH Root CA certificate (for client verification) + --require-client-cert Require client certificate (mTLS) + +NVIDIA cuOpt container (gRPC via entrypoint) +-------------------------------------------- + +These variables apply when the container **entrypoint** builds a ``cuopt_grpc_server`` command (see *Docker: gRPC server in container* under Usage). If you pass an explicit command after the image name, this table does not apply. + +.. list-table:: + :header-rows: 1 + :widths: 22 18 60 + + * - Variable + - Default + - Description + * - ``CUOPT_SERVER_TYPE`` + - *(unset)* + - Set to ``grpc`` for entrypoint-built gRPC. Unset with no explicit command: **Python REST** server. + * - ``CUOPT_SERVER_PORT`` + - ``5001`` + - Passed as ``--port`` to ``cuopt_grpc_server``. + * - ``CUOPT_GPU_COUNT`` + - *(unset)* + - When set, passed as ``--workers``. When unset, ``--workers`` is omitted (server default, typically 1). + * - ``CUOPT_GRPC_ARGS`` + - *(empty)* + - Extra flags split on **whitespace** and appended (TLS, ``--max-message-mb``, ``--log-to-console``, etc.). Paths with spaces: prefer mounts without spaces or run ``cuopt_grpc_server`` manually with proper quoting. + +The REST server path in the same image still uses ``CUOPT_SERVER_PORT`` for HTTP in other docs; that is separate from the gRPC defaults above. 
+ +Bundled remote client (Python, C API, ``cuopt_cli``) +---------------------------------------------------- + +Remote mode is active when **both** ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` are set. A **custom** gRPC client does not read these automatically; it must configure the channel and protos itself (see :doc:`api`). + +.. list-table:: + :header-rows: 1 + :widths: 26 14 18 42 + + * - Variable + - Required + - Default + - Description + * - ``CUOPT_REMOTE_HOST`` + - For remote + - — + - Server hostname or IP + * - ``CUOPT_REMOTE_PORT`` + - For remote + - — + - Server port (e.g. ``5001``) + * - ``CUOPT_TLS_ENABLED`` + - No + - ``0`` + - Non-zero enables TLS on the client + * - ``CUOPT_TLS_ROOT_CERT`` + - If TLS + - — + - PEM path to verify the **server** certificate + * - ``CUOPT_TLS_CLIENT_CERT`` + - mTLS + - — + - Client certificate PEM + * - ``CUOPT_TLS_CLIENT_KEY`` + - mTLS + - — + - Client private key PEM + * - ``CUOPT_CHUNK_SIZE`` + - No + - 16 MiB (lib) + - Chunk size in **bytes** for large transfers (clamped in library code) + * - ``CUOPT_MAX_MESSAGE_BYTES`` + - No + - 256 MiB (lib) + - Client gRPC max message size in **bytes** (clamped in library code) + * - ``CUOPT_GRPC_DEBUG`` + - No + - ``0`` + - Non-zero: extra gRPC client logging + +Usage +===== + +Start the server with TLS +-------------------------- + +Basic (no TLS), plain TCP, is in :doc:`quick-start`. Encrypted server: + +.. code-block:: bash + + cuopt_grpc_server --port 5001 \ + --tls \ + --tls-cert server.crt \ + --tls-key server.key + +mTLS (mutual TLS): + +.. code-block:: bash + + cuopt_grpc_server --port 5001 \ + --tls \ + --tls-cert server.crt \ + --tls-key server.key \ + --tls-root ca.crt \ + --require-client-cert + +How mTLS works +-------------- + +With mTLS the server verifies every client, and the client verifies the server. 
Trust is based on **Certificate Authorities** (CAs), not individual certificate lists: + +* ``--tls-root ca.crt`` tells the server which CA to trust; any client cert signed by that CA is accepted. The server does not store per-client certificates. +* ``--require-client-cert`` makes client verification **mandatory**. Without it, the server may still allow connections without a client cert. +* On the client, ``CUOPT_TLS_ROOT_CERT`` is the CA that signed the **server** certificate so the client can verify the server. + +Restricting access with a private CA +------------------------------------ + +To limit which clients can connect, run your own CA and issue client certs only to authorized actors. + +**1. Create a private CA (one-time):** + +.. code-block:: bash + + openssl genrsa -out ca.key 4096 + openssl req -new -x509 -key ca.key -sha256 -days 3650 \ + -subj "/CN=cuopt-internal-ca" -out ca.crt + +**2. Issue a client certificate:** + +.. code-block:: bash + + openssl genrsa -out client.key 2048 + openssl req -new -key client.key \ + -subj "/CN=team-member-alice" -out client.csr + openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \ + -CAcreateserial -days 365 -sha256 -out client.crt + +Repeat for each authorized client. Keep ``ca.key`` private; distribute ``ca.crt`` to the server and per-client ``client.crt`` + ``client.key`` pairs. + +**3. Issue a server certificate (same CA):** + +.. code-block:: bash + + openssl genrsa -out server.key 2048 + openssl req -new -key server.key \ + -subj "/CN=server.example.com" -out server.csr + + cat > server.ext <`_ so devices are visible inside the container. + +Typical run: + +.. code-block:: bash + + docker run --gpus all -p 5001:5001 \ + -e CUOPT_SERVER_TYPE=grpc \ + nvcr.io/nvidia/cuopt/cuopt:latest + +TLS example with a cert volume: + +.. 
code-block:: bash + + docker run --gpus all -p 5001:5001 \ + -e CUOPT_SERVER_TYPE=grpc \ + -e CUOPT_GRPC_ARGS="--tls --tls-cert /certs/server.crt --tls-key /certs/server.key --log-to-console" \ + -v ./certs:/certs:ro \ + nvcr.io/nvidia/cuopt/cuopt:latest + +Bypass the entrypoint: + +.. code-block:: bash + + docker run --gpus all -p 5001:5001 \ + nvcr.io/nvidia/cuopt/cuopt:latest \ + cuopt_grpc_server --port 5001 --workers 2 + +Client environment (examples) +------------------------------ + +**Required** for remote (see *Bundled remote client* table for all variables): + +.. code-block:: bash + + export CUOPT_REMOTE_HOST= + export CUOPT_REMOTE_PORT=5001 + +**TLS** (optional): + +.. code-block:: bash + + export CUOPT_TLS_ENABLED=1 + export CUOPT_TLS_ROOT_CERT=ca.crt + +For mTLS, also: + +.. code-block:: bash + + export CUOPT_TLS_CLIENT_CERT=client.crt + export CUOPT_TLS_CLIENT_KEY=client.key + +Limitations and scope +===================== + +* **Problem types** — **LP**, **MILP**, and **QP** are supported on the gRPC remote path. **Routing** (VRP, TSP, PDP) is **not** supported yet; use the :doc:`REST self-hosted server <../cuopt-server/index>` for remote routing until a future release adds routing over ``CuOptRemoteService``. +* **Message size** — Large problems use chunking; very large models can still hit gRPC max message / timeout limits. Tune ``CUOPT_CHUNK_SIZE``, ``CUOPT_MAX_MESSAGE_BYTES``, server ``--max-message-mb``, and solver ``time_limit`` as needed. +* **``CUOPT_GRPC_ARGS``** — Parsed on whitespace only; arguments containing spaces are awkward unless you invoke ``cuopt_grpc_server`` directly. +* **CRL / OCSP** — Not handled by the bundled gRPC TLS stack; use a private CA rotation strategy or a TLS-terminating proxy if you need revocation workflows. + +Troubleshooting +=============== + +.. 
list-table:: + :header-rows: 1 + :widths: 28 72 + + * - Symptom + - Check + * - Connection refused + - Server running; host/port match; firewalls and Docker port mapping. + * - TLS handshake failure + - ``CUOPT_TLS_ENABLED=1``; correct CA and cert paths; SAN matches server name. + * - Cannot open TLS file + - Path exists and is readable inside the client/server environment (including container mounts). + * - Timeout on large problems + - Increase solver ``time_limit`` and client/server message limits. + +Further reading +=============== + +* :doc:`quick-start` — Plain TCP quick path. +* :doc:`examples` — Links to Python, C, and CLI example sections (use with ``CUOPT_REMOTE_*`` on the client). +* :doc:`grpc-server-architecture` — Process model and job behavior (operator overview). diff --git a/docs/cuopt/source/cuopt-grpc/api.rst b/docs/cuopt/source/cuopt-grpc/api.rst new file mode 100644 index 0000000000..3d44857b7a --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/api.rst @@ -0,0 +1,98 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +====================== +gRPC API (reference) +====================== + +The **CuOptRemoteService** gRPC API is defined in Protocol Buffers under the ``cuopt.remote`` package. Source files in the repository: + +* ``cpp/src/grpc/cuopt_remote_service.proto`` — service and job/chunk/log RPCs +* ``cpp/src/grpc/cuopt_remote.proto`` — LP/MIP problem, settings, and result messages + +Most users do **not** call these RPCs directly: the NVIDIA cuOpt **Python** API, **C API**, and **cuopt_cli** submit jobs using solver APIs plus :doc:`environment variables `. **Custom** clients call ``CuOptRemoteService`` over gRPC using these definitions. This page summarizes the service for custom integrators and debugging. + +Service: ``CuOptRemoteService`` +================================ + +Asynchronous jobs +----------------- + +.. 
list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``SubmitJob`` + - Submit an LP or MILP job in one message (within gRPC message size limits). + * - ``CheckStatus`` + - Poll job status by ``job_id``. + * - ``GetResult`` + - Fetch a completed result (unary, when the payload fits one message). + * - ``DeleteResult`` + - Remove a stored result from server memory. + * - ``CancelJob`` + - Cancel a queued or running job. + * - ``WaitForCompletion`` + - Block until the job finishes (status only; use ``GetResult`` for the solution). + +Chunked upload (large problems) +-------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StartChunkedUpload`` + - Begin a session; send problem metadata and settings (arrays follow as chunks). + * - ``SendArrayChunk`` + - Upload one slice of a numeric array field. + * - ``FinishChunkedUpload`` + - Finalize the upload and return ``job_id`` (same as ``SubmitJob``). + +Chunked download (large results) +-------------------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StartChunkedDownload`` + - Begin a download session; returns scalar result fields and array descriptors. + * - ``GetResultChunk`` + - Fetch one chunk of a result array. + * - ``FinishChunkedDownload`` + - End the download session and release server state. + +Streaming and callbacks +----------------------- + +.. list-table:: + :header-rows: 1 + :widths: 28 72 + + * - RPC + - Purpose + * - ``StreamLogs`` + - Server-streaming solver log lines for a job. + * - ``GetIncumbents`` + - MILP incumbent solutions since a given index. + +Messages and constraints +======================== + +* **Problem types** — LP and MILP in the enum; the problem payload can include quadratic objective data for **QP**-style solves where the client API supports it. 
**Routing** over this gRPC service is **not** available yet; it is planned for an **upcoming** release (use REST for remote routing today). +* **Solver settings** — Carried as ``PDLPSolverSettings`` or ``MIPSolverSettings`` inside the request or chunked header, aligned with the NVIDIA cuOpt solver options documentation. +* **Errors** — gRPC status codes carry failures (see comments at the end of ``cuopt_remote_service.proto``). + +Further reading +=============== + +* :doc:`grpc-server-architecture` — Server process model and job lifecycle (overview); :doc:`advanced` for ``cuopt_grpc_server`` flags. Contributor details: ``cpp/docs/grpc-server-architecture.md``. +* :doc:`advanced` — TLS, Docker, client environment variables, and limitations. diff --git a/docs/cuopt/source/cuopt-grpc/examples.rst b/docs/cuopt/source/cuopt-grpc/examples.rst new file mode 100644 index 0000000000..cf37ba08f5 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples.rst @@ -0,0 +1,66 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + +======== +Examples +======== + +gRPC remote execution uses the same **Python**, **C API**, and **cuopt_cli** entry points as a local solve. After you start ``cuopt_grpc_server`` on the GPU host (:doc:`quick-start`), set the client environment and run **any** of the examples below **unchanged** — no code edits are required. + +On the **client** host, before running the example commands or scripts: + +.. code-block:: bash + + export CUOPT_REMOTE_HOST= + export CUOPT_REMOTE_PORT=5001 + +Add TLS or tuning variables from :doc:`advanced` if your deployment uses them. + +.. note:: + + Routing over gRPC is planned for an upcoming release. For remote routing today, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>` and :doc:`Examples <../cuopt-server/examples/index>`. 
+ +Where to find examples +====================== + +Python (LP / QP / MILP) +----------------------- + +* :doc:`../cuopt-python/lp-qp-milp/lp-qp-milp-examples` — runnable Python samples (LP, QP, MILP). With ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` set on the client, solves go to the remote server automatically. + +C API (LP / QP / MILP) +---------------------- + +* :doc:`../cuopt-c/lp-qp-milp/lp-qp-example` — LP and QP C examples. +* :doc:`../cuopt-c/lp-qp-milp/milp-examples` — MILP C examples. + + Compile and run these programs with the same exports in the shell; ``solve_lp`` / ``solve_mip`` use gRPC when both remote variables are set (see :doc:`../cuopt-c/lp-qp-milp/lp-qp-milp-c-api` for API reference). + +``cuopt_cli`` +------------- + +* :doc:`../cuopt-cli/cli-examples` — ``cuopt_cli`` invocations. With the exports above, the CLI forwards solves to ``cuopt_grpc_server``. + +Minimal demos (this section) +---------------------------- + +Bundled with the gRPC docs source for a quick copy-paste path (also walked through in :doc:`quick-start`): + +* :download:`remote_lp_demo.py ` +* :download:`remote_lp_demo.mps ` + +Custom gRPC client +------------------ + +Integrations that do **not** use the bundled Python / C / CLI stack should speak ``CuOptRemoteService`` directly. See :doc:`api`, :doc:`grpc-server-architecture`, and ``cpp/docs/grpc-server-architecture.md`` in the repository for protos and server behavior. + +More samples +============ + +* `NVIDIA cuOpt examples on GitHub `_ — set the remote environment on the **client** before running notebooks or scripts. + +REST vs gRPC +============ + +* **Self-hosted HTTP/JSON** — :doc:`../cuopt-server/examples/index` targets the REST server; request shapes follow the OpenAPI workflow, not the ``CuOptRemoteService`` protos. 
diff --git a/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps new file mode 100644 index 0000000000..95d342250c --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps @@ -0,0 +1,13 @@ +NAME good-1 +ROWS + N COST + L ROW1 + L ROW2 +COLUMNS + VAR1 COST -0.2 + VAR1 ROW1 3 ROW2 2.7 + VAR2 COST 0.1 + VAR2 ROW1 4 ROW2 10.1 +RHS + RHS1 ROW1 5.4 ROW2 4.9 +ENDATA diff --git a/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py new file mode 100644 index 0000000000..4b24938c6c --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py @@ -0,0 +1,37 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +"""Minimal LP demo for NVIDIA cuOpt gRPC remote execution. + +Set CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT on the client before running to forward +the solve to cuopt_grpc_server; unset them to solve locally (GPU required locally). + +The same LP is available as MPS in ``remote_lp_demo.mps`` for ``cuopt_cli``. 
+""" + +import numpy as np +from cuopt import linear_programming + +dm = linear_programming.DataModel() +A_values = np.array([3.0, 4.0, 2.7, 10.1], dtype=np.float64) +A_indices = np.array([0, 1, 0, 1], dtype=np.int32) +A_offsets = np.array([0, 2, 4], dtype=np.int32) +dm.set_csr_constraint_matrix(A_values, A_indices, A_offsets) + +b = np.array([5.4, 4.9], dtype=np.float64) +dm.set_constraint_bounds(b) + +c = np.array([0.2, 0.1], dtype=np.float64) +dm.set_objective_coefficients(c) + +dm.set_row_types(np.array(["L", "L"])) + +dm.set_variable_lower_bounds(np.array([0.0, 0.0], dtype=np.float64)) +dm.set_variable_upper_bounds(np.array([2.0, np.inf], dtype=np.float64)) + +settings = linear_programming.SolverSettings() +solution = linear_programming.Solve(dm, settings) + +print("Termination:", solution.get_termination_reason()) +print("Objective: ", solution.get_primal_objective()) +print("Primal x: ", solution.get_primal_solution()) diff --git a/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md b/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md new file mode 100644 index 0000000000..7508103dc7 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/grpc-server-architecture.md @@ -0,0 +1,78 @@ +# gRPC server behavior + +NVIDIA cuOpt's **`cuopt_grpc_server`** uses one **main process** (gRPC front end, job tracking, background threads) and **worker processes** that run GPU solves. That layout gives isolation between jobs, optional parallelism when you set multiple workers, and streaming for large problems and logs. + +Implementation details (IPC layout, C++ source map, chunked transfer internals) live in the contributor reference: **`cpp/docs/grpc-server-architecture.md`** in the NVIDIA cuOpt repository. 
+ +## Process model + +```text +┌──────────────────────────────────────────────────────────────────────┐ +│ Main Server Process │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────────┐ │ +│ │ gRPC │ │ Job │ │ Background Threads │ │ +│ │ Service │ │ Tracker │ │ - Result retrieval │ │ +│ │ Handler │ │ (job status,│ │ - Incumbent retrieval │ │ +│ │ │ │ results) │ │ - Worker monitor │ │ +│ └─────────────┘ └──────────────┘ └─────────────────────────────┘ │ +│ │ ▲ │ +│ │ shared memory │ pipes │ +│ ▼ │ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ Shared Memory Queues │ │ +│ │ │ │ +│ │ ┌─────────────────┐ ┌─────────────────────┐ │ │ +│ │ │ Job Queue │ │ Result Queue │ │ │ +│ │ │ (MAX_JOBS=100) │ │ (MAX_RESULTS=100) │ │ │ +│ │ └─────────────────┘ └─────────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ + │ ▲ + │ fork() │ + ▼ │ + ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ + │ Worker 0 │ │ Worker 1 │ │ Worker N │ + │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ + │ │ GPU Solve │ │ │ │ GPU Solve │ │ │ │ GPU Solve │ │ + │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ + │ (separate proc)│ │ (separate proc)│ │ (separate proc)│ + └─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +## Job lifecycle (summary) + +**Submit** → the server assigns a job id and queues work. **Process** → a worker pulls the problem, solves on the GPU, and streams the result back. **Retrieve** → the client uses status and result RPCs (including chunked download when needed). See [gRPC API (reference)](api.rst) for RPC names. 
+ +## Job states + +```text +┌─────────┐ submit ┌───────────┐ claim ┌────────────┐ +│ QUEUED │──────────►│ PROCESSING│─────────►│ COMPLETED │ +└─────────┘ └───────────┘ └────────────┘ + │ │ + │ cancel │ error + ▼ ▼ +┌───────────┐ ┌─────────┐ +│ CANCELLED │ │ FAILED │ +└───────────┘ └─────────┘ +``` + +## Logs, capacity, and workers + +| Topic | Detail | +|-------|--------| +| Log files | Per-job solver logs under `/tmp/cuopt_logs/job_.log` (used by log streaming). | +| Default caps | Up to **100** queued jobs and **100** stored results (server compile-time limits). | +| Workers | Aim for roughly **1–2 worker processes per GPU**; more workers can increase GPU memory contention. | + +## Fault tolerance and cancellation + +- If a **worker process crashes**, jobs it was running are marked **FAILED**; the server can spawn replacement workers (see contributor doc for details). +- **`CancelJob`** is honored **before** the solve starts. If the solver has already started, the run continues to completion (**no mid-solve cancellation**). + +## Further reading + +- [Advanced configuration](advanced.rst) — `cuopt_grpc_server` **command-line flags**, TLS, Docker (`CUOPT_SERVER_TYPE`, `CUOPT_GRPC_ARGS`), and **client** environment variables (authoritative for operators). +- [gRPC API (reference)](api.rst) — `CuOptRemoteService` RPC overview. +- **Contributor reference** — `cpp/docs/grpc-server-architecture.md` in the repository (IPC, source files, streaming, threading). diff --git a/docs/cuopt/source/cuopt-grpc/index.rst b/docs/cuopt/source/cuopt-grpc/index.rst new file mode 100644 index 0000000000..738180f877 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/index.rst @@ -0,0 +1,30 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ SPDX-License-Identifier: Apache-2.0 + +========================== +gRPC remote execution +========================== + +**NVIDIA cuOpt gRPC remote execution** runs optimization solves on a remote GPU host. Clients can be the **Python** API, **C API**, **`cuopt_cli`**, or a **custom** program that speaks ``CuOptRemoteService`` over gRPC. For Python, the C API, and ``cuopt_cli``, set ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` to forward solves to ``cuopt_grpc_server``. + +.. note:: + + **Problem types (gRPC remote):** LP, MILP, and QP are supported today. **Routing** (VRP, TSP, PDP, and related APIs) over gRPC remote execution is **not** available yet; support is planned for an **upcoming** release. For routing against a remote service today, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>`. + +This is **not** the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>` (FastAPI). REST is for arbitrary HTTP clients; gRPC is for the bundled remote client in NVIDIA cuOpt's native APIs. + +Start with :doc:`quick-start` (install selector, how remote execution works, Docker, and a minimal LP example). Use :doc:`advanced` for TLS, tuning, limitations, and troubleshooting; :doc:`examples` for additional patterns. + +.. toctree:: + :maxdepth: 2 + :caption: In this section + :name: cuopt-grpc-contents + + quick-start.rst + advanced.rst + examples.rst + api.rst + grpc-server-architecture.md + +See :doc:`../system-requirements` for GPU, CUDA, and OS requirements. diff --git a/docs/cuopt/source/cuopt-grpc/quick-start.rst b/docs/cuopt/source/cuopt-grpc/quick-start.rst new file mode 100644 index 0000000000..acd3b5b9b2 --- /dev/null +++ b/docs/cuopt/source/cuopt-grpc/quick-start.rst @@ -0,0 +1,156 @@ +.. + SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ SPDX-License-Identifier: Apache-2.0 + +=========== +Quick start +=========== + +**NVIDIA cuOpt gRPC remote execution** runs LP, MILP, and QP solves on a **GPU host** while your **Python** code, **C API** program, **`cuopt_cli`**, or a **custom** client runs elsewhere. When you set ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT``, the bundled **Python**, **C API**, and **cuopt_cli** clients forward ``solve_lp`` / ``solve_mip`` to ``cuopt_grpc_server`` with **no code changes**. **Custom** clients call ``CuOptRemoteService`` directly (see :doc:`api`). + +.. note:: + + **Problem types (gRPC remote):** **LP**, **MILP**, and **QP** are supported today. **Routing** (VRP, TSP, PDP) over this path is **not** available yet; support is planned for an **upcoming** release. For remote routing today, use the HTTP/JSON :doc:`REST self-hosted server <../cuopt-server/index>`. This guide is **not** the REST server—see :doc:`../cuopt-server/index` for HTTP/JSON. + +How remote execution works +========================== + +1. **GPU host** — Run ``cuopt_grpc_server`` (bare metal or in the official container) so it listens on a TCP port (default **5001**). +2. **Client** — Install the NVIDIA cuOpt client libraries on the machine where you invoke the solver. Set ``CUOPT_REMOTE_HOST`` to that GPU host’s address and ``CUOPT_REMOTE_PORT`` to the listen port. +3. **Solve** — Call the same APIs you would for a local solve. The client library opens a gRPC channel, streams the problem, and retrieves the result. Unset the two variables to solve **locally** again (local mode still needs a GPU on that machine where applicable). + +Install NVIDIA cuOpt +==================== + +Use the selector below on the **GPU server** and on **clients** that need Python, the C API, or ``cuopt_cli``. It is pre-set to **C (libcuopt)** because that bundle ships ``cuopt_grpc_server``, ``cuopt_cli``, and libraries together; switch to **Python** if you only need Python packages on a lightweight client. + +.. 
install-selector::
+   :default-iface: c
+
+Verify the server binary after install:
+
+.. code-block:: bash
+
+   cuopt_grpc_server --help
+
+For the same install selector with **Container** / registry choices (Docker Hub or NGC), see :doc:`../install`.
+
+Run the gRPC server (GPU host)
+==============================
+
+**Bare metal** — after activating the same environment you used to install NVIDIA cuOpt:
+
+.. code-block:: bash
+
+   cuopt_grpc_server --port 5001 --workers 1
+
+Leave the process running. Default port **5001**; change ``--port`` if needed and expose the same port on the client side.
+
+**Docker** — requires the `NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>`_ (or equivalent) on the host. Pull an image tag from :doc:`../install` or the **Container** row in the selector above; substitute ``<image>`` below.
+
+Entrypoint mode (recommended when you are not passing an explicit command):
+
+.. code-block:: bash
+
+   docker run --gpus all -it --rm -p 5001:5001 \
+     -e CUOPT_SERVER_TYPE=grpc \
+     <image>
+
+Or invoke the binary explicitly:
+
+.. code-block:: bash
+
+   docker run --gpus all -it --rm -p 5001:5001 \
+     <image> \
+     cuopt_grpc_server --port 5001 --workers 1
+
+.. note::
+
+   The container image defaults to the Python **REST** server when ``CUOPT_SERVER_TYPE`` is unset and you do not override the command; setting ``CUOPT_SERVER_TYPE=grpc`` selects ``cuopt_grpc_server``. Extra environment variables (``CUOPT_SERVER_PORT``, ``CUOPT_GPU_COUNT``, ``CUOPT_GRPC_ARGS``) and TLS are documented in :doc:`Advanced configuration <advanced>`.
+
+Point the client at the server
+==============================
+
+On the machine where you run Python, the C API, or ``cuopt_cli`` (use ``127.0.0.1`` if the server is on the same host):
+
+.. code-block:: bash
+
+   export CUOPT_REMOTE_HOST=<server-hostname-or-ip>
+   export CUOPT_REMOTE_PORT=5001
+
+Optional TLS and tuning variables are in :doc:`advanced`.
+
+Minimal Python example (LP)
+============================
+
+The script is the same for **local** or **remote** solves: with the exports above, the client library forwards to ``cuopt_grpc_server``; without them, the solve runs locally (where a GPU is available).
+
+:download:`remote_lp_demo.py <examples/remote_lp_demo.py>`
+
+.. literalinclude:: examples/remote_lp_demo.py
+   :language: python
+   :linenos:
+
+Run the script from your NVIDIA cuOpt Python environment. From a **repository checkout** (repo root):
+
+.. code-block:: bash
+
+   python docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.py
+
+Or, after :download:`downloading <examples/remote_lp_demo.py>` the file into your current directory:
+
+.. code-block:: bash
+
+   python remote_lp_demo.py
+
+You should see an optimal termination. To solve **locally**, unset the remote variables and rerun with the **same** path you used above:
+
+.. code-block:: bash
+
+   unset CUOPT_REMOTE_HOST CUOPT_REMOTE_PORT
+   python remote_lp_demo.py
+
+Minimal ``cuopt_cli`` example (LP)
+==================================
+
+The same **LP** is available as MPS. With ``CUOPT_REMOTE_HOST`` and ``CUOPT_REMOTE_PORT`` set as above, ``cuopt_cli`` forwards the solve to the remote server; unset them for a **local** run (GPU on that machine).
+
+:download:`remote_lp_demo.mps <examples/remote_lp_demo.mps>`
+
+.. literalinclude:: examples/remote_lp_demo.mps
+   :language: text
+
+From a **repository checkout** (repo root):
+
+.. code-block:: bash
+
+   cuopt_cli docs/cuopt/source/cuopt-grpc/examples/remote_lp_demo.mps
+
+Or, after :download:`downloading <examples/remote_lp_demo.mps>` the MPS into your current directory:
+
+.. code-block:: bash
+
+   cuopt_cli remote_lp_demo.mps
+
+To solve **locally** with the same file:
+
+.. code-block:: bash
+
+   unset CUOPT_REMOTE_HOST CUOPT_REMOTE_PORT
+   cuopt_cli remote_lp_demo.mps
+
+More options (time limits, relaxation): :doc:`../cuopt-cli/quick-start` and :doc:`examples`.
+
+**C API** — With the same environment variables set, call ``solve_lp`` / ``solve_mip`` as in :doc:`../cuopt-c/lp-qp-milp/lp-qp-milp-c-api`.
+ +More patterns (MPS variants, custom gRPC): :doc:`examples`. + +Next steps +========== + +* :doc:`../install` — Top-level install selector (all interfaces), including **Container** pulls. +* :doc:`advanced` — TLS / mTLS, Docker environment reference, tuning, limitations, troubleshooting. +* :doc:`examples` — Additional client examples and links to LP/MILP sample collections. +* :doc:`api` and :doc:`grpc-server-architecture` — RPC summary and server behavior overview. + +See :doc:`../system-requirements` for GPU, CUDA, and OS requirements. diff --git a/docs/cuopt/source/cuopt-server/index.rst b/docs/cuopt/source/cuopt-server/index.rst index 36ea1ad8c3..0d9c7a277f 100644 --- a/docs/cuopt/source/cuopt-server/index.rst +++ b/docs/cuopt/source/cuopt-server/index.rst @@ -1,14 +1,15 @@ Server ====== -NVIDIA cuOpt server is a REST API server that is built for the purpose of providing language agnostic access to the cuOpt optimization engine. Users can build their own clients in any language that supports HTTP requests or use cuopt-sh-client, a lightweight Python client, to communicate with the server. +The **NVIDIA cuOpt self-hosted server** is a **REST** (HTTP/JSON) service for integrations that speak HTTP. Use :doc:`quick-start` for deployment, :doc:`server-api/index` for the API, and :doc:`client-api/index` for clients (including cuopt-sh-client). + +For **gRPC remote execution** (Python, C API, ``cuopt_cli``, or custom clients to ``cuopt_grpc_server``), see :doc:`../cuopt-grpc/index` — it uses a different protocol and is not part of the HTTP REST surface. .. image:: images/cuOpt-self-hosted.png :width: 500 :align: center - -Please refer to following links for more information on API and examples: +Please refer to the following sections for REST deployment, API reference, and examples. .. 
toctree:: :caption: Quickstart diff --git a/docs/cuopt/source/faq.rst b/docs/cuopt/source/faq.rst index 1985052531..4c6350353b 100644 --- a/docs/cuopt/source/faq.rst +++ b/docs/cuopt/source/faq.rst @@ -156,6 +156,21 @@ General FAQ while openssl x509 -noout -text; do :; done < test.pem.txt +gRPC remote execution (``cuopt_grpc_server``) +----------------------------------------------- + +.. dropdown:: Where are log files for the gRPC server / StreamLogs? + + Workers write per-job solver logs under ``/tmp/cuopt_logs/job_.log``. The ``StreamLogs`` RPC tails that file. Operational limits and behavior are summarized in :doc:`gRPC server behavior `. + +.. dropdown:: What happens if a ``cuopt_grpc_server`` worker crashes? + + Jobs that worker was running are marked **FAILED**. The server monitor can detect the crash and spawn a replacement worker; other workers keep running. For more detail, see :doc:`gRPC server behavior ` and the contributor reference ``cpp/docs/grpc-server-architecture.md`` in the repository. + +.. dropdown:: Does ``CancelJob`` stop a solve immediately? + + Cancellation is honored **before** the solver starts. If the solve has already begun, it **runs to completion**; there is no mid-solve cancellation path. See :doc:`gRPC server behavior `. + Routing FAQ ------------------------------ diff --git a/docs/cuopt/source/index.rst b/docs/cuopt/source/index.rst index e310c974ce..6f8ae3ba2c 100644 --- a/docs/cuopt/source/index.rst +++ b/docs/cuopt/source/index.rst @@ -42,6 +42,16 @@ Python (cuopt) Python Overview +==================================== +gRPC remote execution +==================================== +.. 
toctree:: + :maxdepth: 2 + :caption: gRPC remote execution + :name: gRPC remote execution + + gRPC overview + =============================== Server (cuopt-server) =============================== diff --git a/docs/cuopt/source/install.rst b/docs/cuopt/source/install.rst index 0b16bf606c..404d7361f8 100644 --- a/docs/cuopt/source/install.rst +++ b/docs/cuopt/source/install.rst @@ -16,6 +16,7 @@ If the selector does not load or you prefer step-by-step guides, use the quick-s * **Python (cuopt)** — :doc:`cuopt-python/quick-start` * **C (libcuopt)** — :doc:`cuopt-c/quick-start` (includes ``cuopt_cli``) +* **gRPC remote execution** — :doc:`cuopt-grpc/quick-start` (install, remote execution, Docker, minimal example) and :doc:`cuopt-grpc/advanced` (TLS and tuning; not the HTTP server) * **Server (cuopt-server)** — :doc:`cuopt-server/quick-start` * **CLI (cuopt_cli)** — Install via the C API; see :doc:`cuopt-cli/quick-start` diff --git a/docs/cuopt/source/introduction.rst b/docs/cuopt/source/introduction.rst index 2d39a26913..bea1a35159 100644 --- a/docs/cuopt/source/introduction.rst +++ b/docs/cuopt/source/introduction.rst @@ -119,6 +119,8 @@ cuOpt supports the following APIs: - Python support - :doc:`Routing (TSP, VRP, and PDP) - Python ` - :doc:`Linear Programming (LP) / Quadratic Programming (QP) and Mixed Integer Linear Programming (MILP) - Python ` +- gRPC remote execution + - :doc:`Linear Programming (LP) / Quadratic Programming (QP) and Mixed Integer Linear Programming (MILP) - gRPC remote ` - Server support - :doc:`Linear Programming (LP) - Server ` - :doc:`Mixed Integer Linear Programming (MILP) - Server ` diff --git a/python/cuopt/cuopt/linear_programming/data_model/data_model.py b/python/cuopt/cuopt/linear_programming/data_model/data_model.py index 39da5d6c47..fc55170974 100644 --- a/python/cuopt/cuopt/linear_programming/data_model/data_model.py +++ b/python/cuopt/cuopt/linear_programming/data_model/data_model.py @@ -127,7 +127,7 @@ class 
DataModel(data_model_wrapper.DataModel): >>> >>> # Method 1: directly set bounds >>> # Set lower bounds to -infinity and upper bounds to b - >>> constraint_lower_bounds = np.array([np.NINF, np.NINF], + >>> constraint_lower_bounds = np.array([-np.inf, -np.inf], >>> dtype=np.float64) >>> constraint_upper_bounds = np.array(b, dtype=np.float64) >>> data_model.set_constraint_lower_bounds(constraint_lower_bounds) @@ -136,7 +136,7 @@ class DataModel(data_model_wrapper.DataModel): >>> >>> # Set variable lower and upper bounds >>> variable_lower_bounds = np.array([0.0, 0.0], dtype=np.float64) - >>> variable_upper_bounds = np.array([2.0, np.PINF], dtype=np.float64) + >>> variable_upper_bounds = np.array([2.0, np.inf], dtype=np.float64) >>> data_model.set_variable_lower_bounds(variable_lower_bounds) >>> data_model.set_variable_upper_bounds(variable_upper_bounds) """