
fix(cluster): resolve DNS failures on systemd-resolved hosts #516

Open
brianwtaylor wants to merge 3 commits into NVIDIA:main from brianwtaylor:fix/cluster-dns-systemd-resolved

@brianwtaylor brianwtaylor commented Mar 21, 2026

Supersedes #478

@drew — reworked per your review: the bootstrap crate now sniffs resolvers and passes them as an UPSTREAM_DNS env var. No system files are mounted into the container.

Summary

  • Sniff upstream DNS resolvers from the Rust bootstrap crate by reading /run/systemd/resolve/resolv.conf (systemd-resolved hosts only)
  • Filter loopback addresses (127.x.x.x, ::1) and pass result to container as UPSTREAM_DNS env var
  • Skip DNS sniffing for remote deploys where local resolvers would be wrong
  • Entrypoint reads UPSTREAM_DNS first, falls back to /etc/resolv.conf for manual launches
  • Add DNS verification logging on failure
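
The first two bullets (read resolv.conf-style text, drop loopback entries) can be sketched as a pure function. This is a minimal illustration, not the PR's code: the name `parse_resolv_conf` comes from a later commit message in this PR, but the signature and body here are assumptions.

```rust
use std::net::IpAddr;

/// Hypothetical sketch: extract non-loopback `nameserver` entries from
/// resolv.conf-style text. Signature and body are assumptions.
fn parse_resolv_conf(contents: &str) -> Vec<String> {
    contents
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            // Only lines of the form `nameserver <address>` are relevant.
            if fields.next()? != "nameserver" {
                return None;
            }
            let addr = fields.next()?;
            // Skip anything that does not parse as a plain IP address.
            let ip: IpAddr = addr.parse().ok()?;
            // is_loopback() covers 127.0.0.0/8 and ::1.
            if ip.is_loopback() {
                None
            } else {
                Some(ip.to_string())
            }
        })
        .collect()
}

fn main() {
    let sample = "# stub listener\nnameserver 127.0.0.53\nnameserver 192.168.4.1\nnameserver ::1\nsearch lan\n";
    // Only the non-loopback resolver survives the filter.
    println!("{:?}", parse_resolv_conf(sample));
}
```

Keeping the parser free of file I/O lets it be tested on string literals alone.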

Closes #437

Changes

crates/openshell-bootstrap/src/docker.rs — Add resolve_upstream_dns() that reads /run/systemd/resolve/resolv.conf, filters loopback addresses, and returns real upstream resolvers. Pass them as UPSTREAM_DNS env var to the cluster container (skipped for remote deploys). Includes unit tests.
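
As a rough sketch of the "pass as env var, skipped for remote deploys" step: the helper name `upstream_dns_env`, the `is_remote` flag, and the comma separator below are all assumptions for illustration. The PR description does not state how multiple resolvers are encoded in UPSTREAM_DNS.

```rust
/// Hypothetical helper: build the `UPSTREAM_DNS=...` entry for the cluster
/// container's environment. Returns None for remote deploys (local resolvers
/// would be wrong there) or when no usable resolver was found.
fn upstream_dns_env(resolvers: &[String], is_remote: bool) -> Option<String> {
    if is_remote || resolvers.is_empty() {
        return None;
    }
    // Assumption: resolvers are comma-separated; the real format may differ.
    Some(format!("UPSTREAM_DNS={}", resolvers.join(",")))
}

fn main() {
    let resolvers = vec!["192.168.4.1".to_string()];
    // Local deploy: the variable is produced.
    println!("{:?}", upstream_dns_env(&resolvers, false));
    // Remote deploy: nothing is passed, so the entrypoint uses its fallback.
    println!("{:?}", upstream_dns_env(&resolvers, true));
}
```

Returning `Option` keeps the "variable not set" case explicit, which matches the entrypoint's fall-back behaviour described above.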

deploy/docker/cluster-entrypoint.sh — Add get_upstream_resolvers() that reads the UPSTREAM_DNS env var first and falls back to /etc/resolv.conf when it is unset. When upstream resolvers are found, write them directly to the k3s resolv.conf instead of relying on the DNAT proxy. Improve DNS verification logging on failure.

Root Cause

Docker's embedded DNS at 127.0.0.11 is only reachable from the container's own network namespace. The existing DNAT rules forward to this loopback address, but k3s pods run in child network namespaces where the forwarded packets are dropped as martian packets. On systemd-resolved hosts, /etc/resolv.conf contains 127.0.0.53 (another loopback), so the fallback also fails silently.

DNS Flow — Before vs After

BEFORE (broken on systemd-resolved hosts):

  Pod → CoreDNS → resolv.conf → iptables DNAT → Docker DNS
        (cache)   127.0.0.11     PREROUTING     127.0.0.11
                        │                            │
                        └──── FAILS ─────────────────┘
                        loopback DNAT from pod namespace
                        dropped as martian packet

AFTER (this PR):

  Pod → CoreDNS → resolv.conf → upstream resolver → response
        (cache)   e.g. 192.168.1.1    (direct UDP)
                   ▲
                   │
              set by Rust bootstrap:
              resolve_upstream_dns()
              reads /run/systemd/resolve/resolv.conf
              passes via UPSTREAM_DNS env var
              entrypoint writes to k3s resolv.conf

NON-SYSTEMD HOSTS (macOS, WSL2, Alpine) — unchanged:

  Pod → CoreDNS → resolv.conf → iptables DNAT → Docker DNS → host
        (cache)   container IP   PREROUTING     127.0.0.11

  /run/systemd/resolve/resolv.conf absent → UPSTREAM_DNS not set →
  entrypoint falls back to existing DNAT proxy path. Zero behavior change.
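
The "file absent, so UPSTREAM_DNS is not set" behaviour on hosts without systemd-resolved could be handled by a thin I/O wrapper around the parser. The function below is a hypothetical sketch (name and error handling are assumptions), showing only the read step and how a missing file maps to "nothing to pass".

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical wrapper: return the file contents for later parsing, or
/// None when the file is missing or unreadable.
fn read_resolv_conf(path: &Path) -> Option<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Some(contents),
        // Expected on macOS, WSL2, Alpine: no systemd-resolved file exists.
        Err(err) if err.kind() == ErrorKind::NotFound => None,
        // Any other error is reported but treated the same way, so the
        // deploy continues on the existing DNAT proxy path.
        Err(err) => {
            eprintln!("could not read {}: {}", path.display(), err);
            None
        }
    }
}

fn main() {
    // Path named in the PR description.
    let path = Path::new("/run/systemd/resolve/resolv.conf");
    match read_resolv_conf(path) {
        Some(contents) => println!("found {} bytes to parse", contents.len()),
        None => println!("no systemd-resolved file; UPSTREAM_DNS stays unset"),
    }
}
```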

Testing

Software Versions

Component                   Version
OpenShell (release)         v0.0.13
OpenShell (PR branch)       0.0.13-dev.11+gebc1369
Cluster image (PR branch)   openshell/cluster:dev (built from ebc1369)
NemoClaw                    0.1.0

Results

  • Tested on DGX Spark (Ubuntu 24.04, systemd-resolved, Docker with
    cgroupns=host)
  • Verified DNS resolution works from k3s pods after the fix
  • Verified no behavior change on macOS (Apple Silicon) and Windows/WSL2 hosts

===VALIDATION TOPOLOGY===

                ┌─────────────────────────┐
                │     Node A (Linux)      │
                │    aarch64 · GPU        │
                │    OpenShell 0.0.13     │
                │                         │
                │ BASELINE + ORCHESTRATOR │
                │  (read-only, runs all   │
                │   tests from here)      │
                └─────┬──────────┬────────┘
                      │          │
        high-speed    │          │ LAN
        interconnect  │          │
                      │          ├──────────────┐
            ┌─────────▼──┐   ┌───▼──────────┐  ┌▼──────────────┐
            │  Node B    │   │  Node C      │  │  Node D       │
            │  Linux     │   │  macOS       │  │  Windows/WSL2 │
            │  aarch64   │   │  Apple Si    │  │  x86_64       │
            │  OS 0.0.13 │   │  OS 0.0.13   │  │  OS 0.0.13    │
            │  -dev.11   │   │  no systemd  │  │  no systemd   │
            │            │   │              │  │               │
            │ TEST TARGET│   │              │  │               │
            │ (DNS-fixed │   │  CONTROL     │  │  CONTROL      │
            │  gateway   │   │  ✓ verified  │  │  ✓ verified   │
            │  deployed) │   │              │  │               │
            └────────────┘   └──────────────┘  └───────────────┘

===WHAT EACH NODE PROVED DURING VALIDATION===

Node A ─── Baseline capture. systemd-resolved active, upstream 192.168.4.1,
stub at 127.0.0.53. All existing containers healthy. Zero drift.

Node B ─── Fix works. Custom binary + image from PR branch deployed.
UPSTREAM_DNS=192.168.4.1 set, written to k3s resolv.conf.
Pod DNS resolution verified. All pods healthy.

Node C ─── No change on macOS. No UPSTREAM_DNS set. DNAT proxy path intact.
Pod DNS resolution verified.

Node D ─── No change on WSL2. No UPSTREAM_DNS set. DNAT proxy path intact.
Pod DNS resolution verified.

Automated Tests

cargo test -p openshell-bootstrap

Checklist

Docker's embedded DNS at 127.0.0.11 is only reachable from the
container's own network namespace. k3s pods in child namespaces
cannot reach it, causing silent DNS failures on Ubuntu and other
systemd-resolved hosts where /etc/resolv.conf contains 127.0.0.53.

Sniff upstream DNS resolvers from the host in the Rust bootstrap
crate by reading /run/systemd/resolve/resolv.conf (systemd-resolved
only — intentionally does NOT read /etc/resolv.conf to avoid
bypassing Docker Desktop's DNAT proxy on macOS/Windows). Filter
loopback addresses (127.x.x.x and ::1) and pass the result to
the container as the UPSTREAM_DNS env var. Skip DNS sniffing for
remote deploys where the local host's resolvers would be wrong.

The entrypoint checks UPSTREAM_DNS first, falling back to
/etc/resolv.conf inside the container for manual launches. This
follows the existing pattern used by registry config, SSH gateway,
GPU support, and image tags.

Closes NVIDIA#437

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
@brianwtaylor brianwtaylor requested a review from a team as a code owner March 21, 2026 00:35
Collaborator


This no longer seems necessary, right?

Author


Good call. No need for these tests now that the logic lives properly in Rust. Apologies for the duplicate PR; I lost the first one while deploying these changes in my test environment yesterday.

@drew drew self-assigned this Mar 21, 2026
Drop deploy/docker/tests/test-dns-resolvers.sh — the resolver
logic now lives in the Rust bootstrap crate with cargo test
coverage, making the standalone shell harness redundant.

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>
Move the resolv.conf parsing logic out of resolve_upstream_dns() into
its own parse_resolv_conf() function. The 10 deterministic tests now
exercise the production code path instead of a reimplemented helper.

Signed-off-by: Brian Taylor <brian.taylor818@gmail.com>


Development

Successfully merging this pull request may close these issues.

DNS proxy in cluster-entrypoint.sh fails silently on Linux with systemd-resolved

2 participants