tracing subscriber not installed before config loading, early warnings silently dropped #2893

@bug-ops

Description

The tracing subscriber is installed in runner::run() only after config loading completes. Several code paths in the config loading pipeline emit tracing::warn! and tracing::debug! events that are silently dropped because no subscriber exists yet. This creates an observability blind spot during the earliest phase of startup.

The architecture is fundamentally sound — config must be loaded before tracing can be fully configured (chicken-and-egg). However, diagnostic warnings about invalid environment variables and deprecated config values are lost, which can confuse users who misconfigure env vars and see no feedback.
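The ordering problem can be illustrated with a small dependency-free model. This is only a sketch: `Dispatcher` and its methods are hypothetical stand-ins for the tracing dispatch path, not the crate's API, and the second message is invented for illustration.

```rust
/// Hypothetical stand-in for the global tracing dispatcher.
#[derive(Default)]
struct Dispatcher {
    /// `None` until a subscriber is installed; `Some(log)` afterwards.
    subscriber: Option<Vec<String>>,
}

impl Dispatcher {
    /// Install a subscriber that records every event it receives.
    fn install_subscriber(&mut self) {
        self.subscriber = Some(Vec::new());
    }

    /// Emit a warning. With no subscriber installed the event is dropped
    /// without any error, which is the blind spot described above.
    fn warn(&mut self, message: &str) {
        if let Some(log) = self.subscriber.as_mut() {
            log.push(format!("WARN {message}"));
        }
    }

    /// Everything the subscriber has seen so far (empty if none installed).
    fn recorded(&self) -> Vec<String> {
        self.subscriber.clone().unwrap_or_default()
    }
}

/// Mirrors the current startup order: config loading first, subscriber second.
fn startup_in_current_order() -> Vec<String> {
    let mut dispatcher = Dispatcher::default();

    // Phase 1: config loading emits a warning before any subscriber exists.
    dispatcher.warn("ignoring invalid ZEPH_LLM_PROVIDER value: nonexistent");

    // Phase 2: the subscriber is installed only now.
    dispatcher.install_subscriber();

    // Phase 3: later events are recorded as expected.
    dispatcher.warn("emitted after subscriber init");

    dispatcher.recorded()
}
```

Calling `startup_in_current_order()` returns only the second message; the provider warning never reaches the log.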

Reproduction Steps

  1. Set an invalid environment variable: ZEPH_LLM_PROVIDER=nonexistent
  2. Run RUST_LOG=debug cargo run --features full
  3. Observe: no warning about the invalid provider value appears in output

Or:

  1. Set a deprecated env var: ZEPH_STT_MODEL=whisper-1
  2. Run cargo run --features full
  3. Observe: no deprecation warning appears, the variable is silently ignored

Expected Behavior

All tracing::warn! events emitted during config loading should be visible to the user, either via a temporary stderr subscriber or by using eprintln! for pre-subscriber diagnostics.

Actual Behavior

The following tracing events are silently dropped (emitted before subscriber init):

| Source | Level | Message |
|---|---|---|
| zeph-core/src/bootstrap/config.rs:24,29,36,47 | debug | "config resolved via {source}: {path}" |
| zeph-config/src/env.rs:34 | warn | "ignoring invalid ZEPH_LLM_PROVIDER value: {v}" |
| zeph-config/src/env.rs:233 | warn | "ZEPH_STT_MODEL is no longer supported" |
| zeph-config/src/env.rs:239 | warn | "ZEPH_STT_BASE_URL is no longer supported" |
| zeph-config/src/memory.rs:1107 | warn | "task_aware_mig has been removed; falling back" |
| zeph-config/src/providers.rs:1257-1323 | warn | Various provider validation warnings |

All heavy subsystem initialization (vault, LLM providers, memory, MCP, channels) correctly happens after tracing init — only the config bootstrap is affected.

Implementation Plan

Two approaches, pick one:

Option A: Two-phase subscriber (comprehensive)

  1. Before resolve_config_path, install a minimal stderr-only subscriber using tracing_subscriber::reload::Layer
  2. Load config as currently done (all early events now captured to stderr)
  3. In init_tracing(), use the reload handle to replace the temporary layer with the full layer stack (file logging, chrome traces, OTLP)
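The three steps above can be sketched without external crates. This is a model of the reload pattern, not the tracing_subscriber API: `Sink`, `BootstrapSink`, `FullSink`, `Reloadable`, and `ReloadHandle` are hypothetical names, and both sinks write to a shared buffer (instead of stderr or log files) so the ordering can be checked. In the real crate, `tracing_subscriber::reload::Layer::new` plays the role of `Reloadable::new` here, returning the layer together with a handle.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical sink trait standing in for a tracing layer.
trait Sink: Send + Sync {
    fn record(&self, line: &str);
}

/// Phase-one sink; a real binary would write to stderr here.
struct BootstrapSink(Arc<Mutex<Vec<String>>>);

impl Sink for BootstrapSink {
    fn record(&self, line: &str) {
        self.0.lock().unwrap().push(format!("[bootstrap] {line}"));
    }
}

/// Phase-two sink, standing in for the full layer stack.
struct FullSink(Arc<Mutex<Vec<String>>>);

impl Sink for FullSink {
    fn record(&self, line: &str) {
        self.0.lock().unwrap().push(format!("[full] {line}"));
    }
}

/// Installed once at startup; the inner sink can be swapped later.
struct Reloadable {
    inner: Arc<RwLock<Box<dyn Sink>>>,
}

/// Handle kept by the code that later builds the full configuration.
struct ReloadHandle {
    inner: Arc<RwLock<Box<dyn Sink>>>,
}

impl Reloadable {
    fn new(initial: Box<dyn Sink>) -> (Self, ReloadHandle) {
        let inner = Arc::new(RwLock::new(initial));
        (Self { inner: Arc::clone(&inner) }, ReloadHandle { inner })
    }

    fn warn(&self, message: &str) {
        self.inner.read().unwrap().record(message);
    }
}

impl ReloadHandle {
    /// Replace the current sink; later events go to `next`.
    fn reload(&self, next: Box<dyn Sink>) {
        *self.inner.write().unwrap() = next;
    }
}

fn two_phase_startup() -> Vec<String> {
    let out = Arc::new(Mutex::new(Vec::new()));

    // Step 1: install the minimal sink before config resolution.
    let (logger, handle) = Reloadable::new(Box::new(BootstrapSink(Arc::clone(&out))));

    // Step 2: config loading; early warnings are now captured.
    logger.warn("ZEPH_STT_MODEL is no longer supported");

    // Step 3: swap in the full sink once config is available.
    handle.reload(Box::new(FullSink(Arc::clone(&out))));
    logger.warn("emitted after full init");

    let lines = out.lock().unwrap().clone();
    lines
}
```

The cost this models is the extra indirection: every event passes through a lock-guarded wrapper for the life of the process, even though the swap happens once.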

Option B: Replace tracing with eprintln in pre-subscriber paths (simpler)

  1. In resolve_config_path: replace tracing::debug! with a no-op or with conditional stderr output
  2. In apply_env_overrides (env.rs): replace tracing::warn! with eprintln!("zeph: ...")
  3. In PruningStrategy deserializer (memory.rs): replace tracing::warn! with eprintln!
  4. In provider validation (providers.rs): replace tracing::warn! with eprintln!

This is consistent with tracing_init.rs itself, which already uses eprintln! for its own error conditions (lines 134, 155, 260).
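A minimal sketch of what Option B could look like for the provider override, assuming a parser shaped roughly like this. The `Provider` enum, its variants, and both function names are hypothetical; only the environment variable name and the warning text come from this issue. The diagnostic goes to a generic writer so the behaviour can be tested, with stderr as the production target.

```rust
use std::io::Write;

/// Hypothetical provider set; the real enum in zeph-config may differ.
#[derive(Debug, PartialEq)]
enum Provider {
    Ollama,
    OpenAi,
}

/// Parse a ZEPH_LLM_PROVIDER value. Invalid input is reported to `diag`
/// directly, so the message does not depend on a tracing subscriber.
fn parse_provider(raw: &str, diag: &mut impl Write) -> Option<Provider> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "ollama" => Some(Provider::Ollama),
        "openai" => Some(Provider::OpenAi),
        _ => {
            // Write errors are ignored: a failed diagnostic must not abort startup.
            let _ = writeln!(diag, "zeph: ignoring invalid ZEPH_LLM_PROVIDER value: {raw}");
            None
        }
    }
}

/// Production entry point: diagnostics go to stderr, much like eprintln!.
fn provider_from_env() -> Option<Provider> {
    let raw = std::env::var("ZEPH_LLM_PROVIDER").ok()?;
    parse_provider(&raw, &mut std::io::stderr())
}
```

With `ZEPH_LLM_PROVIDER=nonexistent`, `provider_from_env()` returns `None` and prints the warning to stderr regardless of `RUST_LOG` or subscriber state.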

Recommendation

Option B is simpler, adds no architectural overhead, and follows the existing pattern in tracing_init.rs. Option A is more comprehensive, since it captures every early event without changes to individual call sites, but it adds reload-layer complexity for events that only fire on misconfiguration.

Environment

  • Version: HEAD (a7268a5)
  • All features

Labels

  • P3: Research, medium-high complexity
  • bug: Something isn't working