Skip to content

feat(logging): per-run log directory with unified main.jsonl (DIM-685)#1437

Open
spomichter wants to merge 1 commit intodevfrom
feat/dim-685-per-run-logs
Open

feat(logging): per-run log directory with unified main.jsonl (DIM-685)#1437
spomichter wants to merge 1 commit intodevfrom
feat/dim-685-per-run-logs

Conversation

@spomichter
Copy link
Contributor

Problem

Closes DIM-685

every forkserver worker creates its own log file — a single run produces 6+ scattered dimos_<ts>_<pid>.jsonl files. agents can't grep one file for all output, and there's no way to tell which files belong to which run.

Solution

all processes (main + workers) now write to one file per run:

logs/20260306-143022-unitree-go2/
  main.jsonl    ← everything

how it works:

  • set_run_log_dir() sets os.environ["DIMOS_RUN_LOG_DIR"] before blueprint.build()
  • forkserver context is created lazily during build — inherits the env var
  • all workers read env var in _get_log_file_path() → write to same main.jsonl
  • append mode, POSIX atomic writes (<4KB lines), no lock needed
  • FileHandler instead of RotatingFileHandler (avoids multi-process rotation races)
  • legacy fallback preserved when no run dir is set

zero changes to worker.py.

Breaking Changes

None

How to Test

cd /home/ubuntu/dimos
git fetch origin && git checkout feat/dim-685-per-run-logs
source .venv/bin/activate
dimos run unitree-go2 --simulation
# check logs/ for a new timestamped directory with main.jsonl
ls logs/

Contributor License Agreement

  • I have read and approved the CLA

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR consolidates per-process log files into a single logs/<run-id>/main.jsonl file shared by the main process and all forkserver workers, by setting DIMOS_RUN_LOG_DIR in the environment before blueprint.build() and having every process derive its log path from that variable. It also replaces RotatingFileHandler with a plain FileHandler to avoid multi-process rotation races.

Key issues found:

  • Main-process logger still writes to legacy path: logger = setup_logger() is called at module-import time in dimos.py (and in many other modules), well before set_run_log_dir is invoked. The resulting stdlib logger has a FileHandler pinned to the legacy dimos_<ts>_<pid>.jsonl path. Resetting _LOG_FILE_PATH = None inside set_run_log_dir invalidates the cache for future setup_logger calls, but does not replace the handler already attached to the main-process logger. As a result, "Starting DimOS", "Logging to", and all other early main-process log entries are written to the legacy file rather than main.jsonl, undermining the stated goal of unified logging.
  • No file-size limit: Removing RotatingFileHandler eliminates rotation races, but also removes any upper bound on main.jsonl size. Long-running or verbose sessions could fill the disk without any automatic management.
  • Incomplete directory-name sanitization: safe_name only replaces / and spaces; other shell/filesystem-unfriendly characters are left as-is.

Confidence Score: 2/5

  • Not safe to merge as-is — the main process continues logging to the legacy scattered file, directly contradicting the PR's core objective.
  • The central mechanism (env-var inheritance for workers) is sound, but the main-process logger is configured at import time before set_run_log_dir is ever called, meaning the main process's own log output never reaches main.jsonl. This is a concrete regression against the stated goal and needs to be fixed before merging.
  • dimos/robot/cli/dimos.py and dimos/utils/logging_config.py both require attention: the former needs its module-level logger re-configured after set_run_log_dir, and the latter needs a mechanism to update already-attached file handlers when the run log dir is set.

Important Files Changed

Filename Overview
dimos/utils/logging_config.py Adds per-run log directory support via set_run_log_dir and env-var inheritance for workers; switches from RotatingFileHandler to plain FileHandler. The worker-side path is correct, but the _configure_structlog cache-reset in set_run_log_dir does not retroactively update already-configured stdlib loggers in the main process.
dimos/robot/cli/dimos.py Adds run-ID generation and calls set_run_log_dir before blueprint.build(). Core logic issue: module-level logger = setup_logger() runs at import time before set_run_log_dir is ever called, so the main-process logger retains a legacy FileHandler and does not write to main.jsonl.

Sequence Diagram

sequenceDiagram
    participant M as Main Process<br/>(dimos.py import)
    participant LC as logging_config
    participant E as os.environ
    participant W as Worker Process

    M->>LC: setup_logger() [at import time]
    LC->>LC: _configure_structlog()<br/>_RUN_LOG_DIR=None, env var unset
    LC->>LC: _get_log_file_path() → legacy dimos_<ts>_<pid>.jsonl
    LC-->>M: FileHandler → legacy path (cached in stdlib logger)

    Note over M: run() is called

    M->>LC: set_run_log_dir(log_dir)
    LC->>LC: _RUN_LOG_DIR = log_dir<br/>_LOG_FILE_PATH = None  ← cache cleared
    LC->>E: DIMOS_RUN_LOG_DIR = log_dir

    M->>M: logger.info("Logging to ...")
    Note over M,LC: ⚠️ Still uses legacy FileHandler!<br/>main.jsonl is NOT written by main process

    M->>M: blueprint.build() → forkserver starts workers

    W->>E: inherit DIMOS_RUN_LOG_DIR
    W->>LC: setup_logger() [fresh process]
    LC->>LC: _RUN_LOG_DIR=None, env var SET
    LC->>LC: _get_log_file_path() → logs/<run-id>/main.jsonl
    LC-->>W: FileHandler → main.jsonl ✅

    Note over M,W: Main process → legacy file<br/>Workers → main.jsonl<br/>Goal of unified log NOT fully achieved
Loading

Comments Outside Diff (1)

  1. dimos/robot/cli/dimos.py, line 25 (link)

    Main-process logger writes to legacy path, not main.jsonl

    logger = setup_logger() is executed at module import time (before any call to set_run_log_dir). At that moment _RUN_LOG_DIR is None and DIMOS_RUN_LOG_DIR is unset, so _get_log_file_path() falls through to the legacy dimos_<ts>_<pid>.jsonl path and a FileHandler pointing to that file is attached to the stdlib logger.

    When set_run_log_dir is later called in run() (line 127), it resets _LOG_FILE_PATH = None and sets the env var, but it does not clear or replace the FileHandler that was already attached to the stdlib logger for this module. Consequently:

    • logger.info("Starting DimOS") (line 107) → written to the legacy file
    • logger.info("Logging to", ...) (line 128) → also written to the legacy file

    Workers start fresh, read DIMOS_RUN_LOG_DIR, and correctly write to main.jsonl — but the main process itself does not, which directly contradicts the PR's stated goal of "all processes (main + workers) now write to one file per run."

    A straightforward fix is to re-call setup_logger() (or re-attach the file handler) immediately after set_run_log_dir returns, so the main-process logger is updated to use main.jsonl before any further log calls.

Last reviewed commit: 8790b24

Comment on lines +123 to +125
blueprint_name = "-".join(robot_types)
safe_name = blueprint_name.replace("/", "-").replace(" ", "-")
run_id = f"{_time.strftime('%Y%m%d-%H%M%S')}-{safe_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete safe_name sanitization for directory names

The current sanitization only replaces / and space, but filesystem-unfriendly characters in robot type names (e.g., :, *, ?, \, |, <, >) would still be included in the directory name and could cause failures on Windows or certain Linux filesystems/tools.

Consider using a more comprehensive approach:

Suggested change
blueprint_name = "-".join(robot_types)
safe_name = blueprint_name.replace("/", "-").replace(" ", "-")
run_id = f"{_time.strftime('%Y%m%d-%H%M%S')}-{safe_name}"
safe_name = "".join(c if c.isalnum() or c in "-_." else "-" for c in blueprint_name)

Comment on lines +260 to +265
file_handler = logging.FileHandler(
log_file_path,
mode="a",
maxBytes=10 * 1024 * 1024, # 10MiB
backupCount=20,
encoding="utf-8",
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No file size limit after removing RotatingFileHandler

Switching to plain FileHandler is the right call for multi-process safety (rotation races with forkserver workers are a real problem), but it means main.jsonl can now grow without bound. For long-running robot sessions with verbose logging this could fill the disk.

Consider capping growth with an explicit note, or adding a maxBytes guard at startup (e.g. warn or truncate if the file already exceeds a threshold), or at least documenting the expected maintenance story (e.g. "rotate manually" or "logrotate config provided") so operators know what to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant