
implement zone log gathering in edera-debug-report #8

Merged
tycho merged 6 commits into main from steven/gather-zone-logs
May 6, 2026

@tycho tycho commented May 6, 2026

This does a few things:

  • Adds a list of Edera zones to the debug report
  • Adds the console output of each running zone to the debug report
  • Adds optional command-execution progress output (enabled with the env var EDR_DEBUG=1) to help diagnose slow or broken report generation
  • Guards commands that are likely to hang with timeouts, so report generation doesn't stall
  • Limits the number of lines captured from individual service logs, to keep the report from bloating and to avoid long execution times

tycho added 6 commits May 6, 2026 11:50
Signed-off-by: Steven Noonan <steven@edera.dev>
Adds a watchdog-based timeout mechanism (TERM-then-KILL via os.killpg)
that runs each protected child in its own session, plus parent-side
SIGINT/SIGTERM/SIGHUP forwarding so children can't be orphaned when the
report tool is interrupted. Why this and not the timeout(1) coreutil:
the report runs in possibly-broken environments where tools like
protect-ctl may spawn helpers, and we need to kill the whole group
atomically without depending on an external binary.

Surfaces two entry points: run_with_timeout() for captured-output use,
and timeout_s/kill_after_s on ZipArchiveWriter.add_stream_from_proc()
for streaming into the ZIP. add_stream_from_proc now returns
(rc, stderr_text, timed_out); the two existing callers are updated.

Signed-off-by: Steven Noonan <steven@edera.dev>
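
The TERM-then-KILL behaviour described in the commit message above can be sketched roughly as follows. This is not the code from the PR. It is a minimal illustration that assumes a POSIX system, and it uses `communicate(timeout=...)` in place of a separate watchdog. The signature, the return tuple, and the `_signal_group` helper are assumptions made for this example, and parent-side signal forwarding is left out.

```python
import os
import signal
import subprocess
import sys


def _signal_group(pgid, sig):
    """Send sig to every process in the group; ignore a group that is already gone."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_timeout(cmd, timeout_s=10.0, kill_after_s=1.0):
    """Run cmd in its own session and kill the whole group if it overruns.

    Returns (returncode, stdout_text, stderr_text, timed_out).
    The signature and return shape are assumptions for this sketch.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        # The child calls setsid(), so its pid doubles as the process group id
        # and any helpers it spawns stay in that group.
        start_new_session=True,
    )
    timed_out = False
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(proc.pid, signal.SIGTERM)  # polite request first
        try:
            out, err = proc.communicate(timeout=kill_after_s)
        except subprocess.TimeoutExpired:
            _signal_group(proc.pid, signal.SIGKILL)  # grace period expired
            out, err = proc.communicate()
    return proc.returncode, out, err, timed_out
```

Because the signal goes to the process group rather than to a single pid, helpers spawned by the child are terminated along with it, which is the property the commit message cites as the reason for not relying on timeout(1).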
Mirrors the standalone gather-zone-logs.sh: enumerate zones via
`protect-ctl zone list -o json`, then stream each zone's logs from
`protect-ctl zone logs <id>` into the archive. Both calls are bounded
by the in-process timeout helper (SIGTERM after 10s, SIGKILL after a
further 1s grace), because a hang in these calls, i.e. protect-daemon
being wedged, is the failure mode this report is meant to capture in
the first place.

Output lands under `<top>/protect/` so it's clearly distinct from
general system info, and the collector runs immediately before the
journalctl block so the journal capture includes any activity these
calls trigger (avoiding a pre/post journal gather).

Signed-off-by: Steven Noonan <steven@edera.dev>
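
As an illustration of the enumerate-then-stream flow described above, the sketch below turns the output of `protect-ctl zone list -o json` into one `protect-ctl zone logs <id>` invocation per zone. The JSON shape (a list of objects with an `id` field), the `zone-<id>.log` archive name, and the function name `zone_log_commands` are all assumptions made for this example. The PR does not show the actual schema or file naming.

```python
import json


def zone_log_commands(zone_list_json, top):
    """Plan one log-capture command per zone.

    zone_list_json: text assumed to come from `protect-ctl zone list -o json`.
    top: the archive's top-level directory name.
    Returns a list of (archive_path, argv) pairs.
    """
    zones = json.loads(zone_list_json)
    plan = []
    for zone in zones:
        # ASSUMPTION: each entry is an object carrying an "id" field.
        zone_id = zone["id"]
        argv = ["protect-ctl", "zone", "logs", zone_id]
        # ASSUMPTION: hypothetical file name; only the `<top>/protect/`
        # prefix comes from the commit message.
        plan.append((f"{top}/protect/zone-{zone_id}.log", argv))
    return plan
```

Each planned command would then be run through the timeout helper, so that a wedged daemon costs at most the configured timeout per zone.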
Wires the in-process timeout helper into the commands most likely to
hang in the failure modes this report is designed to capture.

Signed-off-by: Steven Noonan <steven@edera.dev>
When EDR_DEBUG=1 is set in the environment, add
a one-line "[edr] running: <cmd>" stderr message at the start of every
external command, plus "[edr] timeout (...): SIG... -> <cmd>" when the
watchdog fires. When the tool stalls, the last line printed identifies
the offending command.

Hooked into both add_stream_from_proc and run_with_timeout so callers
that go through run_and_write, run_json_then_fallback_text, or invoke
either helper directly all get coverage. Also prints "Creating: <path>"
to stdout up front so the output filename is visible before the
progress noise (the existing "Wrote: <path>" line is unchanged).

Signed-off-by: Steven Noonan <steven@edera.dev>
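
A minimal sketch of the "[edr] running: <cmd>" progress line described above. The helper names `debug_enabled` and `note_running`, and the use of `shlex.join` to render the command, are assumptions for this example. The timeout message is not reproduced because its exact format is elided in the commit message.

```python
import os
import shlex
import sys


def debug_enabled(environ=None):
    """Return True only when EDR_DEBUG is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("EDR_DEBUG") == "1"


def note_running(cmd, environ=None, stream=None):
    """Print a one-line progress message before an external command starts.

    cmd is an argv list. Output goes to stderr unless a stream is supplied.
    """
    if not debug_enabled(environ):
        return
    out = sys.stderr if stream is None else stream
    # Flush immediately so the last line is visible if the tool then stalls.
    print(f"[edr] running: {shlex.join(cmd)}", file=out, flush=True)
```

Calling a helper like this at the top of each command-running function means the last line printed names the command that was in flight when the tool stalled.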
On my system, kubelet was very spammy with log messages. It was taking
a very long time to capture that log as a result.

Signed-off-by: Steven Noonan <steven@edera.dev>
@tycho tycho force-pushed the steven/gather-zone-logs branch from 9ea704c to 23ad2c4 Compare May 6, 2026 19:13
@tycho tycho requested review from azenla and bleggett May 6, 2026 19:15
@tycho tycho merged commit e2b7f88 into main May 6, 2026
2 checks passed
@tycho tycho deleted the steven/gather-zone-logs branch May 6, 2026 19:20