Parent: #405
Problem
Reviewer agents that hit API overload or connection issues silently return with 0 tool uses after several minutes, consuming wall time and producing nothing. The parent workflow then has to manually mark the review as passed or retry, adding another round-trip.
Evidence
From session logs (sansari-terminal-may12.txt):
⏺ deepwork:reviewer(Review finder_analysis step output in parent workflow)
⎿ Done (0 tool uses · 0 tokens · 3m 46s)
⏺ deepwork:reviewer(Review finder_analysis step output in parent workflow)
⎿ Done (0 tool uses · 0 tokens · 3m 56s)
Both calls to the same reviewer returned 0 tool uses after ~4 minutes each — a strong signal of API timeout or rate limiting. Combined waste: ~8 minutes for a step that was eventually bypassed anyway.
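The zero-tool-use signature can be detected from the agent's summary line. The sketch below parses the two `Done (...)` lines quoted above and totals the wall time they consumed. The regex is an assumption inferred from these two samples, not a documented output format. `parse_done_line`, `is_empty_run` and `ReviewerRun` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical parser. The pattern is inferred from the two "Done (...)" lines
# in the session log and is not a documented output format.
DONE_RE = re.compile(
    r"Done \((?P<tools>\d+) tool uses? · (?P<tokens>\d+) tokens? · "
    r"(?:(?P<mins>\d+)m)?\s*(?:(?P<secs>\d+)s)?\)"
)


@dataclass
class ReviewerRun:
    tool_uses: int
    tokens: int
    elapsed_s: int


def parse_done_line(line: str) -> Optional[ReviewerRun]:
    """Extract tool uses, tokens and elapsed seconds from a 'Done (...)' line."""
    m = DONE_RE.search(line)
    if m is None:
        return None
    elapsed = int(m["mins"] or 0) * 60 + int(m["secs"] or 0)
    return ReviewerRun(int(m["tools"]), int(m["tokens"]), elapsed)


def is_empty_run(run: ReviewerRun) -> bool:
    # The signature described in the issue: the agent ran for a while but did nothing.
    return run.tool_uses == 0 and run.elapsed_s > 0


log_lines = [
    "⎿ Done (0 tool uses · 0 tokens · 3m 46s)",
    "⎿ Done (0 tool uses · 0 tokens · 3m 56s)",
]
runs = [parse_done_line(line) for line in log_lines]
empty = [r for r in runs if r is not None and is_empty_run(r)]
wasted = sum(r.elapsed_s for r in empty)
print(f"empty runs: {len(empty)}, wall time lost: {wasted // 60}m {wasted % 60}s")
# prints "empty runs: 2, wall time lost: 7m 42s"
```

The ~8 minutes cited above works out to 226 s + 236 s = 462 s, or 7m 42s.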
Proposed fix
Three changes:
1. Hard timeout on reviewer agents. If a reviewer agent completes with 0 tool uses and non-zero elapsed time (indicating it started but did nothing), treat it as a failure rather than a pass. Log a warning and retry once with exponential backoff.
2. Max retry cap with skip-and-log. After one retry, if the reviewer still returns 0 tool uses, skip the review for this step and log: "reviewer skipped after 2 failed attempts — manual review recommended". Do not block the workflow indefinitely on API issues.
3. Fast-fail on 0 tool uses. If the reviewer returns 0 tool uses within the first 30 seconds, it likely never got a response at all. Retry immediately rather than waiting the full elapsed time.
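Taken together, the three changes amount to a small retry policy. The following is a minimal sketch under stated assumptions: the reviewer is abstracted as a hypothetical `run_reviewer` callable returning an `Attempt`, the 5-second base backoff is a placeholder (the proposal names no value), and change 3 is read as "skip the backoff delay when the failure came back within 30 seconds". The hard timeout itself is not modeled here.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable

FAST_FAIL_WINDOW_S = 30.0  # from the proposal: 0 tool uses inside 30s means no response
BASE_BACKOFF_S = 5.0       # hypothetical value; the proposal does not name one
MAX_ATTEMPTS = 2           # first attempt plus one retry
SKIP_MESSAGE = "reviewer skipped after 2 failed attempts — manual review recommended"


@dataclass
class Attempt:
    """Hypothetical summary of one reviewer run."""
    tool_uses: int
    elapsed_s: float


class ReviewOutcome(Enum):
    REVIEWED = "reviewed"      # reviewer did real work; use its verdict as today
    UNVERIFIED = "unverified"  # reviewer skipped; never counted as a pass


def run_review(
    run_reviewer: Callable[[], Attempt],
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> ReviewOutcome:
    for attempt_no in range(1, MAX_ATTEMPTS + 1):
        result = run_reviewer()
        if result.tool_uses > 0:
            return ReviewOutcome.REVIEWED
        # Change 1: zero tool uses is a failure, not a pass.
        log(f"warning: reviewer attempt {attempt_no} returned 0 tool uses "
            f"after {result.elapsed_s:.0f}s")
        if attempt_no == MAX_ATTEMPTS:
            break  # change 2: retry budget exhausted
        if result.elapsed_s < FAST_FAIL_WINDOW_S:
            continue  # change 3: fast-fail, retry immediately without backoff
        sleep(BASE_BACKOFF_S * 2 ** (attempt_no - 1))  # exponential backoff
    log(SKIP_MESSAGE)
    return ReviewOutcome.UNVERIFIED


if __name__ == "__main__":
    # Simulate the logged incident: two empty ~4-minute runs.
    fake = iter([Attempt(0, 226.0), Attempt(0, 236.0)])
    print(run_review(lambda: next(fake), sleep=lambda s: None))
```

Injecting `sleep` and `log` keeps the policy testable without real delays. With only one retry, the exponential formula reduces to a single delay of `BASE_BACKOFF_S`, but it keeps the same shape if the cap is raised later.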
Quality preservation
- Reviewer still runs on first attempt as today.
- One retry is attempted before skipping.
- Skips are logged and surfaced to the user so they can manually review if the step is high-stakes.
- The skip does not mark the step as "passed" — it marks it as "reviewer unavailable, unverified."
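The key guarantee here is a three-way step status, so that a skipped review can never be mistaken for a pass. A minimal sketch, assuming the workflow records one status per step. `StepStatus`, `is_passed`, `unverified_report` and the step name `other_step` are hypothetical. Only `finder_analysis` comes from the log above.

```python
from enum import Enum
from typing import Dict, List


class StepStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    UNVERIFIED = "reviewer unavailable, unverified"


def is_passed(status: StepStatus) -> bool:
    # A skipped review is never promoted to PASSED.
    return status is StepStatus.PASSED


def unverified_report(statuses: Dict[str, StepStatus]) -> List[str]:
    """User-facing lines for every step whose review was skipped."""
    return [
        f"{step}: {status.value}; manual review recommended"
        for step, status in statuses.items()
        if status is StepStatus.UNVERIFIED
    ]


statuses = {"finder_analysis": StepStatus.UNVERIFIED, "other_step": StepStatus.PASSED}
for line in unverified_report(statuses):
    print(line)
# prints "finder_analysis: reviewer unavailable, unverified; manual review recommended"
```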