Description
Improve job lifecycle management beyond the current launch-and-hard-kill model. With two-way SDK communication, Mission Control can gracefully stop jobs, automatically retry failures with preserved context, and pre-warm server instances for instant job startup.
Use Case
Today, job lifecycle is binary: launch a TUI process, hard-kill it when done or stuck. There's no way to ask a job to wrap up cleanly, retry a failure without losing context, or reduce the latency of spinning up new jobs. The serve-based SDK (#65) gives Mission Control active control over job sessions, enabling smarter lifecycle management.
Proposed Solution
mc_kill → Graceful Shutdown
Instead of hard-killing the tmux pane:
- First: Send a prompt via the SDK asking the agent to wrap up: client.session.prompt("Stop after your current task. Summarize what you've done and any remaining work.")
- Wait: Give the agent a configurable timeout (e.g., 60s) to respond
- Then: If the agent responds, capture the summary and shut down cleanly. If the timeout expires, fall back to a hard kill.
- The agent gets to save state, report findings, and produce a handoff summary — instead of being yanked mid-thought
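The prompt, wait, and fallback steps above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the issue only shows the call `client.session.prompt(...)`, so the `SessionClient` interface, its `Promise<string>` return type, the `hardKill` callback, and the `withTimeout` helper are all assumptions made for the example.

```typescript
// Hypothetical shape: only client.session.prompt(text) appears in the issue;
// the string return value is an assumption for this sketch.
interface SessionClient {
  session: { prompt(text: string): Promise<string> };
}

type StopResult =
  | { kind: "graceful"; summary: string }
  | { kind: "hard-kill"; reason: "timeout" | "prompt-error" };

type Timed<T> = { timedOut: false; value: T } | { timedOut: true };

const WRAP_UP_PROMPT =
  "Stop after your current task. Summarize what you've done and any remaining work.";

// Resolve with the work's value, or with { timedOut: true } after `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<Timed<T>> {
  return new Promise<Timed<T>>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timedOut: true }), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve({ timedOut: false, value }); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Ask the agent to wrap up; fall back to hardKill (for example, killing the
// tmux pane) if it does not answer in time or the prompt itself fails.
async function gracefulStop(
  client: SessionClient,
  hardKill: () => Promise<void>,
  timeoutMs: number = 60000,
): Promise<StopResult> {
  let reason: "timeout" | "prompt-error" = "timeout";
  try {
    const result = await withTimeout(client.session.prompt(WRAP_UP_PROMPT), timeoutMs);
    if (!result.timedOut) {
      return { kind: "graceful", summary: result.value };
    }
  } catch {
    // The prompt failed outright (for example, the server is unreachable).
    reason = "prompt-error";
  }
  await hardKill();
  return { kind: "hard-kill", reason };
}
```

Returning a tagged result rather than throwing lets the caller store the handoff summary when there is one and log why a hard kill was needed when there is not.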
Automatic Retry with Context
When a job fails (error event via SSE):
- MC sees the structured error (not just "pane died with exit code 1")
- Instead of tearing down the worktree and relaunching from scratch, MC can prompt the same session to retry: client.session.prompt("That failed with: {error}. Try a different approach.")
- The agent retains full conversation history and everything it learned — no cold start
- Configurable retry policy: max retries, backoff, whether to auto-retry or ask the user first
- Once the maximum number of retries is exhausted, mark the job as failed and surface it to the user with full error context
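One way to express the retry policy above is a pure decision function that the monitor calls whenever an error event arrives. This is only a sketch: the field names (`maxRetries`, `baseDelayMs`, `backoffFactor`, `autoRetry`) and the `decideRetry` function are hypothetical, and exponential backoff is assumed here because the issue says "backoff" without naming a strategy.

```typescript
// Hypothetical policy shape; none of these field names come from MCConfig.
interface RetryPolicy {
  maxRetries: number;    // retries allowed before the job is marked failed
  baseDelayMs: number;   // delay before the first retry
  backoffFactor: number; // multiplier applied for each further retry
  autoRetry: boolean;    // false means ask the user before retrying
}

type RetryDecision =
  | { action: "retry"; delayMs: number; prompt: string }
  | { action: "ask-user"; prompt: string }
  | { action: "fail"; error: string };

// Decide what to do after a failure, given how many retries have already run.
function decideRetry(
  policy: RetryPolicy,
  retriesSoFar: number,
  error: string,
): RetryDecision {
  if (retriesSoFar >= policy.maxRetries) {
    // Retries exhausted: surface the structured error to the user.
    return { action: "fail", error };
  }
  // Same wording as the retry prompt in the proposal, with the error filled in.
  const prompt = `That failed with: ${error}. Try a different approach.`;
  if (!policy.autoRetry) {
    return { action: "ask-user", prompt };
  }
  // Exponential backoff (an assumption): base, base * factor, base * factor^2, ...
  const delayMs = policy.baseDelayMs * Math.pow(policy.backoffFactor, retriesSoFar);
  return { action: "retry", delayMs, prompt };
}
```

Keeping the decision separate from the SDK call means the policy can be unit-tested without a running server; the caller would wait `delayMs` and then send `prompt` to the existing session.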
Warm Standby Jobs
Pre-spin up opencode serve instances with worktrees ready:
- A pool of N idle servers running in tmux panes, each in a prepared worktree
- When a plan needs a new job, startup is near-instant: the server is already running, so MC only has to create a session and send a prompt
- Eliminates the latency of worktree creation, git checkout, dependency setup, and server startup
- Pool size configurable via MCConfig (e.g., warmPoolSize: 3)
- Idle servers are recycled if unused after a timeout
- Particularly valuable for plans with many sequential jobs (each job starts immediately after the previous one's merge)
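The pool behaviour described in this list might look roughly like the sketch below. Everything here is illustrative: `WarmPool`, `WarmServer`, and the `spawn` / `dispose` callbacks are hypothetical names, and `spawn` is synchronous only to keep the example short. A real version would presumably create the worktree and start `opencode serve` asynchronously.

```typescript
// Hypothetical types; only warmPoolSize is mentioned in the issue.
interface WarmServer {
  id: string;
  worktree: string;
  idleSince: number; // timestamp (ms) when the server entered the pool
}

interface WarmPoolOptions {
  size: number;                                   // e.g. MCConfig.warmPoolSize
  idleTimeoutMs: number;                          // recycle servers idle longer than this
  spawn: () => { id: string; worktree: string };  // prepare a worktree and start a server
  dispose: (server: WarmServer) => void;          // stop the server, remove its worktree
  now?: () => number;                             // injectable clock, defaults to Date.now()
}

class WarmPool {
  private idle: WarmServer[] = [];
  private readonly opts: WarmPoolOptions;
  private readonly now: () => number;

  constructor(opts: WarmPoolOptions) {
    this.opts = opts;
    this.now = opts.now ?? (() => Date.now());
  }

  idleCount(): number {
    return this.idle.length;
  }

  // Top the pool up to the configured size.
  fill(): void {
    while (this.idle.length < this.opts.size) {
      this.idle.push({ ...this.opts.spawn(), idleSince: this.now() });
    }
  }

  // Hand out the oldest idle server, or cold-start one if the pool is empty.
  acquire(): WarmServer {
    const warm = this.idle.shift();
    return warm ?? { ...this.opts.spawn(), idleSince: this.now() };
  }

  // Dispose of servers that sat unused past the timeout, then refill.
  recycle(): void {
    const cutoff = this.now() - this.opts.idleTimeoutMs;
    const stale = this.idle.filter((s) => s.idleSince < cutoff);
    this.idle = this.idle.filter((s) => s.idleSince >= cutoff);
    stale.forEach((s) => this.opts.dispose(s));
    this.fill();
  }
}
```

In this sketch `acquire` does not refill the pool itself, so the caller can hand the server to the job immediately and call `fill` or `recycle` in the background.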
Files likely affected: src/tools/kill.ts, src/lib/monitor.ts, src/lib/orchestrator.ts, new src/lib/warm-pool.ts, src/lib/config.ts
Additional Context
Prerequisite: #65 (Serve-based orchestration) — graceful shutdown requires SDK prompt capability, retry requires session persistence, warm standby requires running serve instances.
These three features share the theme of treating jobs as long-lived managed sessions rather than disposable processes. The serve architecture makes this possible because the server persists independently of any single prompt or interaction.