feat: Intelligent job lifecycle — graceful shutdown, automatic retry, warm standby

## Description

Improve job lifecycle management beyond the current launch-and-hard-kill model. With two-way SDK communication, Mission Control can gracefully stop jobs, automatically retry failures with preserved context, and pre-warm server instances for instant job startup.

## Use Case

Today, job lifecycle is binary: launch a TUI process, hard-kill it when done or stuck. There's no way to ask a job to wrap up cleanly, retry a failure without losing context, or reduce the latency of spinning up new jobs. The serve-based SDK (#65) gives Mission Control active control over job sessions, enabling smarter lifecycle management.

## Proposed Solution

### mc_kill → Graceful Shutdown
Instead of hard-killing the tmux pane:
1. **First**: Send a prompt via SDK asking the agent to wrap up — `client.session.prompt("Stop after your current task. Summarize what you've done and any remaining work.")`
2. **Wait**: Give the agent a configurable timeout (e.g., 60s) to respond
3. **Then**: If the agent responds, capture the summary and shut down cleanly. If the timeout expires first, fall back to a hard kill.
4. The agent gets to save state, report findings, and produce a handoff summary — instead of being yanked mid-thought
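
The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual SDK surface: `PromptableSession`, `stopJob`, `withTimeout`, and the `hardKill` callback are hypothetical names, and the real client from #65 may expose `prompt` differently.

```typescript
// Hypothetical minimal view of an SDK session; the real client from #65 may differ.
interface PromptableSession {
  prompt(text: string): Promise<string>;
}

type StopResult =
  | { kind: "graceful"; summary: string }
  | { kind: "hard-kill" };

const WRAP_UP_PROMPT =
  "Stop after your current task. Summarize what you've done and any remaining work.";

// Resolve with the promise's value, or with null if `ms` elapses first.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Ask the agent to wrap up; fall back to a hard kill if it does not answer in time.
async function stopJob(
  session: PromptableSession,
  hardKill: () => Promise<void>, // assumed helper, e.g. kills the tmux pane
  timeoutMs: number = 60_000,
): Promise<StopResult> {
  try {
    const summary = await withTimeout(session.prompt(WRAP_UP_PROMPT), timeoutMs);
    if (summary !== null) return { kind: "graceful", summary };
  } catch {
    // In this sketch a failed prompt is handled the same way as a timeout.
  }
  await hardKill();
  return { kind: "hard-kill" };
}
```

Passing `hardKill` in as a callback keeps the existing tmux teardown path as the fallback, so the graceful path is purely additive.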

### Automatic Retry with Context
When a job fails (error event via SSE):
- MC sees the **structured error** (not just "pane died with exit code 1")
- Instead of tearing down the worktree and relaunching from scratch, MC can prompt the **same session** to retry: `client.session.prompt("That failed with: {error}. Try a different approach.")`
- The agent retains full conversation history and everything it learned — no cold start
- Configurable retry policy: max retries, backoff, whether to auto-retry or ask the user first
- Once max retries are exhausted, mark the job failed and surface it to the user with full error context
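
One possible shape for the retry loop, as a sketch only. `RetryPolicy`, `backoffDelayMs`, and `retryInSession` are hypothetical names, the policy fields are not existing `MCConfig` keys, and the sketch assumes a failed attempt surfaces as a rejected `prompt` call rather than as an SSE error event.

```typescript
// Hypothetical retry policy; field names are illustrative only.
interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound for the exponential backoff
}

// Hypothetical minimal view of an SDK session; the real client from #65 may differ.
interface PromptableSession {
  prompt(text: string): Promise<string>;
}

type RetryOutcome =
  | { status: "recovered"; attempts: number; reply: string }
  | { status: "failed"; attempts: number; errors: string[] };

// Exponential backoff: base * 2^attempt, capped at maxDelayMs (attempt is 0-based).
function backoffDelayMs(attempt: number, policy: RetryPolicy): number {
  return Math.min(policy.baseDelayMs * Math.pow(2, attempt), policy.maxDelayMs);
}

// Re-prompt the same session after each failure so the agent keeps its history.
async function retryInSession(
  session: PromptableSession,
  firstError: string,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<RetryOutcome> {
  const errors: string[] = [firstError];
  for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
    await sleep(backoffDelayMs(attempt, policy));
    const lastError = errors[errors.length - 1];
    try {
      const reply = await session.prompt(
        `That failed with: ${lastError}. Try a different approach.`,
      );
      return { status: "recovered", attempts: attempt + 1, reply };
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  // Retries exhausted: hand the full error history back for display to the user.
  return { status: "failed", attempts: policy.maxRetries, errors };
}
```

An "ask the user first" mode could be layered on by awaiting a confirmation callback before each `session.prompt` call.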

### Warm Standby Jobs
Pre-spin up `opencode serve` instances with worktrees ready:
- A pool of N idle servers running in tmux panes, each in a prepared worktree
- When a plan needs a new job, startup is near-instant: the server is already running, so MC only has to create a session and send a prompt
- Eliminates the latency of worktree creation, git checkout, dependency setup, and server startup
- Pool size configurable via `MCConfig` (e.g., `warmPoolSize: 3`)
- Idle servers are recycled if unused after a timeout
- Particularly valuable for plans with many sequential jobs (each job starts immediately after the previous one's merge)
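
A rough sketch of the pool bookkeeping, under stated assumptions: `WarmServer`, `WarmPool`, and the `spawn`/`dispose` callbacks are hypothetical, and "recycled" is read here as "disposed and replaced with a fresh instance". Worktree creation and `opencode serve` startup are left to the injected callbacks.

```typescript
// Hypothetical record for an idle `opencode serve` instance held in the pool.
interface WarmServer {
  id: string;
  worktreePath: string;
  idleSince: number; // epoch milliseconds
}

class WarmPool {
  private idle: WarmServer[] = [];
  private readonly size: number;          // e.g. warmPoolSize from MCConfig
  private readonly idleTimeoutMs: number; // assumed config value
  private readonly spawn: (now: number) => WarmServer;  // assumed: prepares worktree, starts server
  private readonly dispose: (s: WarmServer) => void;    // assumed: stops server, removes worktree

  constructor(
    size: number,
    idleTimeoutMs: number,
    spawn: (now: number) => WarmServer,
    dispose: (s: WarmServer) => void,
  ) {
    this.size = size;
    this.idleTimeoutMs = idleTimeoutMs;
    this.spawn = spawn;
    this.dispose = dispose;
  }

  get idleCount(): number {
    return this.idle.length;
  }

  // Top the pool back up to the configured size.
  fill(now: number): void {
    while (this.idle.length < this.size) this.idle.push(this.spawn(now));
  }

  // Hand out the oldest idle server, or spawn one on demand if the pool is empty.
  acquire(now: number): WarmServer {
    const server = this.idle.shift();
    return server !== undefined ? server : this.spawn(now);
  }

  // Dispose servers idle past the timeout, refill, and return how many were recycled.
  recycle(now: number): number {
    const fresh: WarmServer[] = [];
    let recycled = 0;
    for (const server of this.idle) {
      if (now - server.idleSince >= this.idleTimeoutMs) {
        this.dispose(server);
        recycled++;
      } else {
        fresh.push(server);
      }
    }
    this.idle = fresh;
    this.fill(now);
    return recycled;
  }
}
```

`acquire` deliberately does not refill, so the caller can start the job immediately and call `fill` in the background afterwards.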

**Files likely affected:** `src/tools/kill.ts`, `src/lib/monitor.ts`, `src/lib/orchestrator.ts`, new `src/lib/warm-pool.ts`, `src/lib/config.ts`

## Additional Context

**Prerequisite:** #65 (Serve-based orchestration) — graceful shutdown requires SDK prompt capability, retry requires session persistence, warm standby requires running serve instances.

These three features share the theme of treating jobs as **long-lived managed sessions** rather than **disposable processes**. The serve architecture makes this possible because the server persists independently of any single prompt or interaction.
