feat(agent): add explicit execution state machine #1944

Open
cal-gooo wants to merge 1 commit into strands-agents:main from cal-gooo:feat/agent-state-machine

Summary

Closes #1921

This PR introduces a formal AgentStateMachine to make the agent's implicit execution lifecycle explicit, enabling durable agents, observability hooks, and safe checkpointing.

What's added

  • AgentStateMachine (src/strands/agent/state_machine.py) — tracks lifecycle with validated transitions and synchronous listener callbacks
  • AgentExecutionState enum — IDLE → INITIALIZING → MODEL_CALL ↔ TOOL_EXECUTION → COMPLETED/INTERRUPTED/CANCELLED/ERROR → IDLE
  • CHECKPOINT_STATES — marks IDLE, INTERRUPTED, and COMPLETED as safe states to snapshot agent data for durability
  • AgentStateTransitionEvent hook — fires on every state change so plugins can react (e.g. persist a snapshot when entering a checkpoint state)
  • Serialization: state_machine.to_dict() / AgentStateMachine.from_dict(data) for restoring state after a process restart
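The pieces listed above can be illustrated with a small standalone sketch. This is not the code from this PR: `ToyStateMachine`, `State`, `ALLOWED`, and `add_listener` are hypothetical names, and the transition table is an assumption read off the lifecycle arrow and the wiring notes in this description.

```python
from enum import Enum
from typing import Callable, Dict, List, Set


class State(Enum):
    IDLE = "idle"
    INITIALIZING = "initializing"
    MODEL_CALL = "model_call"
    TOOL_EXECUTION = "tool_execution"
    COMPLETED = "completed"
    INTERRUPTED = "interrupted"
    CANCELLED = "cancelled"
    ERROR = "error"


TERMINAL = {State.COMPLETED, State.INTERRUPTED, State.CANCELLED, State.ERROR}
# States described in the PR as safe to snapshot.
CHECKPOINTS = frozenset({State.IDLE, State.INTERRUPTED, State.COMPLETED})

# Assumed transition table; the real one in the PR may differ.
ALLOWED: Dict[State, Set[State]] = {
    State.IDLE: {State.INITIALIZING},
    State.INITIALIZING: {State.MODEL_CALL, State.TOOL_EXECUTION, State.ERROR},
    State.MODEL_CALL: {State.TOOL_EXECUTION} | TERMINAL,
    State.TOOL_EXECUTION: {State.MODEL_CALL} | TERMINAL,
    State.COMPLETED: {State.IDLE, State.INITIALIZING},
    State.INTERRUPTED: {State.IDLE, State.INITIALIZING},
    State.CANCELLED: {State.IDLE},
    State.ERROR: {State.IDLE},
}

Listener = Callable[[State, State], None]


class ToyStateMachine:
    def __init__(self, state: State = State.IDLE) -> None:
        self.state = state
        self._listeners: List[Listener] = []

    @property
    def is_checkpoint(self) -> bool:
        return self.state in CHECKPOINTS

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def transition(self, new_state: State) -> None:
        # Reject any move that is not in the table for the current state.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"invalid transition {self.state.name} -> {new_state.name}"
            )
        old_state, self.state = self.state, new_state
        # Listeners run synchronously, in registration order.
        for listener in self._listeners:
            listener(old_state, new_state)

    def to_dict(self) -> dict:
        return {"state": self.state.value}

    @classmethod
    def from_dict(cls, data: dict) -> "ToyStateMachine":
        return cls(State(data["state"]))


sm = ToyStateMachine()
seen = []
sm.add_listener(lambda old, new: seen.append((old.name, new.name)))
for step in (
    State.INITIALIZING,
    State.MODEL_CALL,
    State.TOOL_EXECUTION,
    State.MODEL_CALL,
    State.COMPLETED,
):
    sm.transition(step)

# Round-trip through a plain dict, as after a process restart.
restored = ToyStateMachine.from_dict(sm.to_dict())
```

Keeping the table as plain data makes invalid moves fail loudly at the call site, and a dict-based round trip keeps the snapshot format independent of the class itself.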

How it's wired

| Location | Transitions |
| --- | --- |
| agent.stream_async | IDLE → INITIALIZING at start; → IDLE in finally (skipped if INTERRUPTED) |
| agent._run_loop | → COMPLETED / INTERRUPTED / CANCELLED from EventLoopStopEvent.stop_reason; COMPLETED → INITIALIZING on AfterInvocationEvent.resume |
| event_loop_cycle | → MODEL_CALL (normal path) or → TOOL_EXECUTION (interrupt/pending tool_use path); MODEL_CALL → TOOL_EXECUTION when model returns tool_use |
| Exception handler | try_transition(ERROR) on unhandled exceptions |
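The control flow implied by this wiring can be sketched without the SDK. Everything below is illustrative: `run_invocation`, `STOP_REASON_TO_STATE`, and the stop-reason strings are hypothetical stand-ins, not the actual `stream_async` or `_run_loop` code, and the fallback to COMPLETED for unknown stop reasons is an assumption.

```python
from enum import Enum, auto
from typing import Callable, List


class S(Enum):
    IDLE = auto()
    INITIALIZING = auto()
    MODEL_CALL = auto()
    TOOL_EXECUTION = auto()
    COMPLETED = auto()
    INTERRUPTED = auto()
    CANCELLED = auto()
    ERROR = auto()


# Hypothetical stop-reason strings; the real values come from the SDK.
STOP_REASON_TO_STATE = {
    "end_turn": S.COMPLETED,
    "interrupt": S.INTERRUPTED,
    "cancelled": S.CANCELLED,
}

Move = Callable[[S], None]


def run_invocation(event_loop: Callable[[Move], str], trace: List[str]) -> S:
    state = S.IDLE

    def move(new_state: S) -> None:
        nonlocal state
        state = new_state
        trace.append(new_state.name)

    move(S.INITIALIZING)
    try:
        stop_reason = event_loop(move)
        # Terminal state is derived from the stop reason.
        move(STOP_REASON_TO_STATE.get(stop_reason, S.COMPLETED))
    except Exception:
        # Unhandled errors are recorded, then re-raised to the caller.
        move(S.ERROR)
        raise
    finally:
        # Return to IDLE unless the agent is waiting on an interrupt.
        if state is not S.INTERRUPTED:
            move(S.IDLE)
    return state


def tool_using_loop(move: Move) -> str:
    move(S.MODEL_CALL)
    move(S.TOOL_EXECUTION)
    move(S.MODEL_CALL)
    return "end_turn"


def interrupted_loop(move: Move) -> str:
    move(S.MODEL_CALL)
    move(S.TOOL_EXECUTION)
    return "interrupt"


normal_trace: List[str] = []
normal_final = run_invocation(tool_using_loop, normal_trace)

interrupt_trace: List[str] = []
interrupt_final = run_invocation(interrupted_loop, interrupt_trace)
```

In this sketch an interrupted run stays in INTERRUPTED after the `finally` block, which is what lets a later call resume from that state instead of starting over from IDLE.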

Usage

from strands import Agent
from strands.agent.state_machine import AgentExecutionState, AgentStateMachine, CHECKPOINT_STATES
from strands.hooks import AgentStateTransitionEvent, HookProvider, HookRegistry

# Observe state at any time
agent = Agent()
print(agent.state_machine.state)          # AgentExecutionState.IDLE
print(agent.state_machine.is_checkpoint)  # True

# Hook-based durability checkpointing
class CheckpointHook(HookProvider):
    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(AgentStateTransitionEvent, self.on_transition)

    def on_transition(self, event: AgentStateTransitionEvent) -> None:
        if event.new_state in CHECKPOINT_STATES:
            snapshot = event.agent.state_machine.to_dict()
            # persist snapshot ...

# Restore after crash (snapshot is the dict persisted by the hook, reloaded from storage)
agent.state_machine = AgentStateMachine.from_dict(snapshot)
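The "persist snapshot" step is left open in the usage above. One possible approach, sketched here under the assumption that the dict returned by `to_dict()` is JSON-serializable, is an atomic write to a local file. The helper names `save_snapshot` and `load_snapshot` are hypothetical and not part of this PR.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_snapshot(path: Path, snapshot: dict) -> None:
    """Write the snapshot so that a crash never leaves a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_snapshot(path: Path) -> Optional[dict]:
    """Return the last saved snapshot, or None if nothing was saved yet."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A hook callback could call `save_snapshot` when the new state is a checkpoint state, and startup code could pass the result of `load_snapshot` to `AgentStateMachine.from_dict` when it is not None.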

Test plan

  • All 450 agent / event_loop / hooks unit tests pass (python -m pytest tests/strands/agent/ tests/strands/event_loop/ tests/strands/hooks/)
  • State machine unit-tested: valid transitions, invalid transition rejection, serialization round-trip, listener callbacks
  • Interrupt resume path (INTERRUPTED → INITIALIZING → TOOL_EXECUTION) verified

🤖 Generated with Claude Code

Introduces AgentStateMachine and AgentExecutionState to make the
agent's implicit lifecycle phases observable and serializable, enabling
durable agents and fine-grained hook-based observability.

Key additions:
- `AgentStateMachine` with validated state transitions and listener callbacks
- `AgentExecutionState` enum: IDLE, INITIALIZING, MODEL_CALL, TOOL_EXECUTION,
  INTERRUPTED, COMPLETED, CANCELLED, ERROR
- `CHECKPOINT_STATES` marks safe points for durable agent snapshots
  (IDLE, INTERRUPTED, COMPLETED)
- `AgentStateTransitionEvent` hook fires on every state change, enabling
  durability checkpoints, metrics, and audit logging
- `agent.state_machine.to_dict()` / `from_dict()` for serialization

Wired into Agent.stream_async, _run_loop, and event_loop_cycle so
all transitions are tracked automatically.

Closes strands-agents#1921

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[FEATURE] Agent State Machine
