agent-device is a CLI for UI automation on iOS, tvOS, macOS, Android, and AndroidTV. It is designed for agent-driven workflows: inspect the UI, act on it deterministically, and keep that work session-aware and replayable.
If you know Vercel's agent-browser, this project applies the same broad idea to mobile apps and devices.
- Give agents a practical way to understand mobile UI state through structured snapshots.
- Keep automation flows token-efficient enough for real agent loops.
- Make common interactions reliable enough for repeated automation runs.
- Keep automation grounded in sessions, selectors, and replayable flows instead of one-off scripts.
- Sessions: open a target once, interact within that session, then close it cleanly.
- Snapshots: inspect the current accessibility tree in a compact form and get current-screen refs for exploration.
- Refs vs selectors: use refs for discovery, use selectors for durable replay and assertions.
- Tests: run deterministic .ad scripts as a light e2e test suite.
- Replay scripts: save .ad flows with --save-script, replay one script with the replay command, or run a folder/glob as a serial suite with the test command. The test command supports metadata-aware retries (up to 3 additional attempts), per-test timeouts, flaky pass reporting, and runner-managed artifacts under .agent-device/test-artifacts by default. Each attempt writes replay.ad and result.txt; failed attempts also keep copied logs and artifacts when available.
- Human docs vs agent skills: docs explain the system for people; skills provide compact operating guidance for agents.
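The retry and flaky-pass behavior described for the test command can be pictured with a small shell sketch. This is not the runner's implementation: the helper name run_with_retries and the pass/flaky/fail labels are illustrative assumptions. It runs a command once, retries it up to a given number of additional attempts, and reports "flaky" when only a retry succeeds.

```shell
# Illustrative sketch only: run_with_retries is a hypothetical helper,
# not part of agent-device.
# Usage: run_with_retries <additional_attempts> <command> [args...]
run_with_retries() {
  local extra="$1"
  shift
  local attempt=0
  # One initial attempt plus up to <additional_attempts> retries.
  while [ "$attempt" -le "$extra" ]; do
    if "$@" >/dev/null 2>&1; then
      # A first-attempt success is a clean pass; a later one is flaky.
      if [ "$attempt" -eq 0 ]; then echo "pass"; else echo "flaky"; fi
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "fail"
  return 1
}
```

For example, run_with_retries 3 some_command allows four attempts in total, which mirrors the "up to 3 additional attempts" limit mentioned above.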
The canonical loop is:
agent-device apps --platform ios
agent-device open SampleApp --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device press @e5
agent-device type " more" --delay-ms 80
agent-device close
In practice, most work follows the same pattern:
- Discover the exact app id with apps if the package or bundle name is uncertain.
- open a target app or URL.
- snapshot -i to inspect the current screen.
- press, fill, scroll, get, or wait using refs or selectors.
  - On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs.
  - Use rotate <orientation> when a flow needs a deterministic portrait or landscape state on mobile targets.
- diff snapshot or re-snapshot after UI changes.
- close when the session is finished.
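One way to make sure the final close step always runs, even when an intermediate step fails, is to wrap the pattern in a shell function. The sketch below assumes a POSIX-style shell with local support (for example bash); with_session and example_steps are hypothetical names, and the only agent-device commands used are the ones shown in the canonical loop above.

```shell
# Sketch: with_session is a hypothetical wrapper, not an agent-device command.
# It opens a target, runs the given steps, and always closes the session.
# Usage: with_session <app> <platform> <steps_function>
with_session() {
  local app="$1"
  local platform="$2"
  local steps="$3"
  agent-device open "$app" --platform "$platform" || return 1
  # Remember the steps' exit status so close still runs on failure.
  local rc=0
  "$steps" || rc=$?
  agent-device close
  return "$rc"
}

# Example steps, reusing commands from the canonical loop.
# The @e3 ref is only meaningful for the screen it was captured on.
example_steps() {
  agent-device snapshot -i &&
    agent-device press @e3 &&
    agent-device diff snapshot -i
}
```

Calling with_session SampleApp ios example_steps would then open the app, run the three steps, and close the session, returning the steps' exit status to the caller.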
In non-JSON mode, core mutating commands print a short success acknowledgment so agents and humans can distinguish successful actions from dropped or silent no-ops.
For people:
For agents:
npm install -g agent-device
agent-device now performs a lightweight background upgrade check for interactive CLI runs and, when a newer package is available, suggests a global reinstall command. Updating the package also refreshes the bundled skills/ shipped with the CLI.
Set AGENT_DEVICE_NO_UPDATE_NOTIFIER=1 to disable the notice.
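As a minimal sketch, the variable can be exported for the current shell session so that every later agent-device invocation inherits it. Only the variable name and value come from the text above; where to persist the export (for example a shell profile) is left to the reader.

```shell
# Disable the update notice for this shell session and its child processes.
export AGENT_DEVICE_NO_UPDATE_NOTIFIER=1

# For a single command only, the variable can instead be set inline, e.g.:
#   AGENT_DEVICE_NO_UPDATE_NOTIFIER=1 agent-device apps --platform ios
```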
On macOS, agent-device includes a local agent-device-macos-helper source package that is built on demand for desktop permission checks, alert handling, and helper-backed desktop snapshot surfaces. Release distribution should use a signed/notarized helper build; source checkouts fall back to a local Swift build.
See CONTRIBUTING.md.
agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.
