cua

A computer-use CLI for agents (and TUI for humans) built on pi-agent.

cua "go to news.ycombinator.com and tell me the top 3 story titles"

cua provisions a Kernel cloud browser, turns the model's computer-use tool calls into real mouse/keyboard/scroll/screenshot actions, and streams the result back to your terminal.

Why this exists

Most frontier model providers now ship their own first-party "computer use" tool:

OpenAI gpt-5.5: a built-in computer tool that emits actions like {type:"click", x, y}, {type:"scroll", x, y, scroll_x, scroll_y}, {type:"keypress", keys:[...]}, …
Anthropic claude-opus-4-7: a built-in computer_20251124 tool that emits {action:"left_click", coordinate:[x, y]}, {action:"scroll", scroll_direction, scroll_amount}, …
Google gemini-2.5-pro / gemini-3.x: a set of predefined computer-use functions (click_at, type_text_at, scroll_at, navigate, go_back, …) with 0-1000 normalized coordinates.
Yutori Navigator n1 / n1.5: OpenAI-compatible chat.completions responses with built-in browser action tool_calls like left_click, goto_url, type, and scroll in 0-1000 normalized coordinates.
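
The Gemini and Yutori entries above report coordinates on a 0-1000 normalized grid, so something has to scale them to viewport pixels before a click can be dispatched. A minimal sketch of that conversion, assuming a simple linear mapping with clamping to the viewport; `denormalize` and `Point` are hypothetical names for illustration, not exports of `@onkernel/cua-ai`:

```typescript
// Hypothetical helper for illustration; not part of @onkernel/cua-ai.
interface Point {
  x: number;
  y: number;
}

// Scale a point from a 0..scale normalized grid to pixel coordinates,
// clamping so the result always lands inside the viewport.
function denormalize(p: Point, width: number, height: number, scale = 1000): Point {
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);
  return {
    x: clamp(Math.round((p.x / scale) * width), width - 1),
    y: clamp(Math.round((p.y / scale) * height), height - 1),
  };
}
```

Whether the top of the range should map to `width` or `width - 1` is a design choice; this sketch clamps to the last addressable pixel.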

All of them expect you to:

Run a real browser somewhere (locally is annoying, on a server is hard).
Translate every action into an actual SDK call against that browser.
Capture a fresh screenshot after each action and feed it back to the model so it can verify what happened and plan the next step.
Keep doing this in a loop until the task is done.
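
The four steps above can be sketched as a small loop. This is an illustration only: `Model`, `Browser`, `Action`, `Step`, and `runLoop` are hypothetical names, not the `CuaAgent` or Kernel SDK APIs, and the in-memory fakes stand in for a real model and browser.

```typescript
// All names here are hypothetical; none of them are cua or Kernel SDK APIs.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string };

type Step =
  | { done: true; answer: string }
  | { done: false; action: Action };

interface Browser {
  perform(action: Action): Promise<void>;
  screenshot(): Promise<string>; // e.g. a base64-encoded PNG
}

interface Model {
  next(screenshot: string): Promise<Step>;
}

// Ask the model for a step, execute it, re-screenshot, repeat until done.
async function runLoop(model: Model, browser: Browser, maxTurns = 20): Promise<string> {
  let shot = await browser.screenshot();
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = await model.next(shot);
    if (step.done) return step.answer;
    await browser.perform(step.action);
    shot = await browser.screenshot(); // fresh screenshot after every action
  }
  throw new Error(`no answer after ${maxTurns} turns`);
}

// In-memory fakes so the loop can be exercised without a model or browser.
const performed: string[] = [];
const fakeBrowser: Browser = {
  async perform(action: Action): Promise<void> {
    performed.push(action.kind);
  },
  async screenshot(): Promise<string> {
    return `shot-${performed.length}`;
  },
};
const fakeModel: Model = {
  async next(shot: string): Promise<Step> {
    if (shot === "shot-0") {
      return { done: false, action: { kind: "click", x: 10, y: 20 } };
    }
    return { done: true, answer: `finished after ${shot}` };
  },
};
```

The `maxTurns` cap is an assumption added here so a confused model cannot loop forever.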

cua does all of this for you. The repo is structured as several focused npm packages so the per-provider plumbing is also reusable outside of this binary (e.g. by agents of your own spun up via kernel/cli templates).

Workspace

packages/
├── ai/      # @onkernel/cua-ai    - CUA model catalog + tool schemas + provider adapters (on npm)
├── agent/   # @onkernel/cua-agent - CuaAgent/CuaAgentHarness Kernel-browser execution loop (on npm)
└── cli/     # @onkernel/cua-cli   - the `cua` binary; built on cua-agent + cua-ai

Building your own agent? Start here: packages/agent (@onkernel/cua-agent) — CuaAgent/CuaAgentHarness run the full computer-use loop against a Kernel browser. It sits on packages/ai (@onkernel/cua-ai), the model layer with the curated computer-use model catalog, canonical tool schemas, and per-provider adapters on top of pi-ai; reach for cua-ai directly only when you bring your own execution. Both are published to npm.

flowchart LR
  ai[("@onkernel/cua-ai")]
  agent[("@onkernel/cua-agent")]
  cli[("@onkernel/cua-cli")]
  pi[("pi-agent-core / pi-ai / pi-tui / pi-coding-agent")]
  sdk[("@onkernel/sdk")]
  ai --> agent
  agent --> cli
  ai --> cli
  pi --> agent
  pi --> cli
  sdk --> agent
  sdk --> cli

| Package | What it ships |
| --- | --- |
| `@onkernel/cua-ai` | Computer-use model catalog (`getCuaModel`/`listCuaModels`), canonical CUA tool schemas, and provider adapters/runtime specs built on pi-ai. On npm. |
| `@onkernel/cua-agent` | `CuaAgent`/`CuaAgentHarness` classes that execute cua-ai tool calls against a Kernel browser, screenshot loop included. On npm. |
| `@onkernel/cua-cli` | The `cua` binary: argv parsing, sessions, skills, JSONL output, pi-tui front-end. |

Quickstart

git clone https://github.com/kernel/cua
cd cua
npm install

# run the CLI directly from source (no global install required):
npx tsx packages/cli/src/cli.ts --help

# if you want `cua` on $PATH from any directory, add a shell function to
# your rc that pins the repo location while preserving the caller's cwd
# (so `--out`, transcript bucketing, and `.agents/skills` discovery use
# the directory you invoked from), e.g. in ~/.bashrc:
#   CUA_REPO=/absolute/path/to/cua
#   cua() { "$CUA_REPO/node_modules/.bin/tsx" "$CUA_REPO/packages/cli/src/cli.ts" "$@"; }

# set API keys via env vars
export OPENAI_API_KEY=sk-...                 # for gpt-5.5
export ANTHROPIC_API_KEY=sk-ant-...          # for claude-opus-4-7
export GOOGLE_API_KEY=...                    # for gemini-3-flash-preview
export YUTORI_API_KEY=yt_...                 # for n1.5-latest
export KERNEL_API_KEY=sk_...                 # always required

# single-shot
cua -p "Open https://news.ycombinator.com and tell me the top story"

# list supported model ids
cua models

# Claude
cua -p --model claude-opus-4-7 "Same prompt"

# Gemini 3 Flash (built-in computer use)
cua -p --model gemini-3-flash-preview "Same prompt"

# Yutori Navigator
cua -p --model n1.5-latest "Same prompt"

# interactive TUI (default mode)
cua
cua "summarize https://news.ycombinator.com"

# agent-friendly subcommands (one-shot, see "Agent-friendly subcommands" below
# for the full surface and named-session workflow)
cua open https://github.com/login
cua click "Sign in"
cua url
cua screenshot --out shot.png

# resume the most recent session for this cwd (fresh browser, prior context)
cua -c "now click on the second result"

# JSONL events for scripting
cua -p -o jsonl "open example.com and tell me the heading"

How it works

Model layer — @onkernel/cua-ai owns the curated computer-use model catalog (getCuaModel/listCuaModels), the canonical CUA tool-call schemas, and per-provider adapters on top of pi-ai so every model emits the same CuaAction vocabulary.
Execution layer — @onkernel/cua-agent wraps pi-agent-core's Agent/AgentHarness. CuaAgentHarness runs the prompt/screenshot/tool loop against a Kernel browser: it dispatches canonical CUA tool calls into Kernel SDK browsers.computer.* calls and captures a fresh screenshot back to the model on every turn.
CLI — @onkernel/cua-cli assembles a CuaAgentHarness from command-line flags, env-var-based API keys, a JsonlSessionRepo for transcripts, and pi skills; renders the result either as plain text (--print), JSONL events (-o jsonl), or an interactive pi-tui front-end.
Browser — a fresh Kernel cloud browser session per run (or per resume) with optional named profile load/save. Every screenshot the model sees is a real PNG of a real browser tab.
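
To make the model-layer idea concrete, here is a rough sketch of what "every model emits the same vocabulary" means for a single click. The OpenAI and Anthropic input shapes follow the examples earlier in this README; `CanonicalClick` and the two adapter functions are hypothetical, and the real `CuaAction` type in `@onkernel/cua-ai` may look different.

```typescript
// Hypothetical shapes for illustration; the real CuaAction type may differ.
type OpenAIClick = { type: "click"; x: number; y: number };
type AnthropicClick = { action: "left_click"; coordinate: [number, number] };
type CanonicalClick = { kind: "click"; x: number; y: number };

// Each adapter maps one provider's action shape onto the shared one.
function fromOpenAI(a: OpenAIClick): CanonicalClick {
  return { kind: "click", x: a.x, y: a.y };
}

function fromAnthropic(a: AnthropicClick): CanonicalClick {
  const [x, y] = a.coordinate;
  return { kind: "click", x, y };
}
```

Once both providers reduce to one shape, the execution layer only needs a single dispatch path per action kind.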

See docs/architecture.md for the full end-to-end flow.

CLI reference

See packages/cli/README.md for the full CLI reference, env-var configuration, and model selection.

Highlights:

-p/--print for single-shot mode; -o jsonl for structured output.
cua models to list supported -m/--model values and their providers.
-m/--model <id> to choose one of those supported models.
-s/--session-name <name> to reuse a Kernel browser allocated by cua session start across calls.
-c/--continue, -r/--resume, --session <ref> for transcript resume.
--skill <path> / /skill:<name> for Agent Skills (defaults: ~/.agents/skills/, <cwd>/.agents/skills/).
--image-protocol / CUA_IMAGE_PROTOCOL to force inline screenshot rendering (kitty / iterm2 / none / auto; Ghostty / WezTerm are auto-detected as kitty).

Agent-friendly subcommands

Each subcommand below is one-shot: it provisions a Kernel browser, runs the action, prints a compact result on stdout, and exits with a deterministic code. Designed for shell agents to chain.

| Subcommand | Result on stdout | Exit codes |
| --- | --- | --- |
| `cua open <url>` | `ok` | 0 ok, 2 error |
| `cua click "<description>"` | `ok clicked (x, y)` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
| `cua type "<target>" "<text>"` | `ok typed` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
| `cua press <key> [<key>...]` | `ok pressed` | 0 ok, 2 error |
| `cua observe ["<question>"]` | the description / answer | 0 ok, 2 error |
| `cua url` | the current URL | 0 ok, 2 error |
| `cua screenshot --out <file\|->` | the path (or `(stdout)` when `--out -`) | 0 ok, 2 error |
| `cua do "<instruction>"` | the assistant's final text | 0 ok, 2 error |

By default each call provisions a fresh browser, so the second call can't see anything the first call did. For multi-step workflows, use a named session.
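
A calling agent can branch on the exit code alone. The sketch below shows one way to fold the exit code and captured stdout into a typed result; `interpret` and `Outcome` are hypothetical names, and the README does not say where error text is printed, so the function simply passes through whatever was captured.

```typescript
// Hypothetical wrapper-side helper; not part of the cua CLI itself.
type Outcome =
  | { status: "ok"; detail: string }
  | { status: "not_found"; reason: string }
  | { status: "error"; detail: string };

// Exit codes follow the table above: 0 ok, 1 not_found, anything else error.
function interpret(exitCode: number, stdout: string): Outcome {
  const line = stdout.trim();
  if (exitCode === 0) return { status: "ok", detail: line };
  if (exitCode === 1) {
    // Strip the leading "not_found" token and keep the reason text.
    return { status: "not_found", reason: line.replace(/^not_found\s*/, "") };
  }
  return { status: "error", detail: line };
}
```

Treating exit code 1 as a recoverable "try a different description" signal, and 2 as a hard failure, keeps retry logic out of string parsing.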

Named sessions

cua session start login                          # provisions a Kernel browser, prints `name=login`
cua -s login open https://github.com/login
cua -s login type "email field"    "$EMAIL"
cua -s login type "password field" "$PASSWORD"
cua -s login click "Sign in"
cua -s login url                                 # stdout: the post-login URL
cua session stop login                           # tears down the Kernel browser

Inspect:

cua session list           # tab-formatted: NAME, KERNEL_ID, AGE, LIVE_URL
cua session show login     # full JSON: kernel_session_id, live_url, transcript_path, ...

-s <name> works for ALL modes (action subcommands, --print, the interactive TUI). Liveness is checked before each attach: if the Kernel browser timed out, the call fails with a clear "session no longer alive" error suggesting cua session stop <name> && cua session start <name>.

Named-session metadata lives in $XDG_DATA_HOME/cua/named-sessions/<name>.json (default ~/.local/share/cua/named-sessions/).
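
For tooling that wants to read that metadata file directly, the path can be resolved as sketched below. `namedSessionPath` is a hypothetical helper, and treating an empty `XDG_DATA_HOME` the same as an unset one is an assumption taken from the XDG Base Directory convention rather than from cua's source.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper; mirrors the location described above.
function namedSessionPath(
  name: string,
  env: Record<string, string | undefined> = process.env,
  home: string = os.homedir(),
): string {
  const xdg = env.XDG_DATA_HOME;
  // Fall back to ~/.local/share when XDG_DATA_HOME is unset or empty.
  const dataHome = xdg && xdg.length > 0 ? xdg : path.posix.join(home, ".local", "share");
  return path.posix.join(dataHome, "cua", "named-sessions", `${name}.json`);
}
```

Passing `env` and `home` as parameters keeps the function easy to test without touching the real environment.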

Session transcripts

Every --print, interactive TUI, and -s <name> invocation persists a JSONL transcript by default — useful for analyzing or self-improving agent behavior.

Where: $XDG_DATA_HOME/cua/sessions/<cwd-hash>/<id>.jsonl (default ~/.local/share/cua/sessions/). For named sessions, the exact path is in the transcript_path field of cua session show <name>.

Format: one JSON object per line. Roles: user, assistant, toolResult (from pi-coding-agent's SessionManager). There's also a custom cua-browser entry written once per session with kernel_session_id / live_url / profile_id.

Opting out: --no-session keeps the run in-memory only. One-shot action subcommands (without -s) also skip the transcript, since they're already self-contained.

Analyzing: anything that reads JSONL works. A few jq starters:

TRANSCRIPT=~/.local/share/cua/sessions/<cwd-hash>/<id>.jsonl

# Every tool call the agent made, in order
jq -c 'select(.role == "assistant") | .content[]?
       | select(.type == "tool_use") | {name, input}' \
   "$TRANSCRIPT"

# Largest tool-result screenshot (handy when chasing context-window blowups)
jq -c 'select(.role == "toolResult") | .content[]?
       | select(.type == "image") | {len: (.data | length)}' \
   "$TRANSCRIPT" | sort -t: -k2 -n | tail -1

# Final assistant text (the answer)
jq -r 'select(.role == "assistant") | .content[]?
       | select(.type == "text") | .text' "$TRANSCRIPT" | tail -1

--print -o jsonl is a separate live-event stream (one event per line on stdout, different schema). Both are useful for analysis but they're NOT the same thing: the -o jsonl stream describes turns / tool calls / deltas as they happen; the transcript JSONL is the persisted message history pi-coding-agent's SessionManager writes.
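
For the persisted transcript (not the live stream), the same questions the jq starters answer can be asked from TypeScript. This is a sketch under assumptions: the field names (`role`, `content[].type`, `name`, `input`, `text`) are copied from the jq snippets, and `parseJsonl`, `toolCalls`, and `finalText` are hypothetical helper names.

```typescript
// Field names mirror the jq starters; treat them as assumptions about the schema.
interface ContentPart {
  type: string;
  name?: string;
  input?: unknown;
  text?: string;
}

interface Entry {
  role?: string;
  content?: ContentPart[] | string;
}

// One JSON object per non-empty line.
function parseJsonl(text: string): Entry[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Entry);
}

// Collect every content part of the given type from assistant entries, in order.
function assistantParts(entries: Entry[], type: string): ContentPart[] {
  const parts: ContentPart[] = [];
  for (const entry of entries) {
    if (entry.role !== "assistant" || !Array.isArray(entry.content)) continue;
    for (const part of entry.content) {
      if (part.type === type) parts.push(part);
    }
  }
  return parts;
}

// Every tool call the agent made, in order.
function toolCalls(entries: Entry[]): { name?: string; input?: unknown }[] {
  return assistantParts(entries, "tool_use").map((p) => ({ name: p.name, input: p.input }));
}

// Final assistant text (the answer), or undefined if there is none.
function finalText(entries: Entry[]): string | undefined {
  const texts = assistantParts(entries, "text");
  return texts.length > 0 ? texts[texts.length - 1].text : undefined;
}
```

In practice the input string would come from reading the transcript file, for example with `fs.readFileSync(path, "utf8")`.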

Skills

cua follows the emerging cross-agent ~/.agents/skills/ standard. Skills are loaded from the following locations, in priority order (first wins on name collision):

Explicit --skill <path> flags (file or directory; repeatable).
~/.agents/skills/ (user-global).
<cwd>/.agents/skills/ (project-local).

Each skill's name, description, and file location are added to the system prompt. The model is instructed to use the read tool to load a skill's full body when its description matches the task — only descriptions and locations live in the prompt by default, so the prompt stays small no matter how many skills you have.
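
The "first wins on name collision" rule can be sketched as a simple merge over the sources in priority order. `Skill` and `mergeSkills` are hypothetical names for illustration; cua's actual discovery code may be structured differently.

```typescript
// Hypothetical shapes; only name, description, and location reach the prompt.
interface Skill {
  name: string;
  description: string;
  location: string;
}

// Merge skill lists given in priority order; the first skill seen for a name wins.
function mergeSkills(...sources: Skill[][]): Skill[] {
  const byName = new Map<string, Skill>();
  for (const source of sources) {
    for (const skill of source) {
      if (!byName.has(skill.name)) byName.set(skill.name, skill);
    }
  }
  return Array.from(byName.values());
}
```

Called as `mergeSkills(explicit, userGlobal, projectLocal)`, an explicit `--skill` entry would shadow a same-named skill from either discovery directory.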

To force-load a skill body inline on a single turn, prefix the prompt with /skill:<name> (works in both --print and the interactive TUI):

cua -p "/skill:my-workflow open https://..."

Disable discovery entirely with --no-skills / -ns.

This repo ships a skills/cua-cli/SKILL.md aimed at OTHER agents (Claude Code, Cursor, pi-coding-agent, etc.) that want to drive cua as a CLI subcommand. To install it globally:

mkdir -p ~/.agents/skills
ln -s "$(pwd)/skills/cua-cli" ~/.agents/skills/cua-cli

Project layout

skills/
└── cua-cli/SKILL.md # skill aimed at OTHER agents driving cua via shell
packages/
├── ai/              # @onkernel/cua-ai — model layer (see packages/ai/README.md)
├── agent/           # @onkernel/cua-agent — Kernel-browser execution layer (see packages/agent/README.md)
└── cli/             # @onkernel/cua-cli — the `cua` binary (see packages/cli/README.md)

Roadmap

Auto-respawn dead Kernel sessions when -s <name> is used (today we refuse with a clear error and ask the user to run cua session start again).
--local Docker-backed browser as an alternative to Kernel cloud.
Anthropic hold_key / zoom action support.
pi-tui SelectList-based picker for -r instead of plain readline.

License

MIT
