This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
AskUI Vision Agent is a Python desktop and mobile automation framework that enables AI agents to control computers (Windows, macOS, Linux), mobile devices (Android, iOS), and HMI systems. It supports both programmatic UI automation (RPA-like single-step commands) and agentic intent-based instructions using vision/computer vision models.
Tech Stack: Python 3.10+, Pydantic 2, Anthropic SDK, OpenTelemetry, Model Context Protocol (MCP), PDM
# Install dependencies
pdm install
# Run all tests (parallel execution)
pdm run test
# Run specific test suites
pdm run test:unit # Unit tests only
pdm run test:integration # Integration tests only
pdm run test:e2e # End-to-end tests only
# Run tests with coverage
pdm run test:cov # All tests with coverage report
pdm run test:cov:view # View coverage report in browser
# Quick QA: type check, format, and fix linting issues (run before commits)
pdm run qa:fix
# Individual commands
pdm run typecheck:all # Type checking with mypy
pdm run format # Format code with ruff
pdm run lint # Lint code with ruff
pdm run lint:fix # Auto-fix linting issues
# Regenerate gRPC client code from .proto files
pdm run grpc:gen
# Regenerate Pydantic models from JSON schemas
pdm run json:gen
ComputerAgent (Main SDK Entry Point)
↓
Agent (Abstract base class for all agents)
├── ComputerAgent (Desktop automation)
├── AndroidAgent (Mobile Android automation)
├── WebVisionAgent (Web-specific automation)
└── WebTestingAgent (Web testing framework)
Uses:
├── ModelRouter → Model selection/composition
├── AgentToolbox → Tool & OS abstraction
└── Locators → UI element identification
Key Flow:
- User calls agent.click("Submit button") on ComputerAgent
- AgentBase.locate() routes to appropriate model via ModelRouter
- Model receives screenshot + locator → returns coordinates
- AgentToolbox.os.click() → gRPC call to Agent OS
- Agent OS performs actual mouse click
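
The click flow above can be sketched with stand-in classes. This is a minimal illustration, not the SDK's implementation: `FakeLocateModel`, `FakeAgentOs`, and `SketchAgent` are hypothetical names, and the stub records clicks in a list where the real system would make a gRPC call to Agent OS.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLocateModel:
    """Hypothetical stand-in for a locate model: maps a locator to coordinates."""

    known_elements: dict[str, tuple[int, int]]

    def locate(self, screenshot: bytes, locator: str) -> tuple[int, int]:
        # A real model would inspect the screenshot; this stub only looks up the locator.
        if locator not in self.known_elements:
            error_msg = f"Element {locator!r} not found"
            raise LookupError(error_msg)
        return self.known_elements[locator]


@dataclass
class FakeAgentOs:
    """Hypothetical stand-in for Agent OS: records clicks instead of performing them."""

    clicks: list[tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"fake-screenshot"

    def click(self, x: int, y: int) -> None:
        self.clicks.append((x, y))


@dataclass
class SketchAgent:
    """Wires the steps together: screenshot -> locate -> click."""

    model: FakeLocateModel
    os: FakeAgentOs

    def click(self, locator: str) -> None:
        x, y = self.model.locate(self.os.screenshot(), locator)
        self.os.click(x, y)


agent_os = FakeAgentOs()
agent = SketchAgent(FakeLocateModel({"Submit button": (120, 48)}), agent_os)
agent.click("Submit button")
```

After the call, `agent_os.clicks` holds the single coordinate pair the stub model returned.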
FastAPI Chat API (Experimental)
├── Assistants (AI agent configurations)
├── Threads (Conversation sessions)
├── Messages (Chat history)
├── Runs (Agent execution iterations)
├── Files (Attachments & resources)
├── MCP Configs (Tool providers)
└── Workflows & Scheduled Jobs (Automation triggers)
Key Flow:
- User → Chat UI (hub.askui.com) → Chat API (FastAPI)
- Thread/Messages stored in SQLAlchemy database
- Runs execute agent steps in a loop
- Agent uses ModelRouter → Tools (MCP servers or direct) → AgentOS
The ModelRouter provides a flexible abstraction for AI model selection:
# Single model for all tasks
model = "askui"
# Task-specific models (ActModel, GetModel, LocateModel)
model = {
    "act": "claude-sonnet-4-20250514",
    "get": "askui",
    "locate": "askui-combo",
}
# Custom registry
models = ModelRegistry()
models.register("my-model", custom_model_instance)
Supported Model Providers:
- AskUI Models (Primary - internally hosted)
- Anthropic Claude (Computer Use, Messages API)
- Google Gemini (via OpenRouter)
- Hugging Face Spaces (Community models)
AgentOs provides an abstraction layer for OS-level operations:
AgentOs (Abstract Interface)
├── AskUiControllerClient (gRPC to AskUI Agent OS - primary)
├── PlaywrightAgentOs (Web browser automation)
└── AndroidAgentOs (Android ADB)
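
The value of this abstraction is that calling code depends only on the interface. The sketch below uses a deliberately reduced, hypothetical interface (`SketchAgentOs` with two methods); the real AgentOs defines more operations, and its method names may differ.

```python
from abc import ABC, abstractmethod


class SketchAgentOs(ABC):
    """Hypothetical, reduced interface; the real AgentOs has more operations."""

    @abstractmethod
    def click(self, x: int, y: int) -> None: ...

    @abstractmethod
    def type_text(self, text: str) -> None: ...


class RecordingAgentOs(SketchAgentOs):
    """Backend that records actions, standing in for gRPC, Playwright, or ADB."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def click(self, x: int, y: int) -> None:
        self.actions.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.actions.append(f"type_text({text!r})")


def fill_field(agent_os: SketchAgentOs, x: int, y: int, text: str) -> None:
    """Caller code depends only on the interface, not on a concrete backend."""
    agent_os.click(x, y)
    agent_os.type_text(text)


backend = RecordingAgentOs()
fill_field(backend, 10, 20, "hello")
```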
Locators identify UI elements in multiple ways:
- Text: Match by text content (exact/similar/contains/regex)
- Image: Match by image file or base64
- Prompt: Natural language description
- Coordinate: Absolute (x, y) position
- Relatable: Positional relationships (right_of, below, etc.)
Serialization differs by model type (VLM vs. traditional).
Tools follow the Model Context Protocol (MCP) for extensibility:
Tools (MCP Servers)
├── Computer: screenshot, click, type, mouse, clipboard
├── Android: device control via ADB
├── Testing: scenario & feature management
└── Utility: file ops, data extraction
Tools are auto-discovered and can be dynamically loaded via MCP configurations.
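
As a rough mental model of name-based tool dispatch, the sketch below keeps a registry of callables keyed by server and tool name. It does not speak MCP and is not how the SDK loads tools; `ToolRegistry` and the registered tool names are hypothetical.

```python
from collections.abc import Callable
from typing import Any

ToolFn = Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry: maps qualified tool names to callables."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolFn] = {}

    def register(self, server: str, name: str, fn: ToolFn) -> None:
        self._tools[f"{server}.{name}"] = fn

    def call(self, qualified_name: str, **kwargs: Any) -> Any:
        if qualified_name not in self._tools:
            error_msg = f"Unknown tool: {qualified_name}"
            raise KeyError(error_msg)
        return self._tools[qualified_name](**kwargs)

    def names(self) -> list[str]:
        return sorted(self._tools)


registry = ToolRegistry()
registry.register("computer", "click", lambda x, y: f"clicked {x},{y}")
registry.register("utility", "read_file", lambda path: f"contents of {path}")
click_result = registry.call("computer.click", x=1, y=2)
```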
- src/askui/agent.py - Main ComputerAgent class (user-facing API)
- src/askui/agent_base.py - Abstract Agent (base) with shared agent logic
- src/askui/android_agent.py - Android-specific agent
- src/askui/web_agent.py - Web-specific agent

- src/askui/models/ - AI model providers & router factory
- src/askui/models/shared/ - Shared abstractions (Agent, Tool, MessagesApi)
- src/askui/models/{provider}/ - Provider implementations
- src/askui/prompts/ - System prompts for different models

- src/askui/tools/agent_os.py - Abstract AgentOs interface
- src/askui/tools/askui/ - gRPC client for AskUI Agent OS
- src/askui/tools/android/ - Android-specific tools
- src/askui/tools/playwright/ - Web automation tools
- src/askui/tools/mcp/ - MCP client/server implementations
- src/askui/tools/testing/ - Test scenario tools

- src/askui/locators/ - UI element selectors
- src/askui/locators/serializers.py - Locator serialization for models

- src/askui/chat/ - FastAPI-based Chat API
- src/askui/chat/api/ - REST API routes
- src/askui/chat/migrations/ - Alembic migrations & ORM models

- src/askui/utils/ - Image processing, API utilities, caching, annotations
- src/askui/reporting.py - Reporting & logging
- src/askui/retry.py - Retry logic with exponential backoff
- src/askui/telemetry/ - OpenTelemetry tracing & analytics
- Private members: Use _ prefix for all private variables, functions, methods, etc. Mark everything private that doesn't need external access.
- Type hints: Required everywhere. Use built-in types (list, dict, str | None) instead of typing module types (List, Dict, Optional).
- Overrides: Use @override decorator from typing_extensions for all overridden methods.
- Exceptions: Never pass literals to exceptions. Assign to variables first:
    # Good
    error_msg = f"Thread {thread_id} not found"
    raise FileNotFoundError(error_msg)
    # Bad
    raise FileNotFoundError(f"Thread {thread_id} not found")
- File operations: Always specify encoding="utf-8" for file read/write operations.
- Init files: Create __init__.py in each folder.
- Use response type in function signature instead of response_model in route annotation.
- Dependencies without defaults should come before arguments with defaults.
- Use pytest-mock for mocking wherever possible.
- Test files in tests/ follow structure: test_*.py with Test* classes and test_* functions.
- Timeout: 60 seconds per test (configured in pyproject.toml).
- Never use "git add ." - explicitly add files related to the task.
- Use conventional commits format: feat:, fix:, docs:, style:, refactor:, test:, chore:.
- Before committing, always run: pdm run qa:fix (or individually: typecheck:all, format, lint:fix).
- All public functions, classes, and constants require docstrings.
- Document constructor args in class docstring, omit __init__ docstring.
- Use backticks for code references (variables, types, functions).
- Function references: click(), Class references: ComputerAgent, Method references: VisionAgent.click()
- Include sections: Args, Returns, Raises, Example, Notes, See Also as needed.
- Document parameter types in parentheses, add ", optional" for defaults.
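
A docstring that follows these rules might look like the one below. The function `scale_point` is a hypothetical helper invented for this illustration; only the docstring layout is the point.

```python
def scale_point(x: int, y: int, factor: float = 1.0) -> tuple[int, int]:
    """Scale a screen coordinate by a constant factor.

    Args:
        x (int): Horizontal position in pixels.
        y (int): Vertical position in pixels.
        factor (float, optional): Multiplier applied to both axes.
            Defaults to `1.0`.

    Returns:
        tuple[int, int]: The scaled `(x, y)` position, rounded to whole pixels.

    Raises:
        ValueError: If `factor` is not positive.

    Example:
        scale_point(100, 50, factor=0.5)
    """
    if factor <= 0:
        error_msg = f"factor must be positive, got {factor}"
        raise ValueError(error_msg)
    return (round(x * factor), round(y * factor))


scaled = scale_point(100, 50, factor=0.5)
```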
When writing or updating documentation in docs/:
- Never show setting environment variables in Python code (e.g., os.environ["ASKUI_WORKSPACE_ID"] = "..."). This is bad practice. Always instruct users to set environment variables via their shell or system settings.
- Keep examples concise and focused on the feature being documented.
- Test all code examples before including them.
- Use ComputerAgent (not VisionAgent) in examples.
- AgentToolbox wraps AgentOs implementations
- ModelRouter composes multiple model providers
- CompositeReporter aggregates multiple reporters
- ModelRouter.initialize_default_model_registry() creates model registry
- Model providers use factory functions for lazy-loading
- Truncation strategies for message history
- Different locator serializers for model types
- Retry strategies with exponential backoff
- AgentOs abstraction bridges OS implementations (gRPC, Playwright, ADB)
- ModelFacade adapts models to ActModel/GetModel/LocateModel interfaces
- Constructor-based DI throughout
- FastAPI dependencies for Chat API routes
- @auto_inject_agent_os decorator for tools
- Agent._step() orchestrates tool-calling loop
- Agent provides common structure for all agents
- Schema versioning in src/askui/chat/migrations/
- ORM models in migrations/shared/{entity}/models.py
- Auto-migration on startup (configurable)
- SQLAlchemy with async support
- OpenTelemetry integration (FastAPI, HTTPX, SQLAlchemy)
- Structured logging with structlog
- Correlation IDs for request tracing
- Prometheus metrics via FastAPI instrumentator
- Segment Analytics for usage tracking
- Inherit from ActModel, GetModel, or LocateModel
- Implement message creation via MessagesApi
- Register in ModelRegistry
- Use appropriate locator serializer
- Implement Tool protocol in models/shared/tools.py
- Register in appropriate MCP server (api/mcp_servers/{type}.py)
- Use @auto_inject_agent_os for AgentOs dependency
- Follow Pydantic schema validation
- Inherit from Agent
- Implement required abstract methods
- Provide appropriate AgentOs implementation
- Register in agent factory if needed
- Screenshot caching for multi-step operations
- Token counting before API calls
- Cached trajectory execution (replay previous interactions)
- Image downsampling & compression
- Lazy model initialization (@functools.cache)
Custom exceptions:
- ElementNotFoundError - UI element not found
- WaitUntilError - Timeout waiting for condition
- MaxTokensExceededError - Token limit exceeded
- ModelRefusalError - Model refused to execute
Retry logic with configurable strategies via src/askui/retry.py.
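
Exponential backoff around a call that may raise one of these exceptions can be sketched as below. This is not the API of src/askui/retry.py: `retry_with_backoff` and its parameters are hypothetical, and `ElementNotFoundError` is redefined locally as a stand-in so the example is self-contained.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class ElementNotFoundError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.01,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `ElementNotFoundError` with exponentially growing delays."""
    attempt = 0
    delay = base_delay
    while True:
        try:
            return fn()
        except ElementNotFoundError:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
        sleep(delay)
        delay *= factor
        attempt += 1


delays: list[float] = []
attempts = {"count": 0}


def flaky_locate() -> tuple[int, int]:
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        error_msg = "Submit button not visible yet"
        raise ElementNotFoundError(error_msg)
    return (120, 48)


# Passing `delays.append` as `sleep` records the delays instead of waiting.
result = retry_with_backoff(flaky_locate, sleep=delays.append)
```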
Additional documentation in docs/:
- chat.md - Chat API usage
- direct-tool-use.md - Direct tool usage
- extracting-data.md - Data extraction
- mcp.md - MCP servers
- observability.md - Logging and reporting
- telemetry.md - Telemetry data
- using-models.md - Model usage and custom models
Official docs: https://docs.askui.com
Discord: https://discord.gg/Gu35zMGxbx
- Setting Env Variables In-Code: os.environ["ANTHROPIC_API_KEY"] = "..." => we never want the process to set environment variables itself in code. We expect them to be set in the environment directly, so setting them explicitly is not necessary; if a value is still needed, pass it directly to the client (or other object) that requires it.
- Don't Use Lazy Loading => we want imports at the top of files. Use lazy loading only in very rare edge cases, e.g. when you have to check with a try-except whether a package is available (in that case it should be an optional dependency).
- Client Config: All lazily initialized clients should be configurable in the __init__ method.
- Be consistent with variable naming within a class (and its subclasses). For example, if a parameter is named client, then the member variable it is assigned to should also be named client.