CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

AskUI Vision Agent is a Python desktop and mobile automation framework that enables AI agents to control computers (Windows, macOS, Linux), mobile devices (Android, iOS), and HMI systems. It supports both programmatic UI automation (RPA-like single-step commands) and agentic intent-based instructions using vision/computer vision models.

Tech Stack: Python 3.10+, Pydantic 2, Anthropic SDK, OpenTelemetry, Model Context Protocol (MCP), PDM

Common Commands

Development Setup

# Install dependencies
pdm install

Testing

# Run all tests (parallel execution)
pdm run test

# Run specific test suites
pdm run test:unit          # Unit tests only
pdm run test:integration   # Integration tests only
pdm run test:e2e          # End-to-end tests only

# Run tests with coverage
pdm run test:cov          # All tests with coverage report
pdm run test:cov:view     # View coverage report in browser

Code Quality

# Quick QA: type check, format, and fix linting issues (run before commits)
pdm run qa:fix

# Individual commands
pdm run typecheck:all     # Type checking with mypy
pdm run format            # Format code with ruff
pdm run lint              # Lint code with ruff
pdm run lint:fix          # Auto-fix linting issues

Code Generation

# Regenerate gRPC client code from .proto files
pdm run grpc:gen

# Regenerate Pydantic models from JSON schemas
pdm run json:gen

High-Level Architecture

Core SDK Architecture

ComputerAgent (Main SDK Entry Point)
    ↓
Agent (Abstract base class for all agents)
    ├── ComputerAgent (Desktop automation)
    ├── AndroidAgent (Mobile Android automation)
    ├── WebVisionAgent (Web-specific automation)
    └── WebTestingAgent (Web testing framework)

    Uses:
    ├── ModelRouter → Model selection/composition
    ├── AgentToolbox → Tool & OS abstraction
    └── Locators → UI element identification

Key Flow:

User calls agent.click("Submit button") on ComputerAgent
AgentBase.locate() routes to appropriate model via ModelRouter
Model receives screenshot + locator → returns coordinates
AgentToolbox.os.click() → gRPC call to Agent OS
Agent OS performs actual mouse click
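
The flow above can be sketched with toy stand-ins. Every class below (`_FakeLocateModel`, `_FakeAgentOs`, `_ToyAgent`) is hypothetical and only mirrors the order of the steps; none of it is the SDK's actual implementation, and no gRPC call is made.

```python
from dataclasses import dataclass, field


@dataclass
class _FakeLocateModel:
    """Hypothetical stand-in for a locate model: screenshot + locator -> (x, y)."""

    known_elements: dict[str, tuple[int, int]]

    def locate(self, screenshot: bytes, locator: str) -> tuple[int, int]:
        if locator not in self.known_elements:
            error_msg = f"Element not found: {locator}"
            raise LookupError(error_msg)
        return self.known_elements[locator]


@dataclass
class _FakeAgentOs:
    """Hypothetical stand-in for the OS layer; records clicks instead of clicking."""

    clicks: list[tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"fake-image-bytes"

    def click(self, x: int, y: int) -> None:
        self.clicks.append((x, y))


@dataclass
class _ToyAgent:
    """Hypothetical agent that wires the model and the OS layer together."""

    model: _FakeLocateModel
    os: _FakeAgentOs

    def click(self, locator: str) -> None:
        screenshot = self.os.screenshot()  # capture the current screen
        x, y = self.model.locate(screenshot, locator)  # model resolves coordinates
        self.os.click(x, y)  # OS layer performs the click


agent_os = _FakeAgentOs()
agent = _ToyAgent(model=_FakeLocateModel({"Submit button": (120, 340)}), os=agent_os)
agent.click("Submit button")
```

Because the OS layer is injected, the same agent logic can run against a recording fake in tests and a real controller in production.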

Chat API Architecture

FastAPI Chat API (Experimental)
    ├── Assistants (AI agent configurations)
    ├── Threads (Conversation sessions)
    ├── Messages (Chat history)
    ├── Runs (Agent execution iterations)
    ├── Files (Attachments & resources)
    ├── MCP Configs (Tool providers)
    └── Workflows & Scheduled Jobs (Automation triggers)

Key Flow:

User → Chat UI (hub.askui.com) → Chat API (FastAPI)
Thread/Messages stored in SQLAlchemy database
Runs execute agent steps in a loop
Agent uses ModelRouter → Tools (MCP servers or direct) → AgentOS

Model Router & Composition

The ModelRouter provides a flexible abstraction for AI model selection:

# Single model for all tasks
model = "askui"

# Task-specific models (ActModel, GetModel, LocateModel)
model = {
    "act": "claude-sonnet-4-20250514",
    "get": "askui",
    "locate": "askui-combo"
}

# Custom registry
models = ModelRegistry()
models.register("my-model", custom_model_instance)

Supported Model Providers:

AskUI Models (Primary - internally hosted)
Anthropic Claude (Computer Use, Messages API)
Google Gemini (via OpenRouter)
Hugging Face Spaces (Community models)

Agent OS Abstraction

AgentOs provides an abstraction layer for OS-level operations:

AgentOs (Abstract Interface)
    ├── AskUiControllerClient (gRPC to AskUI Agent OS - primary)
    ├── PlaywrightAgentOs (Web browser automation)
    └── AndroidAgentOs (Android ADB)

Locator System

Locators identify UI elements in multiple ways:

Text: Match by text content (exact/similar/contains/regex)
Image: Match by image file or base64
Prompt: Natural language description
Coordinate: Absolute (x, y) position
Relatable: Positional relationships (right_of, below, etc.)

Serialization differs by model type (VLM vs. traditional).
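
To make the last point concrete, the sketch below serializes one relational locator in two ways: as a natural-language prompt for a VLM and as a structured payload for a traditional model. The classes `ToyText` and `ToyRightOf`, both serializer functions, and both output formats are hypothetical; the real locators and serializers live in src/askui/locators/ and may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyText:
    """Hypothetical text locator."""

    text: str
    match_type: str = "similar"  # exact / similar / contains / regex


@dataclass(frozen=True)
class ToyRightOf:
    """Hypothetical relation: `target` is located to the right of `anchor`."""

    target: ToyText
    anchor: ToyText


def serialize_for_vlm(locator: ToyRightOf) -> str:
    """Render a natural-language prompt (assumed format)."""
    return (
        f'text {locator.target.match_type} to "{locator.target.text}" '
        f'right of text "{locator.anchor.text}"'
    )


def serialize_for_traditional(locator: ToyRightOf) -> dict[str, object]:
    """Render a structured payload (assumed format)."""
    return {
        "type": "text",
        "value": locator.target.text,
        "match": locator.target.match_type,
        "right_of": {"type": "text", "value": locator.anchor.text},
    }


locator = ToyRightOf(target=ToyText("Submit"), anchor=ToyText("Cancel", match_type="exact"))
prompt = serialize_for_vlm(locator)
payload = serialize_for_traditional(locator)
```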

Tool System (MCP)

Tools follow the Model Context Protocol (MCP) for extensibility:

Tools (MCP Servers)
    ├── Computer: screenshot, click, type, mouse, clipboard
    ├── Android: device control via ADB
    ├── Testing: scenario & feature management
    └── Utility: file ops, data extraction

Tools are auto-discovered and can be dynamically loaded via MCP configurations.

Key Code Locations

Core SDK

src/askui/agent.py - Main ComputerAgent class (user-facing API)
src/askui/agent_base.py - Abstract Agent (base) with shared agent logic
src/askui/android_agent.py - Android-specific agent
src/askui/web_agent.py - Web-specific agent

Models & AI

src/askui/models/ - AI model providers & router factory
src/askui/models/shared/ - Shared abstractions (Agent, Tool, MessagesApi)
src/askui/models/{provider}/ - Provider implementations
src/askui/prompts/ - System prompts for different models

Tools & OS

src/askui/tools/agent_os.py - Abstract AgentOs interface
src/askui/tools/askui/ - gRPC client for AskUI Agent OS
src/askui/tools/android/ - Android-specific tools
src/askui/tools/playwright/ - Web automation tools
src/askui/tools/mcp/ - MCP client/server implementations
src/askui/tools/testing/ - Test scenario tools

Locators

src/askui/locators/ - UI element selectors
src/askui/locators/serializers.py - Locator serialization for models

Chat API

src/askui/chat/ - FastAPI-based Chat API
src/askui/chat/api/ - REST API routes
src/askui/chat/migrations/ - Alembic migrations & ORM models

Utilities

src/askui/utils/ - Image processing, API utilities, caching, annotations
src/askui/reporting.py - Reporting & logging
src/askui/retry.py - Retry logic with exponential backoff
src/askui/telemetry/ - OpenTelemetry tracing & analytics

Code Style & Conventions

General Python Style

Private members: Use _ prefix for all private variables, functions, methods, etc. Mark everything private that doesn't need external access.
Type hints: Required everywhere. Use built-in types (list, dict, str | None) instead of typing module types (List, Dict, Optional).
Overrides: Use @override decorator from typing_extensions for all overridden methods.

Exceptions: Never pass literals to exceptions. Assign to variables first:

# Good
error_msg = f"Thread {thread_id} not found"
raise FileNotFoundError(error_msg)

# Bad
raise FileNotFoundError(f"Thread {thread_id} not found")

File operations: Always specify encoding="utf-8" for file read/write operations.
Init files: Create __init__.py in each folder.
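
A short sketch that combines several of these rules: a private helper with a _ prefix, built-in generic types in the hints, an error message assigned to a variable before raising, and an explicit encoding. The function names and the tab-separated file layout are made up for illustration.

```python
import tempfile
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    """Private helper: explicit encoding for every file read."""
    return path.read_text(encoding="utf-8").splitlines()


def load_thread_titles(path: Path) -> dict[str, str]:
    """Load a hypothetical `thread_id<TAB>title` index file."""
    if not path.exists():
        error_msg = f"Thread index {path} not found"
        raise FileNotFoundError(error_msg)
    titles: dict[str, str] = {}
    for line in _read_lines(path):
        thread_id, _, title = line.partition("\t")
        titles[thread_id] = title
    return titles


with tempfile.TemporaryDirectory() as tmp_dir:
    index_path = Path(tmp_dir) / "threads.tsv"
    index_path.write_text("t1\tLogin flow\nt2\tCheckout\n", encoding="utf-8")
    titles = load_thread_titles(index_path)
```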

FastAPI Specific

Use response type in function signature instead of response_model in route annotation.
Dependencies without defaults should come before arguments with defaults.

Testing

Use pytest-mock for mocking wherever possible.
Test files in tests/ follow structure: test_*.py with Test* classes and test_* functions.
Timeout: 60 seconds per test (configured in pyproject.toml).

Git Conventions

Never use git add . - explicitly add files related to the task.
Use conventional commits format: feat:, fix:, docs:, style:, refactor:, test:, chore:.
Before committing, always run: pdm run qa:fix (or individually: typecheck:all, format, lint:fix).

Docstrings

All public functions, classes, and constants require docstrings.
Document constructor args in class docstring, omit __init__ docstring.
Use backticks for code references (variables, types, functions).
Function references: click(); class references: ComputerAgent; method references: ComputerAgent.click().
Include sections: Args, Returns, Raises, Example, Notes, See Also as needed.
Document parameter types in parentheses, add , optional for defaults.
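
One way a class following these docstring rules might look is sketched below. `ScreenBounds` is a hypothetical class invented for this example; only the docstring layout is the point.

```python
class ScreenBounds:
    """
    Rectangular screen area used to keep coordinates on screen.

    Args:
        width (int): Screen width in pixels.
        height (int): Screen height in pixels.
        margin (int, optional): Pixels to keep free at every edge. Defaults to `0`.
    """

    def __init__(self, width: int, height: int, margin: int = 0) -> None:
        self._width = width
        self._height = height
        self._margin = margin

    def clamp(self, x: int, y: int) -> tuple[int, int]:
        """
        Move a point to the nearest position inside the bounds.

        Args:
            x (int): Horizontal position in pixels.
            y (int): Vertical position in pixels.

        Returns:
            tuple[int, int]: The clamped `(x, y)` position.

        Raises:
            ValueError: If `margin` leaves no usable area.
        """
        max_x = self._width - 1 - self._margin
        max_y = self._height - 1 - self._margin
        if max_x < self._margin or max_y < self._margin:
            error_msg = "Margin leaves no usable screen area"
            raise ValueError(error_msg)
        return (
            min(max(x, self._margin), max_x),
            min(max(y, self._margin), max_y),
        )


clamped = ScreenBounds(1920, 1080).clamp(2000, -5)
clamped_with_margin = ScreenBounds(1920, 1080, margin=10).clamp(0, 0)
```

The constructor arguments are documented in the class docstring, so `__init__` itself carries none.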

Documentation (docs/)

When writing or updating documentation in docs/:

Never show setting environment variables in Python code (e.g., os.environ["ASKUI_WORKSPACE_ID"] = "..."). This is bad practice. Always instruct users to set environment variables via their shell or system settings.
Keep examples concise and focused on the feature being documented.
Test all code examples before including them.
Use ComputerAgent (not VisionAgent) in examples.

Important Patterns

Composition over Inheritance

AgentToolbox wraps AgentOs implementations
ModelRouter composes multiple model providers
CompositeReporter aggregates multiple reporters
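
The aggregation idea can be sketched as follows. The `Reporter` protocol, its `add_message` method, and the `Toy*` class names are assumptions for this example and are not taken from src/askui/reporting.py.

```python
from typing import Protocol


class Reporter(Protocol):
    """Hypothetical reporter interface."""

    def add_message(self, role: str, content: str) -> None: ...


class ToyListReporter:
    """Collects messages in memory."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []

    def add_message(self, role: str, content: str) -> None:
        self.messages.append((role, content))


class ToyCompositeReporter:
    """Forwards every message to all wrapped reporters."""

    def __init__(self, reporters: list[Reporter]) -> None:
        self._reporters = reporters

    def add_message(self, role: str, content: str) -> None:
        for reporter in self._reporters:
            reporter.add_message(role, content)


first_reporter = ToyListReporter()
second_reporter = ToyListReporter()
composite = ToyCompositeReporter([first_reporter, second_reporter])
composite.add_message("agent", "Clicked Submit button")
```

The composite satisfies the same interface as the reporters it wraps, so callers do not need to know how many reporters sit behind it.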

Factory Pattern

ModelRouter.initialize_default_model_registry() creates model registry
Model providers use factory functions for lazy-loading

Strategy Pattern

Truncation strategies for message history
Different locator serializers for model types
Retry strategies with exponential backoff

Adapter Pattern

AgentOs abstraction bridges OS implementations (gRPC, Playwright, ADB)
ModelFacade adapts models to ActModel/GetModel/LocateModel interfaces

Dependency Injection

Constructor-based DI throughout
FastAPI dependencies for Chat API routes
@auto_inject_agent_os decorator for tools

Template Method Pattern

Agent._step() orchestrates tool-calling loop
Agent provides common structure for all agents

Database & Observability

Alembic Migrations

Schema versioning in src/askui/chat/migrations/
ORM models in migrations/shared/{entity}/models.py
Auto-migration on startup (configurable)
SQLAlchemy with async support

Telemetry

OpenTelemetry integration (FastAPI, HTTPX, SQLAlchemy)
Structured logging with structlog
Correlation IDs for request tracing
Prometheus metrics via FastAPI instrumentator
Segment Analytics for usage tracking

Extending the Framework

Adding Custom Models

Inherit from ActModel, GetModel, or LocateModel
Implement message creation via MessagesApi
Register in ModelRegistry
Use appropriate locator serializer

Adding Custom Tools

Implement Tool protocol in models/shared/tools.py
Register in appropriate MCP server (api/mcp_servers/{type}.py)
Use @auto_inject_agent_os for AgentOs dependency
Follow Pydantic schema validation

Adding New Agent Types

Inherit from Agent
Implement required abstract methods
Provide appropriate AgentOs implementation
Register in agent factory if needed

Performance & Caching

Screenshot caching for multi-step operations
Token counting before API calls
Cached trajectory execution (replay previous interactions)
Image downsampling & compression
Lazy model initialization (@functools.cache)

Error Handling

Custom exceptions:

ElementNotFoundError - UI element not found
WaitUntilError - Timeout waiting for condition
MaxTokensExceededError - Token limit exceeded
ModelRefusalError - Model refused to execute

Retry logic with configurable strategies via src/askui/retry.py.
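
For readers unfamiliar with exponential backoff, here is a generic sketch. It is not the API of src/askui/retry.py: `retry_with_backoff` and its parameters are hypothetical, and the built-in `LookupError` stands in for an exception such as ElementNotFoundError.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` until it succeeds, multiplying the delay after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except LookupError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= factor
    error_msg = "max_attempts must be at least 1"
    raise ValueError(error_msg)


attempts: list[int] = []
delays: list[float] = []


def _flaky_locate() -> tuple[int, int]:
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts.append(1)
    if len(attempts) < 3:
        error_msg = "Submit button not visible yet"
        raise LookupError(error_msg)
    return (120, 340)


# Passing `delays.append` as `sleep` records the delays instead of waiting.
result = retry_with_backoff(_flaky_locate, sleep=delays.append)
```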

Documentation References

Additional documentation in docs/:

chat.md - Chat API usage
direct-tool-use.md - Direct tool usage
extracting-data.md - Data extraction
mcp.md - MCP servers
observability.md - Logging and reporting
telemetry.md - Telemetry data
using-models.md - Model usage and custom models

Official docs: https://docs.askui.com
Discord: https://discord.gg/Gu35zMGxbx

Coding Standards

Anti-Patterns and Bad Examples

Setting Env Variables In-Code

os.environ["ANTHROPIC_API_KEY"] = "..."

Never set environment variables from within the process. We expect them to be set in the environment already, so setting them in code is unnecessary. If a value still has to be supplied in code, pass it directly to the client (or other object) that requires it.

Don't use lazy loading: Keep imports at the top of files. Use lazy loading only in rare edge cases, e.g. when a try/except has to check whether a package is available (in which case the package should be an optional dependency).
Client config: All lazily initialized clients should be configurable through the __init__ method.
Naming consistency: Be consistent with variable names within a class (and its subclasses). For example, if a parameter is named client, the member variable it is assigned to should also be named client.
