
Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. #236

Merged
philipph-askui merged 65 commits into main from chore/act_conversation_with_caching
Mar 5, 2026

Conversation


@philipph-askui (Contributor) commented Feb 25, 2026

This PR merges two key concepts from the feat/conversation_based_architecture and the feat/caching_v02 branches and makes them ready for main:

  • Conversation-based architecture for the act() command: AgentSpeaker and CacheExecutor are now "speakers" in a conversation (= the control loop)
  • Caching v2 features, all adapted to the new act() architecture:
    -- visual validation using imagehash (phash/ahash)
    -- cache invalidation/validation; parameters in cache files (identified through an LLM)
    -- non-cacheable tools via an is_cacheable flag
    -- usage params in reports
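The visual-validation idea can be pictured with a dependency-free sketch of an average hash (the actual implementation uses the imagehash library's phash/ahash on screenshots); the tiny pixel grids below are made-up stand-ins for illustration only:

```python
# Minimal sketch of the average-hash (aHash) idea behind visual validation.
# The real code uses the `imagehash` library on full screenshots; the 2x2
# grayscale grids here are hypothetical stand-ins.

def ahash(pixels: list[list[int]]) -> int:
    """Hash a grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")

# "Screenshot" stored at cache-writing time vs. at cached-execution time
cached_ui = [[10, 200], [10, 200]]
current_ui_same = [[10, 200], [10, 200]]
current_ui_changed = [[200, 10], [200, 10]]

same_distance = hamming_distance(ahash(cached_ui), ahash(current_ui_same))
changed_distance = hamming_distance(ahash(cached_ui), ahash(current_ui_changed))
```

A cached step would be considered valid while the distance stays under some threshold; a large distance signals that the UI changed and the cached execution should be invalidated.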

Things worth testing that should work:

  • "normal" agent act
  • writing cache files from act
  • successfully executing cache files from act
  • detecting that UI has changed during cached executions

and: sorry for yet another massive PR...

For the design docs that outline the concept, please see here:

Here is a minimal example to test:

import logging

from askui import ComputerAgent
from askui.agent_settings import AgentSettings
from askui.model_providers.askui_vlm_provider import AskUIVlmProvider
from askui.models.shared.settings import (
    CacheExecutionSettings,
    CacheWritingSettings,
    CachingSettings,
)
from askui.reporting import SimpleHtmlReporter

logging.basicConfig(level=logging.INFO)


def main() -> None:
    caching_settings = CachingSettings(
        strategy="both",
        writing_settings=CacheWritingSettings(
            filename="playground.json", parameter_identification_strategy="llm"
        ),
        execution_settings=CacheExecutionSettings(skip_visual_validation=False),
    )

    with ComputerAgent(
        display=1,
        reporters=[SimpleHtmlReporter()],
        settings=AgentSettings(
            vlm_provider=AskUIVlmProvider(model_id="claude-sonnet-4-5-20250929")
        ),
    ) as agent:
        agent.act(
            goal=(
                "Open a new Chrome window by right-clicking on the icon in the dock "
                "and clicking on 'Neues Fenster' (which means New Window). "
                "Then navigate to 'www.askui.com'. "
                "Operate only on the display you see, do not change to another display! "
                "You can use the cache file 'playground.json' if available."
            ),
            caching_settings=caching_settings,
        )


if __name__ == "__main__":
    main()

@philipph-askui philipph-askui changed the title Chore/act conversation with caching Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. Feb 25, 2026
…agent occasionally provides the values as strings
@philipph-askui philipph-askui marked this pull request as ready for review February 26, 2026 13:24

# Create switch_speaker tool with valid speaker names
handoff_speakers = [
    speaker.get_name() for speaker in self.speakers if speaker.get_description()
Collaborator

The user creating new Speakers is a developer. Instead of a silent filter, we should make it either

  • a warning that we filtered it out,

  • or throw an exception.

    • It would be nice to do the check in the init method of the Speaker class.

    I would prefer the exception.

)
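The suggested init-time check could look roughly like this; only the raise-instead-of-filter idea comes from the comment above, and the class shape is a simplified, hypothetical sketch rather than the PR's actual Speaker class:

```python
# Hypothetical sketch: validate the description when a Speaker is constructed,
# raising an exception instead of silently filtering the speaker out later.
# Only the get_name/get_description method names are taken from the PR.

class Speaker:
    def __init__(self, name: str, description: str) -> None:
        if not description:
            raise ValueError(
                f"Speaker '{name}' has no description and cannot be offered "
                "as a switch_speaker handoff target."
            )
        self._name = name
        self._description = description

    def get_name(self) -> str:
        return self._name

    def get_description(self) -> str:
        return self._description
```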

@tracer.start_as_current_span("handle_result_status")
def _handle_result_status(self, result: SpeakerResult) -> bool:
Collaborator

Nope.

Comment on lines +108 to +112
# Determine status based on whether there are tool calls
# If there are tool calls, conversation will execute them and loop back
# If no tool calls, conversation is done
has_tool_calls = self._has_tool_calls(response)
status = "continue" if has_tool_calls else "done"
Collaborator

Is the status then not more of a request_loop_to?

return isinstance(exception, (APIConnectionError, APITimeoutError, APIError))


def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]:
Collaborator

Suggested change
def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]:
def from_MessageParam(message: MessageParam) -> BetaMessageParam:
    if isinstance(message.content, str):
        return message
    message.content = [
        from_tool_use_block(block) if isinstance(block, ToolUseBlock) else block
        for block in message.content
    ]
    return message

Contributor Author

addressed in 7998ac0
Can you please take another look to check whether that is how you meant it, @programminx-askui?

# Log response
logger.debug("Agent response: %s", response.model_dump(mode="json"))

except Exception:
Collaborator

API Exceptions: https://platform.claude.com/docs/en/api/messages/create#raw_message_stream_event

  • Content parsing errors: retry -> then throw to user
  • 5xx HTTP exceptions: -> retry and then fail -> move to HTTP client
  • 4xx HTTP exceptions: -> throw to user
    • 401 Unauthorized: -> ??
  • Network layer errors: -> throw to user

-> We remove the catch of the exception.
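The proposed policy could be sketched as follows; the exception classes and the retry loop are illustrative stand-ins, not the SDK's actual types:

```python
# Hypothetical sketch of the proposed error policy: retry transient errors
# (5xx, connection, timeout) a bounded number of times, but let user-facing
# (4xx) errors propagate immediately. Exception names are made up.
import time

class TransientAPIError(Exception):
    """Stand-in for 5xx / connection / timeout errors (retryable)."""

class UserFacingAPIError(Exception):
    """Stand-in for 4xx errors such as 401 Unauthorized (not retryable)."""

def call_with_retries(call, max_retries: int = 3, delay_s: float = 0.0):
    # UserFacingAPIError is deliberately not caught: it is thrown to the user
    for attempt in range(max_retries):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_retries - 1:
                raise  # retried and still failing: surface to the caller
            time.sleep(delay_s)
```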

# Log response
logger.debug("Agent response: %s", response.model_dump(mode="json"))

except Exception:
Collaborator

General rule: only tool call errors can be recovered, not LLM calls.


if isinstance(block, ToolUseBlockParam):
    return cast(
        "BetaContentBlockParam",
        block.model_dump(exclude={"visual_representation"}),
Collaborator

This is better. I'm still unsure what happens when the visual_representation field does not exist.

Did you test it?

But don't we want to include all BetaContentBlockParam fields? Then we don't have to exclude the ToolUseBlockParams?

Contributor Author

If visual_representation is not present, it will just dump the rest of the model.

I don't understand the second comment.
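The first point can be checked in isolation, assuming pydantic v2 semantics; the model below is a simplified stand-in, not the PR's actual ToolUseBlockParam:

```python
# Quick check, assuming pydantic v2: model_dump(exclude=...) with a field
# name that does not exist on the model does not raise; it simply dumps
# the remaining fields. The model here is an illustrative stand-in.
from pydantic import BaseModel

class ToolUseBlockParam(BaseModel):
    id: str
    name: str

block = ToolUseBlockParam(id="tu_1", name="click")

# "visual_representation" is not a field of this simplified model
dumped = block.model_dump(exclude={"visual_representation"})
```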

Collaborator

Inline comment

@philipph-askui philipph-askui merged commit 228d7ae into main Mar 5, 2026
1 check passed
@philipph-askui philipph-askui deleted the chore/act_conversation_with_caching branch March 5, 2026 12:59
@philipph-askui philipph-askui restored the chore/act_conversation_with_caching branch March 5, 2026 13:00
