
Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. #236

Merged
philipph-askui merged 65 commits into main from chore/act_conversation_with_caching
Mar 5, 2026

Conversation


@philipph-askui (Contributor) commented Feb 25, 2026

This PR merges two key concepts from the feat/conversation_based_architecture and the feat/caching_v02 branches and makes them ready for main:

  • Conversation-based architecture for the act() command: AgentSpeaker and CacheExecutor are now "speakers" in a conversation (= the control loop)
  • Caching v2 features, all adapted to the new act() architecture:
    -- visual validation using imagehash (phash/ahash)
    -- cache invalidation/validation; parameters in cache files (identified through an LLM)
    -- non-cacheable tools via an is_cacheable flag
    -- usage params in reports
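The visual-validation idea can be pictured with a dependency-free sketch of an average hash (the actual implementation uses the imagehash library's phash/ahash on screenshots); the tiny pixel grids below are made-up stand-ins for illustration only:

```python
# Minimal sketch of the average-hash (aHash) idea behind visual validation.
# The real code uses the `imagehash` library on full screenshots; the 2x2
# grayscale grids here are hypothetical stand-ins.

def ahash(pixels: list[list[int]]) -> int:
    """Hash a grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")

# "Screenshot" stored at cache-writing time vs. at cached-execution time
cached_ui = [[10, 200], [10, 200]]
current_ui_same = [[10, 200], [10, 200]]
current_ui_changed = [[200, 10], [200, 10]]

same_distance = hamming_distance(ahash(cached_ui), ahash(current_ui_same))
changed_distance = hamming_distance(ahash(cached_ui), ahash(current_ui_changed))
```

A cached step would be considered valid while the distance stays under some threshold; a large distance signals that the UI changed and the cached execution should be invalidated.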

Things worth testing that should work:

  • "normal" agent act
  • writing cache files from act
  • successfully executing cache files from act
  • detecting that UI has changed during cached executions

and: sorry for yet another massive PR...

For the design docs that outline the concept, please see here:

Here is a minimal example to test:

import logging

from askui import ComputerAgent
from askui.agent_settings import AgentSettings
from askui.model_providers.askui_vlm_provider import AskUIVlmProvider
from askui.models.shared.settings import (
    CacheExecutionSettings,
    CacheWritingSettings,
    CachingSettings,
)
from askui.reporting import SimpleHtmlReporter

logging.basicConfig(level=logging.INFO)


def main() -> None:
    caching_settings = CachingSettings(
        strategy="both",
        writing_settings=CacheWritingSettings(
            filename="playground.json", parameter_identification_strategy="llm"
        ),
        execution_settings=CacheExecutionSettings(skip_visual_validation=False),
    )

    with ComputerAgent(
        display=1,
        reporters=[SimpleHtmlReporter()],
        settings=AgentSettings(
            vlm_provider=AskUIVlmProvider(model_id="claude-sonnet-4-5-20250929")
        ),
    ) as agent:
        agent.act(
            goal=(
                "Open a new Chrome window by right-clicking on the icon in the dock "
                "and clicking on 'Neues Fenster' (which means New Window). "
                "Then navigate to 'www.askui.com'. "
                "Operate only on the display you see, do not change to another display! "
                "You can use the cache file 'playground.json' if available."
            ),
            caching_settings=caching_settings,
        )


if __name__ == "__main__":
    main()

@philipph-askui philipph-askui changed the title Chore/act conversation with caching Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. Feb 25, 2026
…agent occasionally provides the values as strings
@philipph-askui philipph-askui marked this pull request as ready for review February 26, 2026 13:24

# Create switch_speaker tool with valid speaker names
handoff_speakers = [
    speaker.get_name() for speaker in self.speakers if speaker.get_description()
Collaborator

The user creating new Speakers is a developer. Instead of a silent filter, we should make it either

  • a warning that we filtered it out,

  • or throw an exception.

    • It would be nice to do the check in the init method of the Speaker class.

    I would prefer the exception.

)
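The suggested init-time check could look roughly like this; only the raise-instead-of-filter idea comes from the comment above, and the class shape is a simplified, hypothetical sketch rather than the PR's actual Speaker class:

```python
# Hypothetical sketch: validate the description when a Speaker is constructed,
# raising an exception instead of silently filtering the speaker out later.
# Only the get_name/get_description method names are taken from the PR.

class Speaker:
    def __init__(self, name: str, description: str) -> None:
        if not description:
            raise ValueError(
                f"Speaker '{name}' has no description and cannot be offered "
                "as a switch_speaker handoff target."
            )
        self._name = name
        self._description = description

    def get_name(self) -> str:
        return self._name

    def get_description(self) -> str:
        return self._description
```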

@tracer.start_as_current_span("handle_result_status")
def _handle_result_status(self, result: SpeakerResult) -> bool:
Collaborator

Nope.

Comment on lines +108 to +112
# Determine status based on whether there are tool calls
# If there are tool calls, conversation will execute them and loop back
# If no tool calls, conversation is done
has_tool_calls = self._has_tool_calls(response)
status = "continue" if has_tool_calls else "done"
Collaborator

Is the status then not more of a request_loop_to?

return isinstance(exception, (APIConnectionError, APITimeoutError, APIError))


def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]:
Collaborator

Suggested change
def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]:
def from_MessageParam(message: MessageParam) -> BetaMessageParam:
    if isinstance(message.content, str):
        return message
    message.content = [
        from_tool_use_block(block) if isinstance(block, ToolUseBlock) else block
        for block in message.content
    ]
    return message

Contributor Author

addressed in 7998ac0
Can you please take another look to check whether that is how you meant it, @programminx-askui?

# Log response
logger.debug("Agent response: %s", response.model_dump(mode="json"))

except Exception:
Collaborator

API Exceptions: https://platform.claude.com/docs/en/api/messages/create#raw_message_stream_event

  • Content parsing errors: retry -> then throw to user
  • 5xx HTTP exceptions: -> retry and then fail -> move to HTTP client
  • 4xx HTTP exceptions: -> throw to user
    • 401 Unauthorized: -> ??
  • Network layer errors: -> throw to user

-> We remove the catch of the exception.
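The proposed policy could be sketched as follows; the exception classes and the retry loop are illustrative stand-ins, not the SDK's actual types:

```python
# Hypothetical sketch of the proposed error policy: retry transient errors
# (5xx, connection, timeout) a bounded number of times, but let user-facing
# (4xx) errors propagate immediately. Exception names are made up.
import time

class TransientAPIError(Exception):
    """Stand-in for 5xx / connection / timeout errors (retryable)."""

class UserFacingAPIError(Exception):
    """Stand-in for 4xx errors such as 401 Unauthorized (not retryable)."""

def call_with_retries(call, max_retries: int = 3, delay_s: float = 0.0):
    # UserFacingAPIError is deliberately not caught: it is thrown to the user
    for attempt in range(max_retries):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_retries - 1:
                raise  # retried and still failing: surface to the caller
            time.sleep(delay_s)
```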

# Log response
logger.debug("Agent response: %s", response.model_dump(mode="json"))

except Exception:
Collaborator

General rule: only tool call errors can be recovered, not LLM calls.


if isinstance(block, ToolUseBlockParam):
    return cast(
        "BetaContentBlockParam",
        block.model_dump(exclude={"visual_representation"}),
Collaborator

This is better. I'm still unsure what happens when the visual_representation field does not exist.

Did you test it?

But don't we want to include all BetaContentBlockParam fields? Then we don't have to exclude the ToolUseBlockParams?

Contributor Author

If visual_representation is not present, it will just dump the rest of the model.

I don't understand the second comment.
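The first point can be checked in isolation, assuming pydantic v2 semantics; the model below is a simplified stand-in, not the PR's actual ToolUseBlockParam:

```python
# Quick check, assuming pydantic v2: model_dump(exclude=...) with a field
# name that does not exist on the model does not raise; it simply dumps
# the remaining fields. The model here is an illustrative stand-in.
from pydantic import BaseModel

class ToolUseBlockParam(BaseModel):
    id: str
    name: str

block = ToolUseBlockParam(id="tu_1", name="click")

# "visual_representation" is not a field of this simplified model
dumped = block.model_dump(exclude={"visual_representation"})
```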

Collaborator

Inline comment

@philipph-askui philipph-askui merged commit 228d7ae into main Mar 5, 2026
1 check passed
@philipph-askui philipph-askui deleted the chore/act_conversation_with_caching branch March 5, 2026 12:59
@philipph-askui philipph-askui restored the chore/act_conversation_with_caching branch March 5, 2026 13:00
