
feat: support multimodal tool outputs (text + image) #4954

Open
tpirc3 wants to merge 6 commits into livekit:main from tpirc3:feature/multimodal-tool


tpirc3 commented Feb 26, 2026

Summary

  • extend FunctionCallOutput.output to support ImageContent and list[str | ImageContent]
  • add shared normalization/splitting/text-fallback helpers for tool outputs
  • implement provider-specific handling for tool-result images:
    • native: OpenAI Responses, Anthropic, Google, AWS standard LLM path
    • fall back to a text placeholder on unsupported paths (OpenAI chat default, Mistral, realtime variants)
  • add an optional OpenAI chat-completions flag, supports_tool_image_output, for Qwen-compatible providers that accept image content in tool results
  • fix telemetry/span/type assumptions by converting multimodal outputs to text where string-only sinks are required
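
For readers skimming the summary, the shared helpers could look roughly like the sketch below. This is illustrative only: `ImageContent` here is a local stand-in rather than the framework class, and the helper names (`normalize_tool_output`, `split_tool_output`, `tool_output_to_text`) and the `[image]` placeholder are assumptions, not the names used in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class ImageContent:
    """Local stand-in for the framework's ImageContent (assumed shape)."""

    image: str  # URL or data URI; the real class may carry more fields


# The three output shapes named in the summary.
ToolOutput = Union[str, ImageContent, list[Union[str, ImageContent]]]


def normalize_tool_output(output: ToolOutput) -> list[str | ImageContent]:
    """Coerce any accepted shape into a flat list of parts (hypothetical helper)."""
    if isinstance(output, (str, ImageContent)):
        return [output]
    return list(output)


def split_tool_output(output: ToolOutput) -> tuple[list[str], list[ImageContent]]:
    """Separate text parts from image parts, preserving order within each group."""
    parts = normalize_tool_output(output)
    texts = [p for p in parts if isinstance(p, str)]
    images = [p for p in parts if isinstance(p, ImageContent)]
    return texts, images


def tool_output_to_text(output: ToolOutput, placeholder: str = "[image]") -> str:
    """Text fallback for string-only sinks such as telemetry spans."""
    parts = normalize_tool_output(output)
    return "\n".join(p if isinstance(p, str) else placeholder for p in parts)
```

A string-only sink would call `tool_output_to_text(...)`, while a provider with native image support would work from the result of `split_tool_output(...)` or the normalized list directly.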

Why

This PR implements the approach proposed in #4893 for multimodal tool outputs.

Concretely, it carries the issue proposal through the shared normalization layer and provider formatters, using native image tool-output payloads where supported and explicit text fallback where not.
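
As a rough illustration of "native where supported, explicit text fallback where not", a chat-completions-style formatter might branch on the flag as sketched below. Only the flag name `supports_tool_image_output` comes from this PR; the function name, the stand-in `ImageContent`, the `[image]` placeholder, and the choice of `image_url` content parts inside a tool message are assumptions, and whether a given provider accepts that payload is not something this sketch can guarantee.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ImageContent:
    """Local stand-in for the framework's ImageContent (assumed shape)."""

    image: str  # URL or data URI


def format_tool_message(
    call_id: str,
    parts: list[str | ImageContent],
    *,
    supports_tool_image_output: bool = False,
) -> dict[str, Any]:
    """Build a tool-result message dict (hypothetical formatter)."""
    if supports_tool_image_output:
        # Native path: keep images as structured content parts.
        content: Any = [
            {"type": "text", "text": p}
            if isinstance(p, str)
            else {"type": "image_url", "image_url": {"url": p.image}}
            for p in parts
        ]
    else:
        # Fallback path: string-only content with a placeholder per image.
        content = "\n".join(p if isinstance(p, str) else "[image]" for p in parts)
    return {"role": "tool", "tool_call_id": call_id, "content": content}
```

Defaulting the flag to `False` keeps existing string-only behavior unless a caller opts in.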

Resolves #4893.

Tests

  • make check
  • uv run pytest tests/test_tool_output_multimodal.py tests/test_chat_ctx.py tests/test_tools.py -q

CLAassistant commented Feb 26, 2026

All committers have signed the CLA.

devin-ai-integration[bot]

This comment was marked as resolved.

tpirc3 closed this Feb 26, 2026
tpirc3 reopened this Feb 26, 2026
tpirc3 force-pushed the feature/multimodal-tool branch from 2cf3d85 to 524d69f on February 26, 2026 13:26
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

tpirc3 force-pushed the feature/multimodal-tool branch from 1ec2497 to de38547 on February 26, 2026 13:58
Development

Successfully merging this pull request may close these issues.

feat: support ImageContent in tool return value

2 participants