feat: support multimodal tool outputs (text + image) by tpirc3 · Pull Request #4954 · livekit/agents

tpirc3 · 2026-02-26T11:53:33Z

Summary

- extend FunctionCallOutput.output to support ImageContent and list[str | ImageContent]
- add shared normalization/splitting/text-fallback helpers for tool outputs
- implement provider-specific handling for tool-result images:
  - native: OpenAI Responses, Anthropic, Google, AWS standard LLM path
  - fallback to text placeholder for unsupported paths (OpenAI chat default, Mistral, realtime variants)
- add optional OpenAI chat-completions flag supports_tool_image_output to support Qwen-compatible providers that accept tool image content
- fix telemetry/span/type assumptions by converting multimodal outputs to text where string-only sinks are required
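
For readers skimming the summary, here is a rough sketch of what the shared normalization, splitting, and text-fallback helpers might look like. It is not the code in this PR: ImageContent is a stand-in dataclass, and normalize_tool_output, split_tool_output, tool_output_to_text, and the "[image]" placeholder are hypothetical names picked for illustration. Stringifying non-string scalars is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass
class ImageContent:
    """Stand-in for the real image type (assumption: a single URL/data-URI field)."""

    image: str


Part = Union[str, ImageContent]


def normalize_tool_output(output: Any) -> List[Part]:
    """Return a tool output as a flat, ordered list of text and image parts."""
    if isinstance(output, (str, ImageContent)):
        return [output]
    if isinstance(output, (list, tuple)):
        # Keep images as-is; stringify everything else (assumption).
        return [p if isinstance(p, ImageContent) else str(p) for p in output]
    # Falsy scalars such as 0 or False are kept rather than dropped.
    return [str(output)]


def split_tool_output(parts: List[Part]) -> Tuple[List[str], List[ImageContent]]:
    """Separate text parts from image parts, preserving relative order."""
    texts = [p for p in parts if isinstance(p, str)]
    images = [p for p in parts if isinstance(p, ImageContent)]
    return texts, images


def tool_output_to_text(output: Any, placeholder: str = "[image]") -> str:
    """Text fallback for string-only sinks: each image becomes a placeholder."""
    parts = normalize_tool_output(output)
    return "\n".join(placeholder if isinstance(p, ImageContent) else p for p in parts)
```

A string-only sink such as a telemetry span attribute would call tool_output_to_text, while a provider formatter would start from normalize_tool_output.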

Why

This PR implements the approach proposed in #4893 for multimodal tool outputs.

Concretely, it carries the issue proposal through the shared normalization layer and provider formatters, using native image tool-output payloads where supported and explicit text fallback where not.
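
To illustrate the "native where supported, explicit text fallback where not" idea, here is a hedged sketch of a formatter switch. The dict shape is illustrative only and does not match any provider's wire format; format_tool_result and the "[image]" placeholder are hypothetical, and ImageContent is again a stand-in dataclass. Only the flag name supports_tool_image_output comes from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class ImageContent:
    """Stand-in for the real image type (assumption: a single URL/data-URI field)."""

    image: str


def format_tool_result(
    parts: List[Union[str, ImageContent]],
    *,
    supports_tool_image_output: bool,
    placeholder: str = "[image]",
) -> Union[str, List[Dict[str, Any]]]:
    """Build a tool-result payload, keeping the original part ordering."""
    if supports_tool_image_output:
        # Native path: emit one structured entry per part, in order.
        return [
            {"type": "image", "image": p.image}
            if isinstance(p, ImageContent)
            else {"type": "text", "text": p}
            for p in parts
        ]
    # Fallback path: plain string with a placeholder standing in for each image.
    return "\n".join(placeholder if isinstance(p, ImageContent) else p for p in parts)
```

Keeping the switch in one place means a provider that cannot accept images still receives a well-formed string result instead of failing on an unexpected content type.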

Resolves #4893.

Tests

make check
uv run pytest tests/test_tool_output_multimodal.py tests/test_chat_ctx.py tests/test_tools.py -q

CLAassistant · 2026-02-26T11:53:41Z

All committers have signed the CLA.

tpirc3 closed this Feb 26, 2026

tpirc3 added 5 commits February 26, 2026 20:03

feat: function call output handling to support multimodal outputs

030d313

feat: add supports_tool_image_output arg

cab2382

style: format tool output image changes

48d1016

fix: resolve mypy issues for multimodal tool output

cb2b06d

fix(openai): preserve multimodal tool output ordering

524d69f

tpirc3 reopened this Feb 26, 2026

tpirc3 force-pushed the feature/multimodal-tool branch from 2cf3d85 to 524d69f on February 26, 2026 13:26

fix(llm): normalize falsy and tuple-like tool outputs

de38547

tpirc3 force-pushed the feature/multimodal-tool branch from 1ec2497 to de38547 on February 26, 2026 13:58

tpirc3 wants to merge 6 commits into livekit:main from tpirc3:feature/multimodal-tool
