feat: add audio and video context #701
Add audio/video context config models and canonical media helpers. Translate canonical media blocks for OpenAI-compatible clients while preserving URL media as URLs. Reject unsupported audio/video blocks in the Anthropic adapter. Refs #671
Preserve extensionless HTTP(S) audio and video URLs as URL media, reject local path-looking audio/video context values, and reject provider-specific audio/video blocks in the Anthropic adapter. Refs #671
Add a Jupytext source notebook and generated Colab artifact that exercise audio/video context URL, base64, local path rejection, OpenAI-compatible payload translation, and Anthropic unsupported-media handling. Refs #671
Rewrite the audio/video smoke notebook to run a full Data Designer preview against a local OpenAI-compatible HTTP server. Assert the generated dataset, captured endpoint payload, URL/base64 translation, and local path rejection through the interface pipeline. Refs #671
Move the generated audio/video context E2E notebook out of the PR docs surface and keep it locally under the main checkout's .scratch directory. Refs #671
Remove unused URL-specific media helpers, share the base64 data URI parser in Anthropic translation, align AudioContext validation messaging, and update config docs for audio/video contexts. Refs #671
Code Review - PR #701: feat: add audio and video context
Summary
Adds first-class audio/video context support alongside the existing image-context path. The change introduces:
The code respects the layering rule (engine→config only; new adapter imports of
Findings
Backward compatibility
Correctness
Tests
Style / conventions
Docs
Risks / suggestions
Structural Impact
No structural impact analysis was provided for this PR (no
Verdict
Solid feature work. The canonical-block abstraction is a clean replacement for the ad-hoc image-context shape and pays dividends as the second/third modality lands. Discriminated unions, format validation, and adapter-side rejection of unsupported modalities are all in the right places, and test coverage is thorough. Recommend approve with minor follow-ups (run full test suite, add a changelog note about the runtime payload shape change, optionally tighten the comments around non-standard
Greptile Summary
This PR introduces first-class
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/utils/media_helpers.py | New unified media helper module consolidating image utilities with new audio/video enums, MIME-type maps, path-detection helpers, and canonical block builders. Logic is well-factored and consistent across all three modalities. |
| packages/data-designer-config/src/data_designer/config/models.py | Adds AudioContext and VideoContext ModalityContext subclasses with URL/base64 resolution logic, format validation, and a MultiModalContextT discriminated union. AudioContext and VideoContext intentionally do not resolve local files to base64 (only ImageContext does). |
| packages/data-designer-config/src/data_designer/config/column_configs.py | multi_modal_context fields widened to list[MultiModalContextT]; legacy image-context dicts without a modality key are migrated with a DeprecationWarning unless audio/video-specific keys are present, which correctly defers them to the discriminated union validator. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/openai_compatible.py | New translate_openai_compatible_messages layer correctly handles canonical image→image_url, audio→input_audio (base64) or audio_url (URL), and video→video_url in all four send paths. Unknown block types and provider-specific block types are passed through unchanged. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic_translation.py | Adds translate_canonical_image_block for new canonical image blocks and raises UnsupportedAnthropicMediaBlockError for all audio/video block types (both canonical and provider-specific variants). Ordering of the specific exception catch before the generic ValueError handler is correct. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic.py | Catches UnsupportedAnthropicMediaBlockError before the generic ValueError handler and converts it to ProviderError.unsupported_capability with modality details. Ordering is correct since the specific exception is a ValueError subclass. |
| packages/data-designer-config/src/data_designer/config/__init__.py | Public API updated to export AudioContext, VideoContext, AudioFormat, and VideoFormat; ImageFormat lazy-import path updated from image_helpers to media_helpers. |
| packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py | New tests cover canonical image/audio/video translation, passthrough of provider-specific block types, and ProviderError propagation for translation failures. |
| packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py | Tests verify canonical image block translation and that all six audio/video block type variants raise UnsupportedAnthropicMediaBlockError. |
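
The ordering point made in the anthropic_translation.py and anthropic.py rows can be illustrated with a small, self-contained sketch. Only the exception name `UnsupportedAnthropicMediaBlockError` and the fact that it subclasses `ValueError` come from the summary above; `translate_block`, `classify_failure`, and the returned labels are hypothetical stand-ins, not the adapter's real functions.

```python
class UnsupportedAnthropicMediaBlockError(ValueError):
    """Stand-in for the adapter exception; subclasses ValueError as the summary notes."""

    def __init__(self, modality: str) -> None:
        super().__init__(f"Anthropic adapter does not support {modality} blocks")
        self.modality = modality


def translate_block(block: dict) -> dict:
    """Hypothetical, heavily simplified translator used only for this illustration."""
    block_type = block.get("type")
    if block_type in {"audio", "video"}:
        raise UnsupportedAnthropicMediaBlockError(block_type)
    if block_type != "text":
        raise ValueError(f"malformed block: {block!r}")
    return block


def classify_failure(block: dict) -> str:
    """Map translation failures to (hypothetical) error labels."""
    try:
        translate_block(block)
    except UnsupportedAnthropicMediaBlockError as exc:
        # The specific handler has to come first: an earlier `except ValueError`
        # would also match this subclass and hide the modality detail.
        return f"unsupported_capability:{exc.modality}"
    except ValueError:
        return "invalid_request"
    return "ok"
```

If the two `except` clauses were swapped, the audio/video case would be reported as a generic invalid request, which is the regression the ordering guards against.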
Sequence Diagram
sequenceDiagram
participant User
participant ContextModel as AudioContext / VideoContext / ImageContext
participant Generator as ColumnGenerator
participant OpenAI as OpenAICompatibleClient
participant Anthropic as AnthropicClient
User->>ContextModel: configure multi_modal_context
Generator->>ContextModel: get_contexts(record, base_path)
ContextModel-->>Generator: "canonical blocks {type: audio|video|image, source: {...}}"
Generator->>OpenAI: ChatCompletionRequest(messages with canonical blocks)
OpenAI->>OpenAI: translate_openai_compatible_messages()
Note over OpenAI: image → image_url<br/>audio base64 → input_audio<br/>audio url → audio_url<br/>video → video_url
OpenAI-->>User: API response
Generator->>Anthropic: ChatCompletionRequest(messages with canonical blocks)
Anthropic->>Anthropic: build_anthropic_payload() → translate_content_blocks()
Note over Anthropic: image_url → Anthropic image<br/>canonical image → Anthropic image<br/>audio/video → UnsupportedAnthropicMediaBlockError
Anthropic-->>User: ProviderError.unsupported_capability (audio/video)
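
As a rough illustration of the OpenAI-compatible mapping in the diagram, the sketch below translates one canonical block at a time. The inner layout of `source` is an assumption made for this example (`{"type": "url", "url": ...}` or `{"type": "base64", "data": ..., "media_type": ..., "format": ...}`); the diagram only shows `source: {...}`. The function names are hypothetical and this is not the PR's `translate_openai_compatible_messages` implementation.

```python
from typing import Any


def _as_url(source: dict[str, Any]) -> str:
    """Return the URL as-is, or wrap a base64 payload in a data URI (assumed keys)."""
    if source["type"] == "url":
        return source["url"]
    return f"data:{source['media_type']};base64,{source['data']}"


def translate_block(block: dict[str, Any]) -> dict[str, Any]:
    """Translate one canonical media block into an OpenAI-compatible content part."""
    modality = block.get("type")
    source = block.get("source")
    if modality not in {"image", "audio", "video"} or not isinstance(source, dict):
        # Unknown or provider-specific blocks pass through unchanged.
        return block
    if modality == "image":
        return {"type": "image_url", "image_url": {"url": _as_url(source)}}
    if modality == "audio":
        if source["type"] == "base64":
            # Base64 audio uses the input_audio part with raw data plus a format.
            return {
                "type": "input_audio",
                "input_audio": {"data": source["data"], "format": source["format"]},
            }
        return {"type": "audio_url", "audio_url": {"url": source["url"]}}
    # Video: URL and base64 sources both end up in a video_url part.
    return {"type": "video_url", "video_url": {"url": _as_url(source)}}
```

For example, `translate_block({"type": "text", "text": "hi"})` returns the block untouched, while a canonical audio block with a URL source becomes an `audio_url` part.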
Reviews (8): Last reviewed commit: "align media local path autodetection"
johnnygreco
left a comment
Nice work on this one, @nabinchha — the canonical media helper direction is a clean way to keep provider shapes out of config.
Summary
PR 701 adds first-class AudioContext and VideoContext alongside the existing ImageContext, migrates config-produced media blocks to a canonical shape, and teaches the OpenAI-compatible and Anthropic adapters how to translate or reject those blocks. This mostly matches issue #671 and the #668 plan, but I found a provider capability edge and a legacy-config ambiguity worth tightening before merge.
Findings
Warnings — Worth addressing
packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/openai_compatible.py:225 — Audio/video URL and video blocks are sent without capability gating
- What: `_translate_canonical_audio_block()` maps every URL audio source to `{"type": "audio_url", ...}`, and `_translate_canonical_video_block()` maps every URL/base64 video source to `{"type": "video_url", ...}` for all OpenAI-compatible providers/models. There is no provider/model/source-type gate before transport.
- Why: Issue #671 asks unsupported providers/routes to raise canonical unsupported-capability errors, and the #668 plan calls out checking modality, source type, and media type before sending. OpenAI Chat Completions defines base64 audio via `input_audio`, while URL audio and `video_url` are not generally supported across OpenAI-compatible endpoints. With the current code, a user can configure `AudioContext(data_type=URL)` or any `VideoContext` against the default OpenAI provider and get a provider 400 after the HTTP request instead of a `ProviderErrorKind.UNSUPPORTED_CAPABILITY` before transport.
- Suggestion: Add an adapter-level media capability policy before translating/sending blocks. For example, allow base64 audio -> `input_audio` where supported, allow `audio_url`/`video_url` only for known provider/model routes that accept those parts, and otherwise raise `ProviderError.unsupported_capability(...)`. It would be good to add tests that assert unsupported URL-audio/video cases do not call `post()`.
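
A minimal sketch of what such a gate could look like, under stated assumptions: the policy table, `UnsupportedCapabilityError`, `check_media_capabilities`, and `send` are all hypothetical names for illustration, and the `source` layout (`{"type": "url" | "base64", ...}`) is assumed rather than taken from the PR. The real adapter would raise `ProviderError.unsupported_capability(...)` instead of the stand-in exception.

```python
from typing import Any, Callable


class UnsupportedCapabilityError(Exception):
    """Stand-in for the project's unsupported-capability provider error."""


# Hypothetical default policy: (modality, source type) pairs a route accepts.
DEFAULT_ALLOWED = frozenset({("image", "url"), ("image", "base64"), ("audio", "base64")})


def check_media_capabilities(
    messages: list[dict[str, Any]],
    allowed: frozenset = DEFAULT_ALLOWED,
) -> None:
    """Raise before transport if any canonical media block is not allowed."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no media blocks
        for block in content:
            source = block.get("source") if isinstance(block, dict) else None
            if not isinstance(source, dict):
                continue  # text or provider-specific block, not gated here
            key = (block.get("type"), source.get("type"))
            if key not in allowed:
                raise UnsupportedCapabilityError(
                    f"{key[0]} context with {key[1]} source is not supported on this route"
                )


def send(
    messages: list[dict[str, Any]],
    post: Callable[[dict[str, Any]], Any],
    allowed: frozenset = DEFAULT_ALLOWED,
) -> Any:
    """Gate first, then hand the payload to the transport callable."""
    check_media_capabilities(messages, allowed)
    return post({"messages": messages})
```

A test for the suggestion above would pass a recording fake as `post` and assert that it was never called when a URL audio or video block is rejected.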
packages/data-designer-config/src/data_designer/config/column_configs.py:189 — Modality-less audio/video URL dicts silently become ImageContext
- What: The legacy migration treats any dict without `modality` as an image context. That preserves old image configs, but a new dict config like `{"column_name": "audio_url", "data_type": "url"}` is accepted as `ImageContext(column_name="audio_url", data_type="url")` instead of failing as a missing-modality audio context.
- Why: This is easy to miss because URL-backed audio/video contexts do not need `audio_format` or `video_format`, so there is no extra field to force validation to fail. The result is that an audio/video URL can be sent through the image path as `image_url`, which is a confusing runtime failure and diverges from the plan’s note that new audio/video dict configs must declare `modality` explicitly.
- Suggestion: Could we make this ambiguity explicit? Options include adding a caller warning whenever a modality-less dict is migrated to image, documenting that modality-less dicts are always legacy image configs, or introducing a stricter/deprecated migration path. At minimum, I’d add a regression test for modality-less URL audio/video dicts so the chosen behavior is intentional.
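
To make the ambiguity concrete, here is a small sketch of the migration rule as described in this thread: modality-less dicts become image contexts with a `DeprecationWarning`, unless audio/video-specific keys are present. `migrate_legacy_context` and the warning text are hypothetical; the real logic lives in column_configs.py and may differ in detail.

```python
import warnings
from typing import Any

# Keys that only make sense for audio/video contexts (taken from the discussion above).
_AUDIO_VIDEO_KEYS = {"audio_format", "video_format"}


def migrate_legacy_context(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the legacy image-context migration."""
    if "modality" in raw or raw.keys() & _AUDIO_VIDEO_KEYS:
        # Explicit modality, or audio/video hints: leave it to the union validator.
        return raw
    warnings.warn(
        "multi_modal_context dicts without 'modality' are treated as image contexts",
        DeprecationWarning,
        stacklevel=2,
    )
    return {**raw, "modality": "image"}
```

Under this rule, `{"column_name": "audio_url", "data_type": "url"}` carries no audio-specific key, so it is migrated to an image context; a regression test can pin that outcome (or its rejection) so the behavior is a deliberate choice.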
Suggestions — Take it or leave it
packages/data-designer-config/src/data_designer/config/models.py:150 — Validate image data URI format against image_format
- What: Audio and video data URIs raise when the configured format conflicts with the URI media type, but image data URIs currently bypass that check and trust the URI media type even when `image_format` is set differently.
- Why: It is a small consistency gap in the new shared media treatment, and mismatched config can be hard to diagnose once it reaches a provider.
- Suggestion: Mirror the audio/video behavior: infer the image format from the data URI media type and raise if it conflicts with `self.image_format`.
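
One possible shape for that check, as a sketch only: `resolve_image_format`, the regular expression, and the media-type table are hypothetical and cover just a few common formats. The project's shared base64 data URI parser may behave differently.

```python
from __future__ import annotations

import re

# Matches "data:<type>/<subtype>;base64,<payload>" (no extra parameters handled here).
_DATA_URI_RE = re.compile(
    r"^data:(?P<media_type>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$", re.DOTALL
)

# Hypothetical subset of supported image media types.
_MEDIA_TYPE_TO_FORMAT = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/gif": "gif",
    "image/webp": "webp",
}


def resolve_image_format(value: str, configured_format: str | None) -> str | None:
    """Infer the format from a data URI and reject conflicts with the configured one."""
    match = _DATA_URI_RE.match(value)
    if match is None:
        return configured_format  # not a data URI: nothing to cross-check
    inferred = _MEDIA_TYPE_TO_FORMAT.get(match["media_type"].lower())
    if inferred is None:
        raise ValueError(f"unsupported image media type: {match['media_type']}")
    # Treat "jpg" as an alias of "jpeg" so the two spellings do not conflict.
    normalized = "jpeg" if configured_format == "jpg" else configured_format
    if normalized is not None and normalized != inferred:
        raise ValueError(
            f"image_format={configured_format!r} conflicts with data URI media type "
            f"{match['media_type']!r}"
        )
    return inferred
```

With this in place, a `data:image/png;base64,...` value configured as `jpeg` fails at config time rather than at the provider.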
What Looks Good
- The canonical `{"type": modality, "source": ...}` block shape keeps config provider-neutral while preserving the existing image behavior after adapter translation.
- The Anthropic path is careful about continuing to accept legacy `image_url` blocks while raising a canonical unsupported-capability error for audio/video.
- The focused tests cover scalar/list/JSON/array normalization, local-path rejection for audio/video, mixed-context ordering, and image path resolution, which are exactly the edge cases most likely to regress.
Verdict
Needs changes — I’d address the OpenAI-compatible capability gate before merge, and decide intentionally how strict the legacy modality migration should be for new audio/video dict configs.
This review was generated by an AI assistant.
…dio-video-context # Conflicts: # docs/code_reference/config/models.md # fern/versions/latest/pages/code_reference/config/models.mdx
MkDocs preview: https://22e174d7.dd-docs-preview.pages.dev
Fern preview: https://nvidia-preview-pr-701.docs.buildwithfern.com/nemo/datadesigner
Thanks, Nabin. I’m good with the resolution on the original review threads:
One thing I’d still like to resolve before merge: the latest branch now supports local audio/video paths by passing them through unchanged. That seems outside issue #671’s stated scope, which explicitly excluded local path handling for new audio/video contexts. It also creates a different behavior from image context, where local paths are resolved against
Could we either:
I’m fine with the no-preflight OpenAI-compatible decision after that clarification, but I’d treat the local-path scope change as the remaining merge blocker.
@johnnygreco update/correction after checking the exact
The branch now mirrors image auto-detect semantics more closely: in auto mode, local-looking audio/video paths are not implicitly passed through. They are rejected with guidance to use
So the boundary is now:
johnnygreco
left a comment
@nabinchha thanks for tightening this up. I rechecked f482dfb7 and the implementation now matches the boundary you described and the updated #671 scope:
- auto mode only auto-detects HTTP(S) media URLs and base64/data URI values
- local-looking audio/video paths are rejected in auto mode with guidance to use `data_type=url`
- explicit `data_type=url` still passes local paths / `file://` values through unchanged for colocated endpoints
- audio/video still do not do `base_path` lookup, file reads, base64 conversion, or upload/file-ID lifecycle
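
The boundary in the list above can be sketched as a small resolver. Everything here is an assumption for illustration: `resolve_media_value` is a hypothetical name, and the "local-looking" heuristic (leading `/`, `./`, `../`, `~`, `file://`, or a Windows drive prefix) is a guess at what the branch checks, not a copy of it.

```python
from __future__ import annotations

import base64
import binascii
import re

# Assumed heuristic for "local-looking" values; the real detection may differ.
_LOCAL_PATH_RE = re.compile(r"^(file://|~|\.{0,2}/|[A-Za-z]:[\\/])")
_HTTP_RE = re.compile(r"^https?://", re.IGNORECASE)


def resolve_media_value(value: str, data_type: str | None = None) -> tuple[str, str]:
    """Return (source_type, value); data_type=None means auto mode."""
    if data_type in {"url", "base64"}:
        # Explicit mode: pass the value through unchanged, including local paths
        # and file:// values when data_type="url" (no file reads, no base_path).
        return (data_type, value)
    if _HTTP_RE.match(value):
        return ("url", value)
    if value.startswith("data:"):
        return ("base64", value)
    if _LOCAL_PATH_RE.match(value):
        raise ValueError(
            "local paths are not auto-detected for audio/video; set data_type=url "
            "if the endpoint can read this path"
        )
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        raise ValueError("could not auto-detect a URL or base64 value") from None
    return ("base64", value)
```

In auto mode `resolve_media_value("/data/clip.wav")` raises, while `resolve_media_value("/data/clip.wav", data_type="url")` returns the path unchanged for a colocated endpoint to read.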
I also reran the focused config model suite locally: packages/data-designer-config/tests/config/test_models.py passed (56 passed). My original review concerns are resolved. Approving from code review; merge should still wait for the remaining CI jobs to finish green.
📋 Summary
Adds first-class audio and video context support alongside the existing image context path. The branch introduces canonical media context helpers, translates supported media blocks for OpenAI-compatible providers, rejects unsupported audio/video routes with canonical errors, and updates docs to make model modality support explicit.
🔗 Related Issue
Closes #671
🔄 Changes
- `AudioContext` and `VideoContext` config models with URL/base64 support, explicit media format validation, and shared media helper utilities.
- `multi_modal_context` handling for LLM text/structured columns and autoregressive image columns while preserving legacy image-context config compatibility.
🧪 Testing
- `make test` passes (not run; focused suite run instead)
- `PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer-config/tests/config/test_models.py packages/data-designer-config/tests/config/test_columns.py packages/data-designer-config/tests/config/utils/test_media_helpers.py packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic.py packages/data-designer-engine/tests/engine/models/test_model_utils.py packages/data-designer-engine/tests/engine/column_generators/generators/test_image.py packages/data-designer-engine/tests/engine/test_validation.py -q` (280 passed)
- `uv run --group docs --group notebooks python -m py_compile docs/notebook_source/4-providing-images-as-context.py`
- `docs/colab_notebooks/4-providing-images-as-context.ipynb` validated with `nbformat`
- `git diff --check`
- `fern check` (not run; local `fern` CLI is not installed)
✅ Checklist