
feat: add audio and video context#701

Merged: nabinchha merged 16 commits into main from nmulepati/feat-671-audio-video-context on May 22, 2026

@nabinchha nabinchha commented May 22, 2026

📋 Summary

Adds first-class audio and video context support alongside the existing image context path. The branch introduces canonical media context helpers, translates supported media blocks for OpenAI-compatible providers, rejects unsupported audio/video routes with canonical errors, and updates docs to make model modality support explicit.

🔗 Related Issue

Closes #671

🔄 Changes

  • Add AudioContext and VideoContext config models with URL/base64 support, explicit media format validation, and shared media helper utilities.
  • Extend multi_modal_context handling for LLM text/structured columns and autoregressive image columns while preserving legacy image-context config compatibility.
  • Translate canonical image/audio/video context blocks for OpenAI-compatible clients and reject unsupported audio/video blocks in the Anthropic adapter.
  • Thread context/base-path handling through generator paths that resolve media context.
  • Add focused config, adapter, generator, validation, and media-helper tests for image/audio/video context behavior.
  • Update tutorial and Fern docs to describe image/audio/video media context and warn that the selected model must support every modality sent.

🧪 Testing

  • make test passes (not run; focused suite run instead)
  • Unit tests added/updated
  • E2E tests added/updated (N/A — no committed E2E artifact)
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer-config/tests/config/test_models.py packages/data-designer-config/tests/config/test_columns.py packages/data-designer-config/tests/config/utils/test_media_helpers.py packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic.py packages/data-designer-engine/tests/engine/models/test_model_utils.py packages/data-designer-engine/tests/engine/column_generators/generators/test_image.py packages/data-designer-engine/tests/engine/test_validation.py -q (280 passed)
  • uv run --group docs --group notebooks python -m py_compile docs/notebook_source/4-providing-images-as-context.py
  • docs/colab_notebooks/4-providing-images-as-context.ipynb validated with nbformat
  • git diff --check
  • Commit hooks passed for the latest docs commit
  • fern check (not run; local fern CLI is not installed)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

nabinchha added 8 commits May 21, 2026 13:19
Add audio/video context config models and canonical media helpers.

Translate canonical media blocks for OpenAI-compatible clients while preserving URL media as URLs. Reject unsupported audio/video blocks in the Anthropic adapter.

Refs #671
Preserve extensionless HTTP(S) audio and video URLs as URL media, reject local path-looking audio/video context values, and reject provider-specific audio/video blocks in the Anthropic adapter.

Refs #671
Add a Jupytext source notebook and generated Colab artifact that exercise audio/video context URL, base64, local path rejection, OpenAI-compatible payload translation, and Anthropic unsupported-media handling.

Refs #671
Rewrite the audio/video smoke notebook to run a full Data Designer preview against a local OpenAI-compatible HTTP server. Assert the generated dataset, captured endpoint payload, URL/base64 translation, and local path rejection through the interface pipeline.

Refs #671
Move the generated audio/video context E2E notebook out of the PR docs surface and keep it locally under the main checkout's .scratch directory.

Refs #671
Remove unused URL-specific media helpers, share the base64 data URI parser in Anthropic translation, align AudioContext validation messaging, and update config docs for audio/video contexts.

Refs #671
@nabinchha nabinchha requested a review from a team as a code owner May 22, 2026 15:03
@github-actions

Code Review — PR #701: feat: add audio and video context

Summary

Adds first-class audio/video context support alongside the existing image-context path. The change introduces:

  • AudioContext and VideoContext config models (packages/data-designer-config/src/data_designer/config/models.py) discriminated by modality, with URL/base64 inputs and explicit format validation.
  • A shared media_helpers utility module that owns canonical media-block builders, format ↔ MIME maps, data-URI parsing, and value normalization. The legacy _DATA_URI_RE and the normalize block in ImageContext are consolidated here.
  • A canonical content-block protocol ({"type": <modality>, "source": {...}}) plumbed end-to-end. Adapters translate canonical blocks to provider-specific shapes:
    • OpenAICompatibleClient translates image/audio/video canonical blocks to image_url / input_audio / audio_url / video_url.
    • AnthropicAdapter translates canonical image blocks and raises UnsupportedAnthropicMediaBlockError for audio/video, surfaced as ProviderError.unsupported_capability.
  • A discriminated union MultiModalContextT typed on LLMTextColumnConfig/ImageColumnConfig (and subclasses), with a mode="before" validator that injects modality: image for legacy dicts so old configs keep deserializing.
  • Public re-exports of AudioContext, VideoContext, AudioFormat, VideoFormat, MultiModalContextT via data_designer.config.
  • Doc updates (default settings, model-configs, tutorial intro) calling out that the selected model must support every modality being sent. The default nvidia-vision/openai-vision/openrouter-vision doc rows were also refreshed.
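The canonical-to-provider translation summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function names and the `source` keys (`kind`, `url`, `data`, `format`, `media_type`) are assumptions; only the outer `{"type": <modality>, "source": {...}}` shape and the target block types come from the review.

```python
from __future__ import annotations

from typing import Any


def _as_url(source: dict[str, Any]) -> str:
    """Return a URL for the source, wrapping base64 payloads in a data URI."""
    if source["kind"] == "url":
        return source["url"]
    return f"data:{source['media_type']};base64,{source['data']}"


def translate_block(block: dict[str, Any]) -> dict[str, Any]:
    """Translate one canonical block into an OpenAI-compatible content part.

    Hypothetical helper; the source key names are illustrative assumptions.
    """
    modality = block.get("type")
    source = block.get("source")
    if modality not in {"image", "audio", "video"} or not isinstance(source, dict):
        # Text, provider-specific, and unknown parts pass through unchanged.
        return block

    if modality == "image":
        return {"type": "image_url", "image_url": {"url": _as_url(source)}}
    if modality == "audio":
        if source["kind"] == "base64":
            # Base64 audio maps to the Chat Completions input_audio part.
            return {
                "type": "input_audio",
                "input_audio": {"data": source["data"], "format": source["format"]},
            }
        # URL audio maps to audio_url, which is a provider extension.
        return {"type": "audio_url", "audio_url": {"url": _as_url(source)}}
    # Video, whether URL or base64, maps to video_url (also a provider extension).
    return {"type": "video_url", "video_url": {"url": _as_url(source)}}
```

Keeping the pass-through branch first means blocks that were already minted in a provider-specific shape are never rewritten twice.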

The code respects the layering rule (engine→config only; new adapter imports of data_designer.config.models and data_designer.config.utils.media_helpers go in the allowed direction), uses from __future__ import annotations everywhere, and adds focused tests for every new branch.

Findings

Backward compatibility

  • ✅ The legacy image-context dict shape (no modality key) is preserved by inject_legacy_image_context_modality validators on both LLMTextColumnConfig and ImageColumnConfig. A round-trip test in test_columns.py verifies this.
  • ⚠️ ImageContext.modality was tightened from Modality to Literal[Modality.IMAGE] so it can serve as the union discriminator. Existing pickled/persisted ImageContext instances that were constructed with the wider type still validate (the only allowed value was always IMAGE), but any third-party code that constructs ImageContext(modality=Modality.AUDIO) would now fail. Worth a release-note line.
  • ⚠️ The on-the-wire shape of multimodal context emitted by ImageContext.get_contexts() changed from {"type": "image_url", "image_url": {...}} to the canonical {"type": "image", "source": {...}}. Internal call sites are updated, but if any user code or plugin currently consumes the return of get_contexts() directly (e.g. custom column generators that build their own messages), it will now see a different schema. The PR description says "preserving legacy image-context config compatibility" — that is true at the config layer but not at the runtime payload layer. Worth a sentence in the changelog/migration notes.
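To make the config-layer compatibility concrete, here is a simplified sketch of a legacy-modality injection step. It is written as a plain function rather than the project's Pydantic validator, and the function name is hypothetical; the `DeprecationWarning` and the rule of leaving dicts with audio/video-specific keys untouched follow what is described elsewhere in this conversation.

```python
from __future__ import annotations

import warnings
from typing import Any

# Keys that only make sense on audio/video contexts (field names taken from the review).
NON_IMAGE_KEYS = {"audio_format", "video_format"}


def inject_legacy_image_modality(contexts: list[Any]) -> list[Any]:
    """Add modality="image" to dict contexts that omit it (legacy image configs)."""
    migrated: list[Any] = []
    for ctx in contexts:
        is_legacy = (
            isinstance(ctx, dict)
            and "modality" not in ctx
            and not NON_IMAGE_KEYS & ctx.keys()
        )
        if is_legacy:
            warnings.warn(
                "multi_modal_context entries without 'modality' are treated as image contexts",
                DeprecationWarning,
                stacklevel=2,
            )
            ctx = {"modality": "image", **ctx}
        # Everything else is left for the discriminated union to accept or reject.
        migrated.append(ctx)
    return migrated
```

Note that this only covers deserialization; it does nothing for code that consumes the dicts returned by `get_contexts()`, which is the runtime-layer change flagged above.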

Correctness

  • OpenAICompatibleClient translates a canonical audio base64 block to OpenAI's input_audio (with format=mp3|wav) but URL audio to audio_url. audio_url/video_url are not part of the official OpenAI Chat Completions schema; they are only meaningful for "OpenAI-compatible" providers (NVIDIA Omni, OpenRouter, etc.) that have extended the spec. Sending audio_url to the official OpenAI API will result in a 400. Given the doc note "modality support depends on the model," this seems intentional, but it is worth an inline comment in _translate_canonical_audio_block/_translate_canonical_video_block to make the non-standardness explicit for future maintainers. (packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/openai_compatible.py:222-250)
  • _AUDIO_FORMAT_TO_MIME_TYPE[WAV] = "audio/wav" while _AUDIO_MIME_TYPE_TO_FORMAT accepts several WAV aliases (audio/wav, audio/wave, audio/x-wav, audio/vnd.wave). The asymmetry is fine for inference, but it means a base64 WAV constructed from audio_format=WAV always stamps audio/wav, which is what OpenAI expects, so this is correct. Just flagging the asymmetry. (packages/data-designer-config/src/data_designer/config/utils/media_helpers.py:35-51)
  • normalize_media_context_values catches (json.JSONDecodeError, TypeError). TypeError is unreachable because the call is guarded by isinstance(raw_value, str). Minor; the existing image-context implementation had the same defensive catch, so this is a faithful port. Could be tightened to just json.JSONDecodeError. (packages/data-designer-config/src/data_designer/config/utils/media_helpers.py:74-91)
  • is_audio_path/is_video_path only do extension matching; they do not check file existence the way is_image_path does (image goes through _auto_resolve_context_value with base_path). This is intentional because audio/video do not support local-path resolution, but the semantic gap means AudioContext(...).get_contexts({"audio": "screen_recording.mp3"}) raises a "Local audio paths are not supported" error even when the string is not actually a path on disk — any value ending in .mp3 is treated as a path. That is the right call for safety (better to reject than silently send screen_recording.mp3 as base64 to a model), and the docstring on _has_path_extension ("looks like a local audio path") matches this heuristic. Worth surfacing in user-facing error messages: the current message says "provide an audio URL or base64 audio data" which is good.
  • AudioContext._build_context with data_type=None and a data: URI flows through _resolve_base64_parts correctly because is_media_url returns False for data: URIs (only http:// and https:// match). Verified by test_audio_context_auto_detect_url_and_data_uri. Good.
  • The Anthropic translator's _UNSUPPORTED_MEDIA_BLOCK_MODALITIES includes both canonical (audio, video) and provider-specific (audio_url, input_audio, video_url, input_video) keys, so any audio/video block — regardless of who minted it — fails closed with a typed error. Good defensive design.
  • OpenAICompatibleClient translate_openai_compatible_content_block short-circuits on {"image_url", "input_audio", "text"} but not on audio_url/video_url. Those fall through to the final return block, which is the right outcome (pass through unchanged). Could be added to the allowlist for symmetry/clarity. (openai_compatible.py:197)
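The format/MIME asymmetry flagged above can be illustrated with a small sketch. The map and function names are hypothetical, and the MP3 entries are an assumption added for completeness; only the WAV aliases are taken from the review.

```python
from __future__ import annotations

# One canonical MIME type per format (used when stamping outgoing data).
AUDIO_FORMAT_TO_MIME = {"wav": "audio/wav", "mp3": "audio/mpeg"}

# Many MIME aliases per format (used when inferring a format from incoming data URIs).
AUDIO_MIME_TO_FORMAT = {
    "audio/wav": "wav",
    "audio/wave": "wav",
    "audio/x-wav": "wav",
    "audio/vnd.wave": "wav",
    "audio/mpeg": "mp3",  # assumption: not listed in the review
    "audio/mp3": "mp3",  # assumption: not listed in the review
}


def format_for_mime(mime_type: str) -> str:
    """Infer the audio format from a MIME type, accepting known aliases."""
    try:
        return AUDIO_MIME_TO_FORMAT[mime_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported audio MIME type: {mime_type!r}") from None


def mime_for_format(audio_format: str) -> str:
    """Return the single canonical MIME type stamped for a format."""
    return AUDIO_FORMAT_TO_MIME[audio_format]
```

Under this layout, an alias such as `audio/x-wav` is accepted on input, but a round trip through `mime_for_format(format_for_mime(...))` always normalizes to `audio/wav`.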

Tests

  • Strong test coverage for every new branch: format validation (positive + negative), data-URI mismatch, local-path rejection, JSON/numpy/list/string normalization, mixed-modality round-trip in test_image_cell_generator_with_mixed_media_context, Anthropic rejection for audio/video (canonical and provider-specific block types).
  • Anthropic now refuses canonical audio/video at the adapter boundary; both test_completion_rejects_audio_video_context_as_unsupported (integration-ish) and test_translate_content_blocks_rejects_unsupported_media (unit) assert this.
  • Round-trip through model_dump() validated for the discriminated union (test_multi_modal_context_round_trips_discriminated_union).
  • ⚠️ The PR description notes make test was not run and only a focused suite was executed. The full test suite should be run (or rely on CI) before merge — the canonical-block change to ImageContext.get_contexts is potentially load-bearing for tests outside the focused list.

Style / conventions

  • All new files have SPDX headers, from __future__ import annotations, modern type syntax (list[…], str | None), and absolute imports. Consistent with the style guide.
  • Docstrings on AudioContext/VideoContext correctly call out the deliberate omission of local-path resolution.
  • _translate_canonical_image_block is duplicated between openai_compatible.py (private) and anthropic_translation.py (public translate_canonical_image_block). The two implementations diverge intentionally — OpenAI emits image_url data URIs, Anthropic emits canonical-style image blocks — so consolidation is not warranted. Fine as-is.
  • Minor: _get_media_source is an internal helper duplicated only on the OpenAI side. The Anthropic-side translate_canonical_image_block inlines the same isinstance(source, dict) check. Tiny duplication, not worth a refactor.

Docs

  • Tutorial / Fern updates clearly state "model must support every modality sent." Good.
  • The default-model-settings table refresh (vision → omni model) is documentation-only — constants.py already used the omni model, so the docs are catching up to code rather than introducing a new default. Worth verifying no users still expected nvidia/nemotron-nano-12b-v2-vl as the default; the prior PR that flipped the constant should already have signaled that.

Risks / suggestions

  1. Wire-format change for ImageContext.get_contexts() — call out in the changelog or migration notes that the dict shape changed (image_url → canonical image/source block). Internal callers are migrated; external plugin authors may not be.
  2. Run the full test suite before merge given the broad reach.
  3. Add an inline comment in _translate_canonical_audio_block/_translate_canonical_video_block noting that audio_url/video_url are non-standard OpenAI extensions used by capable providers; this avoids future confusion.
  4. (Optional) Add audio_url/video_url to the OpenAI passthrough allowlist for symmetry with input_audio.
  5. (Optional) Drop the unreachable TypeError from normalize_media_context_values.

Structural Impact

No structural impact analysis was provided for this PR (no /tmp/structural-impact-701.md file was found). The change touches three packages but respects the documented import direction (interface → engine → config); no reverse imports were introduced.

Verdict

Solid feature work. The canonical-block abstraction is a clean replacement for the ad-hoc image-context shape and pays dividends as the second/third modality lands. Discriminated unions, format validation, and adapter-side rejection of unsupported modalities are all in the right places, and test coverage is thorough.

Recommend approve with minor follow-ups (run full test suite, add a changelog note about the runtime payload shape change, optionally tighten the comments around non-standard audio_url/video_url blocks). Nothing blocking.

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR introduces first-class AudioContext and VideoContext config models alongside the existing ImageContext, using a shared canonical media block format ({"type": modality, "source": {...}}) that each provider adapter translates into its native wire format. A MultiModalContextT discriminated union replaces the old list[ImageContext] fields on LLM/image column configs.

  • Canonical block design: get_media_url_context / get_media_base64_context helpers in the new media_helpers module produce provider-neutral blocks; the OpenAI-compatible adapter translates them to image_url / input_audio / audio_url / video_url, while the Anthropic adapter translates canonical image blocks and raises UnsupportedAnthropicMediaBlockError for audio/video.
  • Legacy compatibility: A _inject_legacy_image_context_modality field validator on LLMTextColumnConfig and ImageColumnConfig preserves modality-less image-context dicts (now emitting a DeprecationWarning), while correctly leaving audio/video-specific dicts for the discriminated union to reject.
  • image_helpers.py consolidation: The old helper module is replaced by media_helpers.py; all internal callers and the public __init__.py lazy-import map are updated.

Confidence Score: 5/5

Safe to merge. The canonical media block design is consistent across all three modalities and both adapters, and the previous-thread concerns about legacy injection and Anthropic rejection are fully addressed.

The core translation pipeline — canonical block production in *Context.get_contexts(), OpenAI-compatible translation, and Anthropic rejection — is implemented correctly and symmetrically for all three modalities. The legacy image-context injection predicate correctly distinguishes image-only dicts from audio/video-specific dicts. The UnsupportedAnthropicMediaBlockError catch is ordered before the generic ValueError handler, which is important since the former is a subclass of the latter. No logic errors or correctness issues were found.
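The point about catch ordering can be shown with a small sketch. The classes and functions here are simplified stand-ins, not the project's `UnsupportedAnthropicMediaBlockError` or `ProviderError`; the set of rejected block types is the one listed in the earlier review.

```python
from __future__ import annotations

from typing import Any

# Canonical and provider-specific audio/video block types that must fail closed.
UNSUPPORTED_TYPES = {"audio", "video", "audio_url", "input_audio", "video_url", "input_video"}


class UnsupportedMediaBlockError(ValueError):
    """Stand-in for the typed error; note that it subclasses ValueError."""

    def __init__(self, block_type: str) -> None:
        super().__init__(f"unsupported media block: {block_type}")
        self.block_type = block_type


def check_blocks(blocks: list[dict[str, Any]]) -> None:
    """Raise for audio/video blocks and for malformed blocks."""
    for block in blocks:
        block_type = block.get("type")
        if block_type is None:
            raise ValueError("content block is missing 'type'")
        if block_type in UNSUPPORTED_TYPES:
            raise UnsupportedMediaBlockError(block_type)


def classify(blocks: list[dict[str, Any]]) -> str:
    """Map translation failures to an error kind (strings stand in for ProviderError)."""
    try:
        check_blocks(blocks)
    except UnsupportedMediaBlockError as exc:
        # Must come first: the generic handler below would also match this subclass.
        return f"unsupported_capability:{exc.block_type}"
    except ValueError:
        return "bad_request"
    return "ok"
```

If the two `except` clauses were swapped, every unsupported-media error would be reported as a generic bad request and the modality detail would be lost.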

No files require special attention. The changes across adapters, config models, and helpers are internally consistent.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/config/utils/media_helpers.py | New unified media helper module consolidating image utilities with new audio/video enums, MIME-type maps, path-detection helpers, and canonical block builders. Logic is well-factored and consistent across all three modalities. |
| packages/data-designer-config/src/data_designer/config/models.py | Adds AudioContext and VideoContext ModalityContext subclasses with URL/base64 resolution logic, format validation, and a MultiModalContextT discriminated union. AudioContext and VideoContext intentionally do not resolve local files to base64 (only ImageContext does). |
| packages/data-designer-config/src/data_designer/config/column_configs.py | multi_modal_context fields widened to list[MultiModalContextT]; legacy image-context dicts without a modality key are migrated with a DeprecationWarning unless audio/video-specific keys are present, which correctly defers them to the discriminated union validator. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/openai_compatible.py | New translate_openai_compatible_messages layer correctly handles canonical image→image_url, audio→input_audio (base64) or audio_url (URL), and video→video_url in all four send paths. Unknown block types and provider-specific block types are passed through unchanged. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic_translation.py | Adds translate_canonical_image_block for new canonical image blocks and raises UnsupportedAnthropicMediaBlockError for all audio/video block types (both canonical and provider-specific variants). Ordering of the specific exception catch before the generic ValueError handler is correct. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic.py | Catches UnsupportedAnthropicMediaBlockError before the generic ValueError handler and converts it to ProviderError.unsupported_capability with modality details. Ordering is correct since the specific exception is a ValueError subclass. |
| packages/data-designer-config/src/data_designer/config/__init__.py | Public API updated to export AudioContext, VideoContext, AudioFormat, and VideoFormat; ImageFormat lazy-import path updated from image_helpers to media_helpers. |
| packages/data-designer-engine/tests/engine/models/clients/test_openai_compatible.py | New tests cover canonical image/audio/video translation, passthrough of provider-specific block types, and ProviderError propagation for translation failures. |
| packages/data-designer-engine/tests/engine/models/clients/test_anthropic_translation.py | Tests verify canonical image block translation and that all six audio/video block type variants raise UnsupportedAnthropicMediaBlockError. |

Sequence Diagram

sequenceDiagram
    participant User
    participant ContextModel as AudioContext / VideoContext / ImageContext
    participant Generator as ColumnGenerator
    participant OpenAI as OpenAICompatibleClient
    participant Anthropic as AnthropicClient

    User->>ContextModel: configure multi_modal_context
    Generator->>ContextModel: get_contexts(record, base_path)
    ContextModel-->>Generator: "canonical blocks {type: audio|video|image, source: {...}}"

    Generator->>OpenAI: ChatCompletionRequest(messages with canonical blocks)
    OpenAI->>OpenAI: translate_openai_compatible_messages()
    Note over OpenAI: image → image_url<br/>audio base64 → input_audio<br/>audio url → audio_url<br/>video → video_url
    OpenAI-->>User: API response

    Generator->>Anthropic: ChatCompletionRequest(messages with canonical blocks)
    Anthropic->>Anthropic: build_anthropic_payload() → translate_content_blocks()
    Note over Anthropic: image_url → Anthropic image<br/>canonical image → Anthropic image<br/>audio/video → UnsupportedAnthropicMediaBlockError
    Anthropic-->>User: ProviderError.unsupported_capability (audio/video)

Reviews (8): Last reviewed commit: "align media local path autodetection"

Comment thread packages/data-designer-config/src/data_designer/config/column_configs.py Outdated
@johnnygreco johnnygreco left a comment

Nice work on this one, @nabinchha — the canonical media helper direction is a clean way to keep provider shapes out of config.

Summary

PR 701 adds first-class AudioContext and VideoContext alongside the existing ImageContext, migrates config-produced media blocks to a canonical shape, and teaches the OpenAI-compatible and Anthropic adapters how to translate or reject those blocks. This mostly matches issue #671 and the #668 plan, but I found a provider capability edge and a legacy-config ambiguity worth tightening before merge.

Findings

Warnings — Worth addressing

packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/openai_compatible.py:225 — Audio/video URL and video blocks are sent without capability gating

  • What: _translate_canonical_audio_block() maps every URL audio source to {"type": "audio_url", ...}, and _translate_canonical_video_block() maps every URL/base64 video source to {"type": "video_url", ...} for all OpenAI-compatible providers/models. There is no provider/model/source-type gate before transport.
  • Why: Issue #671 asks unsupported providers/routes to raise canonical unsupported-capability errors, and the #668 plan calls out checking modality, source type, and media type before sending. OpenAI Chat Completions defines base64 audio via input_audio, while URL audio and video_url are not generally supported across OpenAI-compatible endpoints. With the current code, a user can configure AudioContext(data_type=URL) or any VideoContext against the default OpenAI provider and get a provider 400 after the HTTP request instead of a ProviderErrorKind.UNSUPPORTED_CAPABILITY before transport.
  • Suggestion: Add an adapter-level media capability policy before translating/sending blocks. For example, allow base64 audio -> input_audio where supported, allow audio_url/video_url only for known provider/model routes that accept those parts, and otherwise raise ProviderError.unsupported_capability(...). It would be good to add tests that assert unsupported URL-audio/video cases do not call post().

packages/data-designer-config/src/data_designer/config/column_configs.py:189 — Modality-less audio/video URL dicts silently become ImageContext

  • What: The legacy migration treats any dict without modality as an image context. That preserves old image configs, but a new dict config like {"column_name": "audio_url", "data_type": "url"} is accepted as ImageContext(column_name="audio_url", data_type="url") instead of failing as a missing-modality audio context.
  • Why: This is easy to miss because URL-backed audio/video contexts do not need audio_format or video_format, so there is no extra field to force validation to fail. The result is that an audio/video URL can be sent through the image path as image_url, which is a confusing runtime failure and diverges from the plan’s note that new audio/video dict configs must declare modality explicitly.
  • Suggestion: Could we make this ambiguity explicit? Options include adding a caller warning whenever a modality-less dict is migrated to image, documenting that modality-less dicts are always legacy image configs, or introducing a stricter/deprecated migration path. At minimum, I’d add a regression test for modality-less URL audio/video dicts so the chosen behavior is intentional.

Suggestions — Take it or leave it

packages/data-designer-config/src/data_designer/config/models.py:150 — Validate image data URI format against image_format

  • What: Audio and video data URIs raise when the configured format conflicts with the URI media type, but image data URIs currently bypass that check and trust the URI media type even when image_format is set differently.
  • Why: It is a small consistency gap in the new shared media treatment, and mismatched config can be hard to diagnose once it reaches a provider.
  • Suggestion: Mirror the audio/video behavior: infer the image format from the data URI media type and raise if it conflicts with self.image_format.
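A rough sketch of the check suggested above, assuming a simple regex-based data URI parser. The pattern, the MIME map, and the function name are illustrative assumptions and are not taken from the project's media_helpers module.

```python
from __future__ import annotations

import re

# Hypothetical parser for "data:<mime>;base64,<payload>" values.
DATA_URI_PATTERN = re.compile(
    r"^data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$", re.DOTALL
)

# Assumed subset of image MIME types, for illustration only.
IMAGE_MIME_TO_FORMAT = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/gif": "gif",
    "image/webp": "webp",
}


def resolve_image_format(value: str, configured_format: str | None) -> tuple[str | None, str]:
    """Return (format, base64 payload), raising if a data URI contradicts the config."""
    match = DATA_URI_PATTERN.match(value)
    if match is None:
        # Plain base64: nothing to cross-check, so the configured format is used as-is.
        return configured_format, value

    inferred = IMAGE_MIME_TO_FORMAT.get(match["mime"].lower())
    if inferred is None:
        raise ValueError(f"unsupported image media type: {match['mime']!r}")
    if configured_format is not None and configured_format != inferred:
        raise ValueError(
            f"image_format={configured_format!r} conflicts with data URI media type {match['mime']!r}"
        )
    return inferred, match["data"]
```

Failing at config-resolution time keeps the mismatch visible locally instead of surfacing later as an opaque provider error.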

What Looks Good

  • The canonical {"type": modality, "source": ...} block shape keeps config provider-neutral while preserving the existing image behavior after adapter translation.
  • The Anthropic path is careful about continuing to accept legacy image_url blocks while raising a canonical unsupported-capability error for audio/video.
  • The focused tests cover scalar/list/JSON/array normalization, local-path rejection for audio/video, mixed-context ordering, and image path resolution, which are exactly the edge cases most likely to regress.

Verdict

Needs changes — I’d address the OpenAI-compatible capability gate before merge, and decide intentionally how strict the legacy modality migration should be for new audio/video dict configs.


This review was generated by an AI assistant.

Comment thread packages/data-designer-config/src/data_designer/config/column_configs.py Outdated
Comment thread packages/data-designer-config/src/data_designer/config/models.py
github-actions Bot commented May 22, 2026

MkDocs preview: https://22e174d7.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-701.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@johnnygreco

Thanks, Nabin. I’m good with the resolution on the original review threads:

  • The image data URI / image_format mismatch is addressed cleanly.
  • The legacy modality-less dict path is now explicit via DeprecationWarning, and the remaining URL-only ambiguity is unavoidable without breaking old image configs.
  • I buy the rationale for not adding static OpenAI-compatible preflight gates. Since custom OpenAI-compatible providers can point at very different backends, preserving provider 400 messages is probably a better source of truth than guessing locally.

One thing I’d still like to resolve before merge: the latest branch now supports local audio/video paths by passing them through unchanged. That seems outside issue #671’s stated scope, which explicitly excluded local path handling for new audio/video contexts. It also creates a different behavior from image context, where local paths are resolved against base_path and converted to base64 before transport.

Could we either:

  1. revert local audio/video path support and keep v1 to URL/base64 only, or
  2. explicitly update the PR scope/docs/tests to say local audio/video paths are supported only for colocated provider routes, and make clear that these are passed through rather than resolved like images?

I’m fine with the no-preflight OpenAI-compatible decision after that clarification, but I’d treat the local-path scope change as the remaining merge blocker.

nabinchha commented May 22, 2026

@johnnygreco update/correction after checking the exact ImageContext behavior and tightening the implementation: I updated #671 again so local audio/video path pass-through is scoped to explicit URL mode only (data_type=url).

The branch now mirrors image auto-detect semantics more closely: in auto mode, local-looking audio/video paths are not implicitly passed through. They are rejected with guidance to use data_type=url, similar to how unresolved local-looking image paths in auto mode fall through and fail as invalid base64. Explicit URL mode still passes local/file-style values through for colocated endpoints that can read the same filesystem path.

So the boundary is now:

  • auto mode: HTTP(S) URLs and base64/data URI values only; local-looking audio/video paths are rejected
  • explicit URL mode: URL-style provider inputs, including local paths or file:// URIs, pass through unchanged
  • no base_path lookup, file read, base64 conversion, or upload/file-ID lifecycle for audio/video
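The boundary described above could be sketched as follows for audio. This is a simplified illustration under stated assumptions: the function names, the extension list, and the string values used for `data_type` are hypothetical, and the real implementation may detect paths differently.

```python
from __future__ import annotations

# Assumed extension list, for illustration only.
AUDIO_EXTENSIONS = (".mp3", ".wav")


def is_http_url(value: str) -> bool:
    """Only HTTP(S) URLs are auto-detected as URL media."""
    return value.startswith(("http://", "https://"))


def looks_like_local_audio_path(value: str) -> bool:
    """Extension-only heuristic; no filesystem access is performed."""
    return value.lower().endswith(AUDIO_EXTENSIONS)


def resolve_audio_value(value: str, data_type: str | None) -> tuple[str, str]:
    """Return (kind, payload) where kind is "url" or "base64"."""
    if data_type == "url":
        # Explicit URL mode: local paths and file:// URIs pass through unchanged.
        return ("url", value)
    if data_type is None:
        # Auto mode: HTTP(S) URLs first, so URLs ending in .mp3 are still URLs.
        if is_http_url(value):
            return ("url", value)
        if looks_like_local_audio_path(value):
            raise ValueError(
                "Local audio paths are not auto-detected; use data_type=url to pass "
                "the value through, or provide an audio URL or base64 audio data."
            )
    # Remaining values (including data: URIs) are treated as base64 content.
    return ("base64", value)
```

Nothing in this sketch reads files or consults a base path, which matches the third bullet: the value is either forwarded as given or rejected.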

@johnnygreco johnnygreco left a comment

@nabinchha thanks for tightening this up. I rechecked f482dfb7 and the implementation now matches the boundary you described and the updated #671 scope:

  • auto mode only auto-detects HTTP(S) media URLs and base64/data URI values
  • local-looking audio/video paths are rejected in auto mode with guidance to use data_type=url
  • explicit data_type=url still passes local paths / file:// values through unchanged for colocated endpoints
  • audio/video still do not do base_path lookup, file reads, base64 conversion, or upload/file-ID lifecycle

I also reran the focused config model suite locally: packages/data-designer-config/tests/config/test_models.py passed (56 passed). My original review concerns are resolved. Approving from code review; merge should still wait for the remaining CI jobs to finish green.

@nabinchha nabinchha merged commit 52d42fe into main May 22, 2026
52 checks passed
@nabinchha nabinchha deleted the nmulepati/feat-671-audio-video-context branch May 22, 2026 17:54
Development

Successfully merging this pull request may close these issues.

Implement audio and video context support

2 participants