Pull Request

Summary

Introduces:

  1. Multimodal trace support for the OpenAI Chat Completions and Responses APIs (images, audio, and files).
  2. "Attachment" support for every trace step.

Changes

Attachments

  • Introduces the Attachment abstraction, which represents media data uploaded to a blob store. The Attachment object stores metadata and the storageUri that the Openlayer platform can use to fetch the media.
  • Every step can have an attachments field, an array of Attachment objects, which lets users log arbitrary media to a step. For example:
from openlayer.lib import trace
from openlayer.lib.tracing import log_attachment

@trace()
def my_function():
    # Do something

    # Log attachment to the `my_function` step
    log_attachment("/path/to/file")
    return
  • When streaming the trace to the Openlayer platform, we now scan the trace for attachments. If there are any, they are uploaded first, via the typical presigned URL flow, and then the trace is streamed. A sketch of this flow follows.
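For illustration, here is a minimal sketch of what the scan-and-upload step could look like. The Attachment shape and the helper names (collect_attachments, upload_attachment) are assumptions for this example, not the SDK's actual internals:

import mimetypes
from dataclasses import dataclass
from typing import Any, List, Optional

import requests


@dataclass
class Attachment:
    """Hypothetical shape: metadata plus the blob-store location."""
    id: str
    file_path: str
    storage_uri: Optional[str] = None  # Set once the upload completes


def collect_attachments(steps: List[Any]) -> List[Attachment]:
    """Walk every step in the trace and gather its attachments."""
    found: List[Attachment] = []
    for step in steps:
        found.extend(getattr(step, "attachments", None) or [])
    return found


def upload_attachment(attachment: Attachment, presigned_url: str) -> None:
    """Upload the file via a presigned URL, then record where it lives."""
    mime, _ = mimetypes.guess_type(attachment.file_path)
    with open(attachment.file_path, "rb") as f:
        response = requests.put(
            presigned_url,
            data=f,
            headers={"Content-Type": mime or "application/octet-stream"},
        )
    response.raise_for_status()
    # The Openlayer platform can later fetch the media from this URI.
    attachment.storage_uri = presigned_url.split("?")[0]

Only once every attachment has a storage URI is the trace itself streamed.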

OpenAI multimodal

  • In addition to attachments, this PR instruments the trace_openai wrapper to parse images, audio, and files in the inputs and outputs of OpenAI LLM calls.
  • To support media in OpenAI traces, we extend how inputs and outputs are represented. In summary, the schema is:
# -------------- Inputs --------------------
{
    "prompt": [
        # Old format. We still use it when the prompt only has strings.
        {
            "role": "user",
            "content": "Simple text message"  # String for text-only (backwards compatibility)
        },
        # New format. Content as a list of objects with `type` (one of `text`, `image`, `audio`, or `file`).
        {
            "role": "user",
            "content": [  # List for multimodal
                {"type": "text", "text": "What's in this image?"},
                {"type": "image", "attachment": {"id": "...", "storageUri": "...", ...}}
            ]
        }
    ]
}

# -------------- Outputs --------------------
# Old format. We still use it when the output is a simple string.
"Simple text response"  # String for text-only (backwards compatibility)

# or
{"type": "text", "text": "Text response"}

# or
{"type": "audio", "attachment": {"id": "...", "storageUri": "...", ...}}

# or mixed
[
    {"type": "text", "text": "Here's the image you requested:"},
    {"type": "image", "attachment": {...}}
]

Note that when the type is one of image, audio, or file, the object's other field is attachment, which holds a serialized Attachment object.
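To make the mapping concrete, here is a hedged sketch of how an OpenAI chat message's content could be converted into this schema. The store_media helper is a hypothetical stand-in for the SDK's upload logic, and the handling shown covers only the common OpenAI part types (text, image_url, input_audio):

from typing import Any, Dict, List, Union


def store_media(source: str) -> Dict[str, Any]:
    """Hypothetical stand-in: upload the media and return a serialized Attachment."""
    return {"id": "att_...", "storageUri": "s3://bucket/..."}


def parse_message_content(
    content: Union[str, List[Dict[str, Any]]]
) -> Union[str, List[Dict[str, Any]]]:
    """Map OpenAI message content into the trace schema above."""
    # Backwards-compatible path: plain strings stay plain strings.
    if isinstance(content, str):
        return content

    parts: List[Dict[str, Any]] = []
    for part in content:
        part_type = part.get("type")
        if part_type == "text":
            parts.append({"type": "text", "text": part["text"]})
        elif part_type == "image_url":
            parts.append(
                {"type": "image", "attachment": store_media(part["image_url"]["url"])}
            )
        elif part_type == "input_audio":
            parts.append(
                {"type": "audio", "attachment": store_media(part["input_audio"]["data"])}
            )
        else:
            parts.append(part)  # Pass unrecognized part types through unchanged
    return parts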

Context

  • OPEN-8683: Multimodal attachment support for the Python SDK
  • OPEN-8684: Enhance the OpenAI tracer to support multimodal inputs/outputs

Testing

  • Manual testing
