feat(ai): Image and Audio generation APIs (DALL-E 3 + OpenAI TTS) #109

Open
bedus-creation wants to merge 2 commits into main from feat/ai-image-audio

@bedus-creation

Contributor

Summary

  • Image.of(prompt).generate(): text-to-image via OpenAI DALL-E 3; returns ImageResponse with raw PNG bytes
  • Image editing: .attachments([Files.Image.fromPath(...)]) switches to DALL-E 2 image editing
  • Image modifiers: .landscape() (1792×1024), .portrait() (1024×1792), .square(), .quality(), .model()
  • Audio.of(text).generate(): text-to-speech via OpenAI TTS; returns AudioResponse
  • Audio modifiers: .female() (nova), .male() (onyx), .voice("shimmer"), .speed(), .format(), .model()
  • Files.Image: factory with .fromStorage(), .fromPath(), .fromUrl() for image attachment sources
  • Storage integration: ImageResponse / AudioResponse expose async .store(), .storeAs(), .storePublicly(), .storePubliclyAs() backed by the Storage facade (with fallback to temp dir)
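
The storage fallback mentioned in the last bullet could look roughly like the sketch below. This is not the PR's implementation: `ImageResponseSketch`, `InMemoryStorage`, and its `put()` method are hypothetical names used only for illustration, and the real Storage facade may expose a different interface.

```python
import asyncio
import tempfile
import uuid
from pathlib import Path


class InMemoryStorage:
    """Hypothetical stand-in for a Storage facade with an async put()."""

    def __init__(self):
        self.files = {}

    async def put(self, name: str, data: bytes) -> str:
        self.files[name] = data
        return f"storage/{name}"


class ImageResponseSketch:
    """Illustrative response object; not the PR's ImageResponse."""

    def __init__(self, data: bytes, storage=None):
        self.data = data
        self._storage = storage  # assumption: storage is optional

    async def storeAs(self, name: str) -> str:
        if self._storage is not None:
            # Preferred path: delegate to the configured storage backend.
            return await self._storage.put(name, self.data)
        # Fallback: write into the system temp directory off the event loop.
        target = Path(tempfile.gettempdir()) / name
        await asyncio.to_thread(target.write_bytes, self.data)
        return str(target)

    async def store(self) -> str:
        # Generate a random file name when the caller does not supply one.
        return await self.storeAs(f"{uuid.uuid4().hex}.png")
```

Keeping the fallback inside `storeAs()` means `store()` only has to pick a name, so both methods share one write path.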

API examples

from fastapi_startkit.ai import Image, Audio, Files

# Text to image
image = await Image.of("A donut on a counter").generate()

# Image edit with attachments + size modifier
image = await (
    Image.of("Make this impressionist")
    .attachments([
        Files.Image.fromStorage("photo.jpg"),
        Files.Image.fromPath("/tmp/photo.jpg"),
        Files.Image.fromUrl("https://example.com/photo.jpg"),
    ])
    .landscape()
    .generate()
)

# Store image
path = await image.store()
path = await image.storeAs("result.png")
path = await image.storePublicly()
path = await image.storePubliclyAs("result.png")

# Text to speech
audio = await Audio.of("Hello world").generate()
audio = await Audio.of("Hello world").female().generate()
audio = await Audio.of("Hello world").male().generate()
audio = await Audio.of("Hello world").voice("nova").generate()

# Store audio
path = await audio.store()
path = await audio.storeAs("greeting.mp3")

Files changed

| File | Change |
| --- | --- |
| ai/files.py | New: ImageAttachment, Files.Image factory |
| ai/image.py | New: Image builder, ImageResponse |
| ai/audio.py | New: Audio builder, AudioResponse |
| ai/__init__.py | Updated: exports Image, Audio, Files, ImageAttachment, ImageResponse, AudioResponse |
| tests/ai/test_image.py | New: 21 unit tests |
| tests/ai/test_audio.py | New: 24 unit tests |

Test plan

  • All 45 new unit tests pass (fully mocked, no real API calls)
  • Full AI test suite: 154 tests pass, 0 failures

🤖 Generated with Claude Code

Implements a Laravel-style fluent API for image generation (DALL-E 3),
image editing (DALL-E 2 with attachments), and text-to-speech (OpenAI TTS).

- `Image.of(prompt)` — text-to-image via DALL-E 3; `.landscape()`, `.portrait()`,
  `.square()`, `.quality()`, `.model()` modifiers; `.attachments([…])` switches
  to DALL-E 2 image editing
- `Audio.of(text)` — TTS via OpenAI; `.female()` / `.male()` voice shortcuts,
  `.voice()`, `.speed()`, `.format()`, `.model()` modifiers
- `Files.Image.fromStorage/fromPath/fromUrl` — image attachment factories
- `ImageResponse` / `AudioResponse` — async `.store()`, `.storeAs()`,
  `.storePublicly()`, `.storePubliclyAs()` backed by Storage facade
- 45 new unit tests (all mocked, no real API calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +164 to +174
    def _generate_sync(self) -> AudioResponse:
        client = OpenAI(api_key=self._resolve_api_key())
        response = client.audio.speech.create(
            model=self._model,
            voice=self._voice,
            input=self._text,
            speed=self._speed,
            response_format=self._response_format,
        )
        data = response.read()
        return AudioResponse(data=data, fmt=self._response_format)

Contributor Author

Can we use LangChain or another library so we can support multiple providers?

Comment on lines +9 to +23
class ImageAttachment:
    """Represents an image file to attach to an Image editing request.

    Instances are created via the :class:`Files.Image` factory, not directly::

        attachment = Files.Image.fromPath("/tmp/photo.jpg")
        attachment = Files.Image.fromStorage("photo.jpg")
        attachment = Files.Image.fromUrl("https://example.com/photo.jpg")
    """

    def __init__(
        self,
        data: bytes,
        name: str = "",
        media_type: str = "image/jpeg",

Contributor Author

We already have a Document class in AI; can't we use that?

Comment on lines +190 to +200
    def _create(self) -> ImageResponse:
        """Generate a new image from a text prompt."""
        client = OpenAI(api_key=self._resolve_api_key())
        params: dict = {
            "model": self._model,
            "prompt": self._prompt,
            "size": self._size,
            "n": self._n,
            "response_format": "b64_json",
        }
        if self._model == "dall-e-3":

Contributor Author

Can we use another package so we can support multiple providers?

Comment thread fastapi_startkit/tests/ai/test_audio.py Outdated
assert isinstance(audio, Audio)
assert audio._text == "Hello world"


Contributor Author

Can we write class-based tests?

… class-based tests

Comment 1 — Document extended for binary image attachments:
- content field now accepts str | bytes
- from_path() auto-detects binary (UnicodeDecodeError fallback to rb mode)
- New async from_url() downloads bytes via httpx
- New async from_storage() reads binary via Storage facade (or direct path)
- New to_bytes() returns binary content regardless of how it was loaded
- files.py (ImageAttachment/Files) no longer exported; Document is the single type
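
A minimal sketch of the text-then-binary fallback described for from_path() and to_bytes(), assuming nothing about the real Document class beyond the bullets above. `DocumentSketch` and its fields are hypothetical, and the async from_url() / from_storage() loaders are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for the project's Document class."""

    content: str | bytes
    name: str = ""

    @classmethod
    def from_path(cls, path: str) -> DocumentSketch:
        p = Path(path)
        try:
            # Try text first; binary files such as JPEGs usually fail to decode.
            content: str | bytes = p.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Fall back to raw bytes for binary content.
            content = p.read_bytes()
        return cls(content=content, name=p.name)

    def to_bytes(self) -> bytes:
        # Return bytes regardless of how the content was loaded.
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode("utf-8")
```

One caveat of this approach: a binary file that happens to be valid UTF-8 would be loaded as text, so callers that need bytes should go through `to_bytes()` rather than inspecting `content` directly.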

Comment 2 — Multi-provider support for Image:
- New ai/image_providers.py: ImageGenerationProvider ABC,
  OpenAIImageProvider (AsyncOpenAI), StabilityImageProvider (stub)
- Image.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.image_provider (AI_IMAGE_PROVIDER env var)
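
The provider abstraction might be shaped like the following sketch. The class names ImageGenerationProvider and StabilityImageProvider and the AI_IMAGE_PROVIDER variable come from the notes above; the `generate()` signature, the `PROVIDERS` registry, `resolve_image_provider()`, the "openai" default, and `FakeOpenAIImageProvider` (which makes no network call) are assumptions for illustration.

```python
import asyncio
import os
from abc import ABC, abstractmethod


class ImageGenerationProvider(ABC):
    """Common interface every image backend implements."""

    @abstractmethod
    async def generate(self, prompt: str, size: str) -> bytes:
        """Return raw image bytes for the prompt."""


class FakeOpenAIImageProvider(ImageGenerationProvider):
    """Hypothetical offline double; a real provider would call the API."""

    async def generate(self, prompt: str, size: str) -> bytes:
        await asyncio.sleep(0)  # yield control like a real network call would
        return b"\x89PNG" + f"{size}:{prompt}".encode("utf-8")


class StabilityImageProvider(ImageGenerationProvider):
    """Stub, mirroring the stub status described in the notes."""

    async def generate(self, prompt: str, size: str) -> bytes:
        raise NotImplementedError("Stability provider is a stub")


# Assumed registry keyed by the provider name found in configuration.
PROVIDERS = {
    "openai": FakeOpenAIImageProvider,
    "stability": StabilityImageProvider,
}


def resolve_image_provider(name=None) -> ImageGenerationProvider:
    # An explicit name wins; otherwise read the environment (default assumed).
    key = (name or os.environ.get("AI_IMAGE_PROVIDER", "openai")).lower()
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown image provider: {key}") from None
```

With a registry like this, adding a backend is one new subclass plus one dictionary entry, and the Image builder never needs to know which provider it is talking to.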

Comment 3 — Multi-provider support for Audio:
- New ai/audio_providers.py: AudioSynthesisProvider ABC,
  OpenAIAudioProvider (AsyncOpenAI), ElevenLabsAudioProvider (stub)
- Audio.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.audio_provider (AI_AUDIO_PROVIDER env var)
- AIConfig gains image_provider and audio_provider fields
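
To illustrate how a fluent Audio builder could hand its settings to a provider, here is a small sketch. The voice mapping (female is nova, male is onyx) comes from the PR summary; `AudioSketch`, `EchoAudioProvider`, the `synthesize()` signature, and the defaults ("alloy", speed 1.0, "mp3") are assumptions rather than the PR's actual code.

```python
import asyncio
from abc import ABC, abstractmethod


class AudioSynthesisProvider(ABC):
    """Common interface every text-to-speech backend implements."""

    @abstractmethod
    async def synthesize(self, text: str, voice: str, speed: float, fmt: str) -> bytes:
        """Return encoded audio bytes."""


class EchoAudioProvider(AudioSynthesisProvider):
    """Hypothetical offline double that echoes its arguments as bytes."""

    async def synthesize(self, text: str, voice: str, speed: float, fmt: str) -> bytes:
        return f"{voice}:{speed}:{fmt}:{text}".encode("utf-8")


class AudioSketch:
    """Illustrative fluent builder; each modifier returns self for chaining."""

    def __init__(self, text: str, provider: AudioSynthesisProvider):
        self._text = text
        self._provider = provider
        self._voice = "alloy"  # assumed default
        self._speed = 1.0      # assumed default
        self._format = "mp3"   # assumed default

    @classmethod
    def of(cls, text: str, provider: AudioSynthesisProvider) -> "AudioSketch":
        return cls(text, provider)

    def female(self) -> "AudioSketch":
        return self.voice("nova")

    def male(self) -> "AudioSketch":
        return self.voice("onyx")

    def voice(self, name: str) -> "AudioSketch":
        self._voice = name
        return self

    def speed(self, value: float) -> "AudioSketch":
        self._speed = value
        return self

    async def generate(self) -> bytes:
        # The builder only collects options; the provider does the work.
        return await self._provider.synthesize(
            self._text, self._voice, self._speed, self._format
        )
```

Injecting the provider through `of()` is a choice made for this sketch so it can run offline; the PR resolves the provider from AIConfig instead.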

Comment 4 — Class-based tests:
- test_image.py: TestDocumentImageAttachment, TestImageBuilder,
  TestImageGeneration, TestImageResult
- test_audio.py: TestAudioBuilder, TestAudioGeneration, TestAudioResult
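
For readers unfamiliar with the class-based layout, a test class in this style might look like the sketch below. `AudioStub` is a hypothetical minimal builder so the example runs on its own; the real tests target the project's Audio class, and the "alloy" default is an assumption.

```python
class AudioStub:
    """Hypothetical minimal builder used only to keep the sketch runnable."""

    def __init__(self, text: str):
        self._text = text
        self._voice = "alloy"  # assumed default

    @classmethod
    def of(cls, text: str) -> "AudioStub":
        return cls(text)

    def female(self) -> "AudioStub":
        self._voice = "nova"
        return self


class TestAudioBuilder:
    """Groups builder tests so pytest collects them under one class."""

    def test_of_returns_builder(self):
        audio = AudioStub.of("Hello world")
        assert isinstance(audio, AudioStub)
        assert audio._text == "Hello world"

    def test_female_selects_nova(self):
        audio = AudioStub.of("Hello world").female()
        assert audio._voice == "nova"
```

Grouping by class keeps related cases together and lets shared fixtures or setup methods apply to one group without affecting the others.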

All 156 AI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>