
[Example] 490 — Haystack Audio Transcription Pipeline (Python)#190

Open
github-actions[bot] wants to merge 1 commit into main from example/490-haystack-deepgram-stt-pipeline-python

github-actions bot commented Apr 5, 2026

New example: Haystack Audio Transcription Pipeline (Python)

Integration: Haystack | Language: Python | Products: STT

What this shows

A custom Haystack 2.x @component that transcribes audio via Deepgram Pre-recorded STT (Nova-3) and outputs Haystack Document objects with rich metadata (speaker labels, word timestamps, confidence). Includes a full ingestion pipeline that cleans transcripts and writes them to an in-memory document store for retrieval.
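As a rough sketch of the metadata step, assume the component has Deepgram's pre-recorded response as a plain dict (the `results.channels[0].alternatives[0]` layout, where each word carries `start`, `end`, `confidence`, and a `speaker` field when diarization is on). The helper name `response_to_doc_fields` and the metadata keys below are hypothetical; the example's own field names may differ.

```python
from typing import Any, Dict, List, Tuple


def response_to_doc_fields(
    response: Dict[str, Any], source_url: str
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: split a Deepgram pre-recorded JSON response into
    document content plus metadata (speakers, word timings, confidence)."""
    alternative = response["results"]["channels"][0]["alternatives"][0]
    words: List[Dict[str, Any]] = alternative.get("words", [])

    # Word-level timestamps; "speaker" only appears when diarization is enabled.
    word_timings = [
        {
            "word": w.get("punctuated_word", w["word"]),
            "start": w["start"],
            "end": w["end"],
            "speaker": w.get("speaker"),
        }
        for w in words
    ]
    # Distinct speaker labels, e.g. [0] for a single-speaker recording.
    speakers = sorted({w["speaker"] for w in words if "speaker" in w})

    meta = {
        "source": source_url,
        "duration": response.get("metadata", {}).get("duration"),
        "confidence": alternative.get("confidence"),
        "word_count": len(words),
        "speakers": speakers,
        "words": word_timings,
    }
    return alternative["transcript"], meta
```

If the SDK hands back a typed response object instead of a dict, it would need converting first; that step is not shown here.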

Required secrets

No partner credentials; only DEEPGRAM_API_KEY is required.

Tests

✅ Tests passed

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

Built by Engineer on 2026-04-05

github-actions bot added the type:example, language:python, and integration:haystack labels on Apr 5, 2026

github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.2, pluggy-1.6.0
collected 3 items

tests/test_example.py::test_transcriber_component PASSED                 [ 33%]
tests/test_example.py::test_batch_transcription PASSED                   [ 66%]
tests/test_example.py::test_ingest_pipeline PASSED                       [100%]

======================== 3 passed, 2 warnings in 6.69s =========================

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

Integration genuineness

Pass. The Haystack 2.x SDK is imported and used throughout: Pipeline, @component, Document, DocumentCleaner, DocumentWriter, InMemoryDocumentStore. The DeepgramTranscriber is a proper Haystack @component with a run() method and @component.output_types. The pipeline connects components via pipeline.connect() and runs via pipeline.run(). Haystack does not provide a built-in Deepgram component, so wrapping DeepgramClient inside a custom @component is the correct integration pattern. There are no raw WebSocket or HTTP calls; the Deepgram SDK is used (client.listen.v1.media.transcribe_url).

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — matches required version
  • ✅ tag="deepgram-examples" present on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling for missing API key
  • ✅ Tests import from src/ and exercise the component and pipeline directly
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), not specific word lists
  • ✅ Credential check runs FIRST (top of test file, before src/ imports)
  • build_ingest_pipeline() tested end-to-end: transcribe → clean → write to document store
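The proportionality check can be sketched as below, assuming the test has the transcript text and the audio duration in seconds. The threshold of 2 characters per second is the one quoted in the review; the function name is hypothetical. Plugging in the reported run (334 characters over 25.4 s) gives about 13 characters per second, well above the floor.

```python
def assert_plausible_transcript(
    transcript: str, duration_s: float, min_chars_per_sec: float = 2.0
) -> float:
    """Hypothetical test helper: check that transcript length scales with audio
    duration instead of matching exact words, so the test tolerates small
    changes in model output."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    chars_per_sec = len(transcript) / duration_s
    assert chars_per_sec > min_chars_per_sec, (
        f"transcript too short for the audio: {chars_per_sec:.2f} chars/sec"
    )
    return chars_per_sec


# Figures from the reported run: a 334-character transcript of 25.4 s of audio.
rate = assert_plausible_transcript("x" * 334, 25.4)
print(f"{rate:.2f} chars/sec")  # prints "13.15 chars/sec"
```

A word-list assertion would break whenever the model rewords or re-punctuates a sentence; a rate floor only fails when the transcript is implausibly short or empty.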

Documentation

  • ✅ README covers what you'll build, env vars with console links, install/run instructions, key parameters, and how it works
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06

github-actions bot added the status:review-passed (self-review passed) label on Apr 6, 2026

github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_transcriber_component PASSED
  ✓ DeepgramTranscriber component working
    Transcript length: 334 chars
    Duration: 25.4s
    Words: 62
    Speakers: 1

tests/test_example.py::test_batch_transcription PASSED
  ✓ Batch transcription working (2 documents)

tests/test_example.py::test_ingest_pipeline PASSED
  ✓ Ingest pipeline working (transcribe → clean → write)
    Documents in store: 1
    Transcript length: 334 chars

3 passed in 6.62s

Integration genuineness

Pass — Haystack SDK (haystack-ai) is imported and used throughout. The DeepgramTranscriber is a proper Haystack 2.x @component registered in a real Pipeline with DocumentCleaner and DocumentWriter. Haystack does not provide a native Deepgram wrapper, so wrapping DeepgramClient inside a custom component is the correct idiomatic pattern. No raw WebSocket or HTTP calls. No bypass detected.

Code quality

  • Official Deepgram SDK (deepgram-sdk==6.1.1) — correct version
  • tag="deepgram-examples" present on the API call
  • No hardcoded credentials
  • Error handling for missing DEEPGRAM_API_KEY
  • Tests import from src/ and call the example's actual code (DeepgramTranscriber, build_ingest_pipeline)
  • Transcript assertions use length/duration proportionality (chars_per_sec) — no word lists
  • Credential check runs before SDK imports in tests (exit 2)
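A minimal sketch of that module-level guard follows, assuming the test file runs the check at import time and that exit code 2 is the harness's signal for missing credentials, as the reviews describe. `REQUIRED_ENV` and both function names are illustrative, and the src/ import is shown only as a comment with a hypothetical module path.

```python
import os
import sys
from typing import List, Sequence

REQUIRED_ENV: List[str] = ["DEEPGRAM_API_KEY"]  # illustrative; mirrors .env.example


def missing_env(names: Sequence[str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


def check_credentials_or_exit(names: Sequence[str] = REQUIRED_ENV) -> None:
    """Exit with status 2 before anything imports the SDK, so a missing key is
    reported as a setup problem rather than as a failing test."""
    missing = missing_env(names)
    if missing:
        print(f"MISSING_CREDENTIALS: {', '.join(missing)}", file=sys.stderr)
        sys.exit(2)


# At the very top of tests/test_example.py (sketch):
# check_credentials_or_exit()
# from src.transcriber import DeepgramTranscriber  # hypothetical module path
```

Running the guard before the src/ imports means a CI run without secrets stops cleanly with a distinct exit code instead of failing deep inside a Deepgram call.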

Documentation

  • README includes "What you'll build", env vars table with console links, install/run instructions
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06


github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

3 passed, 2 warnings in 4.33s

Integration genuineness

✅ Pass — Haystack 2.x @component, Pipeline, DocumentCleaner, InMemoryDocumentStore, and DocumentWriter are all imported and used in a real pipeline (transcribe → clean → write). Haystack has no native Deepgram component, so wrapping DeepgramClient inside a custom @component is the correct integration pattern. No raw WebSocket or HTTP calls. tag="deepgram-examples" present.

Code quality

  • ✅ Official deepgram-sdk==6.1.1 (matches required version)
  • ✅ tag="deepgram-examples" on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling: RuntimeError if DEEPGRAM_API_KEY not set
  • ✅ Tests import from src/ and call the example's actual DeepgramTranscriber and build_ingest_pipeline
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), no specific word lists
  • ✅ Credential check runs before SDK imports in test file
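On the component side, the RuntimeError check could look like the following sketch, assuming the key is resolved once at construction time. `resolve_api_key` is a hypothetical helper, and accepting an explicit key before falling back to the environment is a common convenience, not necessarily what the example does.

```python
import os
from typing import Optional


def resolve_api_key(
    explicit_key: Optional[str] = None, env_var: str = "DEEPGRAM_API_KEY"
) -> str:
    """Hypothetical helper: prefer an explicitly passed key, else the environment.
    Raises RuntimeError with an actionable message when neither is set."""
    key = explicit_key or os.environ.get(env_var, "")
    if not key.strip():
        raise RuntimeError(
            f"{env_var} is not set. Set it in your environment (see .env.example)."
        )
    return key
```

Inside a component, this would typically run in `__init__` before the Deepgram client is constructed, so a misconfigured pipeline fails when it is built rather than on the first `run()` call.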

Documentation

  • ✅ README has "What you'll build", env vars with console links, install and run instructions
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06


github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

pytest: 3 passed, 2 warnings in 6.06s

Integration genuineness

✅ Pass — Haystack 2.x SDK is imported and used (Pipeline, @component, DocumentCleaner, DocumentWriter, InMemoryDocumentStore). The DeepgramTranscriber is a proper Haystack @component that produces Document objects with rich metadata. Deepgram SDK is used inside the component as expected (Haystack has no built-in Deepgram wrapper). No raw WebSocket/HTTP calls to Deepgram. No bypass.

Code quality

  • ✅ Official Deepgram SDK: deepgram-sdk==6.1.1 (matches required version)
  • ✅ tag="deepgram-examples" present on API call
  • ✅ No hardcoded credentials — reads from DEEPGRAM_API_KEY env var
  • ✅ Error handling: raises RuntimeError if API key missing
  • ✅ Tests import from src/ and test actual component + pipeline code
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), no word-list checks
  • ✅ Credential check runs first (before SDK imports) with sys.exit(2) on missing creds
  • ✅ haystack-ai==2.27.0 pinned

Documentation

  • ✅ README: clear "what you'll build", env vars table with console link, install/run instructions
  • ✅ .env.example present and complete
  • ✅ Key parameters documented

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06


github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.2, pluggy-1.6.0

tests/test_example.py::test_transcriber_component PASSED                 [ 33%]
tests/test_example.py::test_batch_transcription PASSED                   [ 66%]
tests/test_example.py::test_ingest_pipeline PASSED                       [100%]

======================== 3 passed, 2 warnings in 6.90s =========================

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

Integration genuineness

Pass — Haystack 2.x @component decorator, Pipeline, DocumentCleaner, DocumentWriter, and InMemoryDocumentStore are all genuinely used. The DeepgramTranscriber is a proper Haystack component with the @component decorator and run() method returning the Haystack component contract ({"documents": List[Document]}). The full ingest pipeline (transcribe → clean → write) is a real Haystack pipeline pattern. Haystack is a framework (no hosted API), so DeepgramClient usage inside a custom component is the correct integration approach — there is no Haystack-native Deepgram wrapper to bypass.

No raw WebSocket, fetch, or HTTP calls — all Deepgram contact goes through the official SDK.
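To make the `{"documents": List[Document]}` contract concrete without network access, here is a dependency-free sketch of a batch `run()` that returns one document per audio URL. It assumes an injected `transcribe` callable (hypothetical) in place of the Deepgram SDK call and a small `Doc` dataclass standing in for `haystack.Document`. In a real Haystack component the class would carry `@component` and `run()` would carry `@component.output_types(documents=List[Document])`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical signature: takes an audio URL, returns (transcript, metadata).
TranscribeFn = Callable[[str], Tuple[str, Dict[str, Any]]]


@dataclass
class Doc:
    """Stand-in for haystack.Document (content + meta) so the sketch runs offline."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class TranscriberSketch:
    """Mirrors the shape of a Haystack component: configuration in __init__,
    work in run(), and a dict keyed by output name as the return value."""

    def __init__(self, transcribe: TranscribeFn) -> None:
        self._transcribe = transcribe

    def run(self, urls: List[str]) -> Dict[str, List[Doc]]:
        documents = []
        for url in urls:
            transcript, meta = self._transcribe(url)
            # Keep the source URL so retrieved chunks can be traced back to audio.
            documents.append(Doc(content=transcript, meta={"source": url, **meta}))
        return {"documents": documents}
```

Because the output key is `documents`, a downstream cleaner or writer that also takes a `documents` input can be connected to it by name. Injecting the transcription callable is a choice made here for offline testing; per the reviews, the example itself calls DeepgramClient directly inside the component.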

Code quality

  • ✅ Official deepgram-sdk==6.1.1 (matches required version)
  • ✅ haystack-ai==2.27.0 pinned
  • ✅ tag="deepgram-examples" present on transcription call
  • ✅ No hardcoded credentials
  • ✅ Error handling: RuntimeError raised if DEEPGRAM_API_KEY not set
  • ✅ Credential check in tests runs before any SDK imports (exits with code 2)
  • ✅ Tests import from src/ and call the example's actual code (DeepgramTranscriber, build_ingest_pipeline)
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), not specific word lists
  • ✅ Pipeline test verifies the full transcribe → clean → write flow end-to-end

Documentation

  • ✅ README: "What you'll build" section, env vars table with console link, install/run instructions, key parameters, how-it-works explanation
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06


github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_transcriber_component PASSED
  ✓ DeepgramTranscriber component working
    Transcript length: 334 chars
    Duration: 25.4s
    Words: 62
    Speakers: 1

tests/test_example.py::test_batch_transcription PASSED
  ✓ Batch transcription working (2 documents)

tests/test_example.py::test_ingest_pipeline PASSED
  ✓ Ingest pipeline working (transcribe → clean → write)
    Documents in store: 1
    Transcript length: 334 chars

3 passed in 9.20s

Integration genuineness

Pass. All 6 checks pass:

  1. ✅ Haystack SDK imported and used (haystack-ai, @component, Pipeline, Document)
  2. ✅ Real Haystack API calls — Pipeline.add_component(), Pipeline.connect(), Pipeline.run()
  3. ✅ .env.example lists DEEPGRAM_API_KEY (Haystack is open source, so no Haystack API key is needed)
  4. ✅ Tests exit 2 on missing credentials before any SDK imports
  5. ✅ No bypass — Haystack has no built-in Deepgram interface; custom @component is the correct integration pattern
  6. ✅ No raw WebSocket/fetch — uses official DeepgramClient SDK

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1 — current required version)
  • ✅ tag="deepgram-examples" included on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling for missing API key
  • ✅ Tests import from src/ and exercise the actual component and pipeline
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2) — no word lists
  • ✅ Credential check runs before SDK imports in tests

Documentation

  • ✅ README includes "What you'll build", env vars with console links, install/run instructions
  • ✅ .env.example present and complete
  • ✅ Key parameters table is a nice touch

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06


github-actions bot commented Apr 7, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

3 passed, 2 warnings in 6.75s (pytest)

Integration genuineness

✅ Pass — Haystack SDK is imported and used correctly. The DeepgramTranscriber is a proper Haystack 2.x @component with @component.output_types and a run() method returning the Haystack component contract. The full pipeline uses Pipeline.add_component(), Pipeline.connect(), DocumentCleaner, and DocumentWriter with InMemoryDocumentStore — real Haystack pipeline mechanics. Haystack does not provide its own Deepgram wrapper, so using DeepgramClient directly inside the custom component is correct.

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — correct pinned version
  • ✅ tag="deepgram-examples" present on the Deepgram API call
  • ✅ No hardcoded credentials; API key read from environment
  • ✅ Error handling: RuntimeError raised if DEEPGRAM_API_KEY not set
  • ✅ Tests import from src/ and test the actual component (DeepgramTranscriber) and pipeline (build_ingest_pipeline)
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), not specific word lists
  • ✅ Credential check runs first in tests (module-level check exits with code 2 before src/ imports)
  • ✅ .env.example present and complete

Documentation

  • ✅ README includes "What you'll build", prerequisites, env var table with link to Deepgram console, install/run instructions, key parameters, and how-it-works explanation

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-07


github-actions bot commented Apr 7, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

pytest: 3 passed, 2 warnings in 6.72s

Integration genuineness

Pass — Haystack @component and Pipeline are used genuinely. The DeepgramTranscriber is a real Haystack 2.x component with proper @component.output_types decorator and run() contract. Haystack does not provide a built-in Deepgram integration, so wrapping DeepgramClient inside a custom component is the correct pattern. The full ingest pipeline (transcribe → clean → write) exercises real Haystack pipeline orchestration. No raw WebSocket/HTTP calls to Deepgram.

Code quality

  • Official Deepgram SDK used (deepgram-sdk==6.1.1) — matches required version
  • tag="deepgram-examples" present on the API call
  • No hardcoded credentials
  • Credential check runs before SDK imports in tests (exit 2 on missing keys)
  • Tests import from src/ and test the actual component and pipeline — not a standalone DeepgramClient
  • Transcript assertions use length/duration proportionality (chars_per_sec > 2) — no brittle word lists
  • Error handling covers missing API key

Documentation

  • README includes "What you'll build", env vars with console links, install/run instructions, key parameters table, and how-it-works section
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-07


github-actions bot commented Apr 7, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_transcriber_component PASSED                 [ 33%]
tests/test_example.py::test_batch_transcription PASSED                   [ 66%]
tests/test_example.py::test_ingest_pipeline PASSED                       [100%]

======================== 3 passed, 2 warnings in 6.86s =========================

Integration genuineness

Pass — Haystack SDK (haystack-ai) is imported and used throughout. The DeepgramTranscriber is a proper Haystack 2.x @component wired into a real Pipeline with DocumentCleaner and DocumentWriter. Haystack is a local pipeline framework with no built-in Deepgram audio interface, so creating a custom component wrapping DeepgramClient is the correct integration pattern. All Deepgram API calls go through the official SDK with tag="deepgram-examples". No raw WebSocket/HTTP calls.

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — matches required version
  • ✅ tag="deepgram-examples" present on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling for missing API key
  • ✅ Tests import from src/ and exercise the example's actual DeepgramTranscriber component and build_ingest_pipeline
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2) — no brittle word lists
  • ✅ Credential check runs first (exit 2) before SDK imports in test file

Documentation

  • ✅ README includes "What you'll build", env vars table with console links, install/run instructions, key parameters, and architecture explanation
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-07


github-actions bot commented Apr 7, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

pytest: 3 passed in 25.62s

Integration genuineness

Pass. Haystack 2.x is a local pipeline framework with no hosted API key — the correct integration pattern is a custom @component, which this example implements properly. The DeepgramTranscriber component:

  • Uses Haystack's @component decorator and @component.output_types
  • Returns Haystack Document objects with rich metadata
  • Plugs into a real Haystack Pipeline with DocumentCleaner and DocumentWriter
  • Deepgram SDK (DeepgramClient) is used inside the component — appropriate since Haystack has no native Deepgram wrapper

Code quality

  • Official Deepgram SDK used (deepgram-sdk==6.1.1 — current required version)
  • tag="deepgram-examples" present on Deepgram API call
  • No hardcoded credentials
  • Error handling: RuntimeError raised if DEEPGRAM_API_KEY missing
  • Tests import from src/ and call the example's actual code
  • Pipeline test (test_ingest_pipeline) exercises the full transcribe → clean → write flow
  • Transcript assertions use length/duration proportionality (chars_per_sec > 2), not specific word lists
  • Credential check in tests runs before SDK imports (exit 2 on missing creds)
  • No raw WebSocket or HTTP calls to Deepgram

Documentation

  • README: "What you'll build" section present
  • All env vars listed with where-to-get links
  • Install and run instructions complete
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-07
