Commit e557fc3: Add Dewey integration

Authored and committed by lambdabaaari-bmgf. Parent: 8791e3e. 2 files changed, 142 additions, 0 deletions.

integrations/dewey-haystack.md

---
layout: integration
name: Dewey
description: Connect Haystack pipelines to Dewey, a managed document intelligence backend that handles PDF conversion, chunking, embedding, and hybrid retrieval behind a single API.
authors:
    - name: Dewey
      socials:
        github: meetdewey
pypi: https://pypi.org/project/dewey-haystack
repo: https://github.com/meetdewey/dewey-haystack
type: Document Store
report_issue: https://github.com/meetdewey/dewey-haystack/issues
logo: /logos/dewey.png
version: Haystack 2.0
toc: true
---

### **Table of Contents**

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)

## Overview

[Dewey](https://meetdewey.com) is a managed document intelligence backend for AI applications. Upload PDFs, Word docs, and other files, and Dewey handles conversion, section extraction, chunking, embedding, and hybrid semantic + BM25 retrieval automatically.

This integration provides three Haystack 2.0 components:

- **`DeweyDocumentStore`**: implements the Haystack `DocumentStore` protocol, backed by a Dewey collection.
- **`DeweyRetriever`**: a `@component` that runs hybrid search against a collection and returns ranked `Document` objects.
- **`DeweyResearchComponent`**: a `@component` that runs Dewey's full agentic research loop (multi-step search, synthesis, citations) and returns a grounded Markdown answer.
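
Hybrid retrieval runs inside Dewey, and this page does not specify how Dewey merges its semantic and BM25 rankings. To illustrate the general idea only, the sketch below combines two ranked lists with reciprocal rank fusion (RRF), a common fusion method. RRF itself, the constant `k=60`, and the document IDs are assumptions for illustration, not Dewey's documented behavior.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Illustrative only: RRF is an assumed stand-in for however Dewey merges
    semantic and BM25 results; it is not taken from Dewey's documentation.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents ranked highly
            # in several lists accumulate the largest total score.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a semantic (embedding) search and a BM25 search.
semantic = ["doc-a", "doc-b", "doc-c"]
bm25 = ["doc-c", "doc-a", "doc-d"]
fused = reciprocal_rank_fusion([semantic, bm25])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Documents that rank well in both lists (`doc-a` and `doc-c` here) end up ahead of documents found by only one method, which is the intuition behind combining keyword and semantic search.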

## Installation

```bash
pip install dewey-haystack
```

You need a Dewey account, which you can create for free at [meetdewey.com](https://meetdewey.com). The examples below read your API key from the `DEWEY_API_KEY` environment variable:

```bash
export DEWEY_API_KEY="dwy_live_..."
```
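
The examples read the key with `Secret.from_env_var("DEWEY_API_KEY")`. If you prefer to fail fast with a readable message before building any pipeline, a small startup check like the one below works. The helper name `require_env` is hypothetical and not part of `dewey-haystack`.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of dewey-haystack.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it before running the pipeline, "
            f'for example: export {name}="..."'
        )
    return value
```

Call `require_env("DEWEY_API_KEY")` once at startup, before constructing `DeweyDocumentStore` or `DeweyResearchComponent`.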

## Usage

### Components

Import the components from the following modules:

- **`DeweyDocumentStore`** (`haystack_integrations.document_stores.dewey`)
- **`DeweyRetriever`** (`haystack_integrations.components.retrievers.dewey`)
- **`DeweyResearchComponent`** (`haystack_integrations.components.retrievers.dewey`)

### RAG pipeline with DeweyRetriever

The pipeline below retrieves the five best-matching documents from a Dewey collection, renders them into a prompt, and sends the prompt to an OpenAI model:
```python
from haystack import Pipeline
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack_integrations.components.retrievers.dewey import DeweyRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

store = DeweyDocumentStore(
    api_key=Secret.from_env_var("DEWEY_API_KEY"),
    collection_id="3f7a1b2c-...",  # your collection ID
)

# Put each retrieved document on its own line so passages don't run together.
prompt_template = """
Answer the question using only the provided context.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ query }}
"""

pipeline = Pipeline()
pipeline.add_component("retriever", DeweyRetriever(document_store=store, top_k=5))
pipeline.add_component("prompt", PromptBuilder(template=prompt_template))
# OpenAIGenerator reads its API key from the OPENAI_API_KEY environment variable by default.
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever.documents", "prompt.documents")
pipeline.connect("prompt.prompt", "llm.prompt")

# The same question goes to the retriever (search) and the prompt (template variable).
question = "What are the key findings?"
result = pipeline.run({
    "retriever": {"query": question},
    "prompt": {"query": question},
})
print(result["llm"]["replies"][0])
```
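
The question is passed to both the retriever and the prompt builder, so the two copies can drift apart as the code evolves. A small helper that builds the `pipeline.run` input from one string avoids that. `rag_inputs` is a hypothetical name, and its keys must match the component names passed to `add_component` (`"retriever"` and `"prompt"` in the example).

```python
def rag_inputs(query: str) -> dict[str, dict[str, str]]:
    """Build pipeline.run() inputs that send one query to every component needing it.

    Hypothetical helper: the keys must match the component names registered
    with pipeline.add_component ("retriever" and "prompt" in the example).
    """
    return {
        "retriever": {"query": query},
        "prompt": {"query": query},
    }


inputs = rag_inputs("What are the key findings?")
print(inputs["retriever"]["query"] == inputs["prompt"]["query"])  # True
```

With the pipeline above, call `pipeline.run(rag_inputs("What are the key findings?"))`.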

### Agentic research with DeweyResearchComponent

Use `DeweyResearchComponent` in place of a retriever and LLM generator pair when you want Dewey to handle both retrieval *and* generation. It runs a multi-step research loop internally and returns a grounded Markdown answer (`answer`) along with the source documents it cites (`sources`).
```python
from haystack import Pipeline
from haystack_integrations.components.retrievers.dewey import DeweyResearchComponent
from haystack.utils import Secret

pipeline = Pipeline()
pipeline.add_component(
    "research",
    DeweyResearchComponent(
        api_key=Secret.from_env_var("DEWEY_API_KEY"),
        collection_id="3f7a1b2c-...",
        depth="balanced",  # "quick" | "balanced" | "deep" | "exhaustive"
    ),
)

result = pipeline.run({"research": {"query": "What were the key findings across all studies?"}})
print(result["research"]["answer"])

# Print each cited source's filename and a short preview of its content.
for source in result["research"]["sources"]:
    print(f"  [{source.meta['filename']}] {source.content[:80]}...")
```
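
The comment in the example lists four `depth` values. When the depth comes from a config file or user input, a client-side check can catch typos before the component is constructed. The helper below is hypothetical: it assumes those four strings are the complete set, and `DeweyResearchComponent` may well validate `depth` on its own.

```python
from typing import Literal, cast, get_args

# Assumed to mirror the depth values listed in the DeweyResearchComponent example.
ResearchDepth = Literal["quick", "balanced", "deep", "exhaustive"]
VALID_DEPTHS: tuple[str, ...] = get_args(ResearchDepth)


def parse_depth(value: str) -> ResearchDepth:
    """Normalize a depth string and reject values outside VALID_DEPTHS."""
    normalized = value.strip().lower()
    if normalized not in VALID_DEPTHS:
        raise ValueError(
            f"Unknown depth {value!r}; expected one of: {', '.join(VALID_DEPTHS)}"
        )
    return cast(ResearchDepth, normalized)


print(parse_depth(" Deep "))  # deep
```

Pass the result as the `depth` argument, e.g. `depth=parse_depth(raw_depth)`, where `raw_depth` is whatever string your configuration provides.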

### Writing documents

Upload content to a Dewey collection by passing Haystack `Document` objects to `DeweyDocumentStore.write_documents`:
```python
from haystack import Document
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack.utils import Secret

store = DeweyDocumentStore(
    api_key=Secret.from_env_var("DEWEY_API_KEY"),
    collection_id="3f7a1b2c-...",
)

store.write_documents([
    Document(content="Neural networks learn via backpropagation.", meta={"source": "ml-intro.txt"}),
    Document(content="Transformers use self-attention mechanisms.", meta={"source": "transformers.txt"}),
])
```
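
For more than a handful of strings, you can build the `Document` list from files on disk. This sketch uses only the Python standard library to gather `(content, meta)` pairs from `.txt` files, reusing the `source` metadata key from the example. The helper name and the restriction to plain-text files are assumptions; the sketch does not handle PDFs or Word files.

```python
from pathlib import Path
from typing import Union


def load_text_files(
    folder: Union[str, Path], pattern: str = "*.txt"
) -> list[tuple[str, dict[str, str]]]:
    """Collect (content, meta) pairs from the text files in a folder.

    Hypothetical helper: wrap each pair as Document(content=content, meta=meta)
    before passing the list to DeweyDocumentStore.write_documents.
    """
    pairs: list[tuple[str, dict[str, str]]] = []
    # Sort so the upload order is deterministic across runs; skip directories.
    for path in sorted(p for p in Path(folder).glob(pattern) if p.is_file()):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files instead of uploading blank documents
            pairs.append((text, {"source": path.name}))
    return pairs
```

Then upload them with `store.write_documents([Document(content=c, meta=m) for c, m in load_text_files("docs/")])`, where `docs/` is a placeholder folder.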

## License

`dewey-haystack` is released under the [MIT License](https://github.com/meetdewey/dewey-haystack/blob/main/LICENSE).

logos/dewey.png (14.9 KB)