| 1 | +--- |
| 2 | +layout: integration |
| 3 | +name: Dewey |
| 4 | +description: Connect Haystack pipelines to Dewey — a managed document intelligence backend that handles PDF conversion, chunking, embedding, and hybrid retrieval behind a single API. |
| 5 | +authors: |
| 6 | +  - name: Dewey |
| 7 | +    socials: |
| 8 | +      github: meetdewey |
| 9 | +pypi: https://pypi.org/project/dewey-haystack |
| 10 | +repo: https://github.com/meetdewey/dewey-haystack |
| 11 | +type: Document Store |
| 12 | +report_issue: https://github.com/meetdewey/dewey-haystack/issues |
| 13 | +logo: /logos/dewey.png |
| 14 | +version: Haystack 2.0 |
| 15 | +toc: true |
| 16 | +--- |
| 17 | + |
| 18 | +### **Table of Contents** |
| 19 | +- [Overview](#overview) |
| 20 | +- [Installation](#installation) |
| 21 | +- [Usage](#usage) |
| 22 | +- [License](#license) |
| 23 | + |
| 24 | +## Overview |
| 25 | + |
| 26 | +[Dewey](https://meetdewey.com) is a managed document intelligence backend for AI applications. Upload PDFs, Word docs, and other files — Dewey handles conversion, section extraction, chunking, embedding, and hybrid semantic + BM25 retrieval automatically. |
| 27 | + |
| 28 | +This integration provides three Haystack 2.0 components: |
| 29 | + |
| 30 | +- **`DeweyDocumentStore`** — implements the Haystack `DocumentStore` protocol, backed by a Dewey collection |
| 31 | +- **`DeweyRetriever`** — a `@component` that runs hybrid search against a collection and returns ranked `Document` objects |
| 32 | +- **`DeweyResearchComponent`** — a `@component` that runs Dewey's full agentic research loop (multi-step search, synthesis, citations) and returns a grounded Markdown answer |
| 33 | + |
| 34 | +## Installation |
| 35 | + |
| 36 | +```bash |
| 37 | +pip install dewey-haystack |
| 38 | +``` |
| 39 | + |
| 40 | +You need a Dewey account, which is free to create at [meetdewey.com](https://meetdewey.com). Set your API key as an environment variable: |
| 41 | + |
| 42 | +```bash |
| 43 | +export DEWEY_API_KEY="dwy_live_..." |
| 44 | +``` |
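Because the examples below read the key from `DEWEY_API_KEY`, it can help to fail fast with a clear message when the variable is missing. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper and not part of `dewey-haystack`.

```python
import os


def require_api_key(var_name: str = "DEWEY_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of dewey-haystack.
    """
    # Treat an unset variable and a blank value the same way.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the pipeline"
        )
    return key
```

Calling `require_api_key()` once at startup surfaces a configuration problem before any pipeline component is built.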
| 45 | + |
| 46 | +## Usage |
| 47 | + |
| 48 | +### Components |
| 49 | + |
| 50 | +The components are importable from the following modules: |
| 51 | + |
| 52 | +- **`DeweyDocumentStore`** (`haystack_integrations.document_stores.dewey`) |
| 53 | +- **`DeweyRetriever`** (`haystack_integrations.components.retrievers.dewey`) |
| 54 | +- **`DeweyResearchComponent`** (`haystack_integrations.components.retrievers.dewey`) |
| 55 | + |
| 56 | +### RAG pipeline with DeweyRetriever |
| 57 | + |
| 58 | +```python |
| 60 | +from haystack import Pipeline |
| 61 | +from haystack_integrations.document_stores.dewey import DeweyDocumentStore |
| 62 | +from haystack_integrations.components.retrievers.dewey import DeweyRetriever |
| 63 | +from haystack.components.builders import PromptBuilder |
| 64 | +from haystack.components.generators import OpenAIGenerator |
| 65 | +from haystack.utils import Secret |
| 66 | + |
| 67 | +store = DeweyDocumentStore( |
| 68 | +    api_key=Secret.from_env_var("DEWEY_API_KEY"), |
| 69 | +    collection_id="3f7a1b2c-...",  # your collection ID |
| 70 | +) |
| 71 | + |
| 72 | +prompt_template = """ |
| 73 | +Answer the question using only the provided context. |
| 74 | +Context: {% for doc in documents %}{{ doc.content }} {% endfor %} |
| 75 | +Question: {{ query }} |
| 76 | +""" |
| 77 | + |
| 78 | +pipeline = Pipeline() |
| 79 | +pipeline.add_component("retriever", DeweyRetriever(document_store=store, top_k=5)) |
| 80 | +pipeline.add_component("prompt", PromptBuilder(template=prompt_template)) |
| 81 | +pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini")) |
| 82 | + |
| 83 | +pipeline.connect("retriever.documents", "prompt.documents") |
| 84 | +pipeline.connect("prompt.prompt", "llm.prompt") |
| 85 | + |
| 86 | +result = pipeline.run({ |
| 87 | +    "retriever": {"query": "What are the key findings?"}, |
| 88 | +    "prompt": {"query": "What are the key findings?"}, |
| 89 | +}) |
| 90 | +print(result["llm"]["replies"][0]) |
| 91 | +``` |
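The pipeline above needs the same question twice, once for the `retriever` component and once for the `prompt` component. A small helper keeps the two in sync. This is a sketch only: `build_rag_inputs` is a hypothetical name, and it assumes the component names `"retriever"` and `"prompt"` used in the example above.

```python
def build_rag_inputs(query: str) -> dict:
    """Build the run() input mapping for the RAG pipeline shown above.

    Hypothetical helper; the keys match the component names registered
    with add_component in the example ("retriever" and "prompt").
    """
    return {
        "retriever": {"query": query},
        "prompt": {"query": query},
    }
```

With this in place, the call becomes `pipeline.run(build_rag_inputs("What are the key findings?"))`, so the retrieved context and the rendered prompt always refer to the same question.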
| 92 | + |
| 93 | +### Agentic research with DeweyResearchComponent |
| 94 | + |
| 95 | +`DeweyResearchComponent` is a drop-in replacement for an LLM generator when you want Dewey to handle both retrieval *and* generation. It runs a multi-step research loop internally and returns a grounded answer with cited sources. |
| 96 | + |
| 97 | +```python |
| 99 | +from haystack import Pipeline |
| 100 | +from haystack_integrations.components.retrievers.dewey import DeweyResearchComponent |
| 101 | +from haystack.utils import Secret |
| 102 | + |
| 103 | +pipeline = Pipeline() |
| 104 | +pipeline.add_component( |
| 105 | +    "research", |
| 106 | +    DeweyResearchComponent( |
| 107 | +        api_key=Secret.from_env_var("DEWEY_API_KEY"), |
| 108 | +        collection_id="3f7a1b2c-...", |
| 109 | +        depth="balanced",  # "quick" | "balanced" | "deep" | "exhaustive" |
| 110 | +    ), |
| 111 | +) |
| 112 | + |
| 113 | +result = pipeline.run({"research": {"query": "What were the key findings across all studies?"}}) |
| 114 | +print(result["research"]["answer"]) |
| 115 | + |
| 116 | +for source in result["research"]["sources"]: |
| 117 | +    print(f"  [{source.meta['filename']}] {source.content[:80]}...") |
| 118 | +``` |
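The loop above indexes `source.meta['filename']` directly and always appends an ellipsis. The source does not state whether every returned source carries a `filename` key, so a more defensive formatter may be useful. The sketch below is a hypothetical helper, not part of `dewey-haystack`, and takes the `meta` mapping and `content` string of one source.

```python
def format_source(meta: dict, content: str, width: int = 80) -> str:
    """Format one cited source as a single display line.

    Hypothetical helper; pass source.meta and source.content.
    """
    # Fall back to a placeholder when the metadata has no "filename" key.
    label = meta.get("filename", "unknown source")
    snippet = content[:width]
    # Only add an ellipsis when the content was actually truncated.
    suffix = "..." if len(content) > width else ""
    return f"[{label}] {snippet}{suffix}"
```

Inside the loop this would be used as `print(format_source(source.meta, source.content))`.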
| 119 | + |
| 120 | +### Writing documents |
| 121 | + |
| 122 | +Upload content to Dewey directly from a Haystack pipeline using `DeweyDocumentStore.write_documents`: |
| 123 | + |
| 124 | +```python |
| 125 | +from haystack import Document |
| 126 | +from haystack_integrations.document_stores.dewey import DeweyDocumentStore |
| 127 | +from haystack.utils import Secret |
| 128 | + |
| 129 | +store = DeweyDocumentStore( |
| 130 | +    api_key=Secret.from_env_var("DEWEY_API_KEY"), |
| 131 | +    collection_id="3f7a1b2c-...", |
| 132 | +) |
| 133 | + |
| 134 | +store.write_documents([ |
| 135 | +    Document(content="Neural networks learn via backpropagation.", meta={"source": "ml-intro.txt"}), |
| 136 | +    Document(content="Transformers use self-attention mechanisms.", meta={"source": "transformers.txt"}), |
| 137 | +]) |
| 138 | +``` |
| 139 | + |
| 140 | +## License |
| 141 | + |
| 142 | +`dewey-haystack` is released under the [MIT License](https://github.com/meetdewey/dewey-haystack/blob/main/LICENSE). |