Context
The file packages/extract/src/index.ts is a 386-line monolith that contains:
- Zod schemas
- MIME type detection
- PDF to image conversion
- OCR processing
- Gemini provider
- Ollama provider
- Streaming logic
This makes unit testing difficult: OCR, PDF conversion, and the providers cannot be tested in isolation.
Architecture Decisions
- Package: packages/extract
- Pattern: Module decomposition with barrel export
- Goal: Each module testable independently
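To illustrate the goal, a module such as mime.ts can be exercised with no PDF, OCR, or provider code loaded. The sketch below assumes that getMimeType() takes a file path and maps it by extension. The real function may take a different argument (for example, raw bytes) and support other types, and the MIME_BY_EXTENSION table is hypothetical.

```typescript
// mime.ts (sketch): a pure lookup with no dependency on PDF, OCR, or
// provider code, so it can be unit tested on its own.
// ASSUMPTION: getMimeType takes a file path and maps by extension; the real
// signature and the set of supported types may differ.

// Hypothetical extension table.
const MIME_BY_EXTENSION: Readonly<Record<string, string>> = {
  pdf: "application/pdf",
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
};

export function getMimeType(filePath: string): string {
  // Use the text after the last dot, compared case-insensitively.
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot + 1).toLowerCase();
  // Unknown or missing extensions fall back to a generic binary type.
  return MIME_BY_EXTENSION[ext] ?? "application/octet-stream";
}
```

A test for this module only needs to import ./mime. Nothing in it pulls in the PDF converter, the OCR engine, or a provider client.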
Proposed Structure
packages/extract/src/
├── index.ts # Barrel export only
├── schemas.ts # Zod schemas
├── mime.ts # getMimeType()
├── pdf.ts # pdfToImages()
├── ocr.ts # ocrImages() + types
├── extract.ts # extractDocument() orchestrator
├── providers/
│ ├── gemini.ts # extractWithGemini()
│ └── ollama.ts # extractWithOllama()
└── types.ts # StreamChunk, StreamCallback, ExtractOptions
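One way the extract.ts orchestrator could stay testable is to receive the other stages as parameters instead of importing them directly. The sketch below is a minimal version under stated assumptions. ExtractDeps, ProviderInput, toImages, and the fields shown for StreamChunk and ExtractOptions are all hypothetical. Only the names extractDocument, StreamChunk, StreamCallback, and ExtractOptions come from the proposed structure.

```typescript
// Sketch of extract.ts with injected stages (dependency injection).
// ASSUMPTION: every shape below is hypothetical; only the names
// extractDocument, StreamChunk, StreamCallback and ExtractOptions come from
// the proposed structure.

// --- types.ts (hypothetical fields) ---
export type StreamChunk = { type: "text"; text: string } | { type: "done" };
export type StreamCallback = (chunk: StreamChunk) => void;
export interface ExtractOptions {
  useOcr?: boolean; // hypothetical flag: run OCR before calling the provider
  onChunk?: StreamCallback; // receives streamed output, if provided
}

// --- extract.ts ---
// Hypothetical input handed to a provider such as extractWithGemini().
export interface ProviderInput {
  images: Uint8Array[];
  ocrText: string | undefined;
}

// Hypothetical bundle of stages; production code would adapt pdfToImages,
// ocrImages and extractWithGemini / extractWithOllama to these signatures.
export interface ExtractDeps {
  toImages: (filePath: string) => Promise<Uint8Array[]>;
  ocrImages: (images: Uint8Array[]) => Promise<string>;
  provider: (input: ProviderInput, onChunk?: StreamCallback) => Promise<string>;
}

export async function extractDocument(
  filePath: string,
  deps: ExtractDeps,
  options: ExtractOptions = {},
): Promise<string> {
  const images = await deps.toImages(filePath);
  // OCR runs only when requested; its text goes to the provider
  // alongside the page images.
  const ocrText = options.useOcr ? await deps.ocrImages(images) : undefined;
  const result = await deps.provider({ images, ocrText }, options.onChunk);
  options.onChunk?.({ type: "done" }); // signal end of stream
  return result;
}

// --- usage in a unit test: fakes replace every stage ---
export const fakeDeps: ExtractDeps = {
  toImages: async () => [new Uint8Array([1, 2, 3])], // one fake page
  ocrImages: async (images) => `ocr:${images.length}`,
  provider: async ({ ocrText }, onChunk) => {
    onChunk?.({ type: "text", text: "partial" });
    return `result(${ocrText ?? "none"})`;
  },
};
// extractDocument("doc.pdf", fakeDeps, { useOcr: true }) resolves to "result(ocr:1)".
```

In production, the caller would pass the real pdfToImages, ocrImages, and extractWithGemini or extractWithOllama, adapted to whatever signatures the modules end up with. In tests, fakes like fakeDeps remove the need for PDF tooling, an OCR engine, or network access.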
Requirements
- schemas.ts
- mime.ts
- pdf.ts
- ocr.ts
- providers/gemini.ts
- providers/ollama.ts
- types.ts
- index.ts
Success Criteria
References