Status: Draft - Awaiting Decisions Last Updated: 2026-01-15
These decisions are essential to write any functional code. Without answers, engineers cannot proceed.
Decision Needed: What image formats will the process() method accept?
| Option | Pros | Cons | Recommendation |
|---|---|---|---|
ImageData only |
Simple, canvas-native | Requires caller to convert first | ⭐ Start here for MVP |
Blob + ImageData |
Flexible, handles file uploads | More adapter logic needed | Production upgrade |
ArrayBuffer + ImageData + Blob |
Maximum flexibility | Complex type guards needed | Overkill for MVP |
MVP Decision: ✅ ImageData only
Rationale:
- Canvas-native type, no conversion needed within OCR engine
- Forces clean separation: UI layer handles file upload → canvas → ImageData, OCR layer focuses purely on processing
- Tesseract.js and Transformers.js both accept ImageData natively
- Can easily extend interface later:
process(data: ImageData | Blob)without breaking existing code
Decision Needed: What does process() return?
| Option | Structure | Use Case | Recommendation |
|---|---|---|---|
| Plain text string | Promise<string> |
Simple text extraction | ⭐ MVP sufficient |
| Structured result | Promise<OCRResult> with { text, confidence, language } |
Quality metrics needed | Consider for MVP+ |
| Full bounding boxes | Promise<OCRResult> with { text, words: Array<{text, bbox, confidence}> } |
Complex layout analysis | Production feature |
MVP Decision: ✅ Plain text string (Promise<string>)
Rationale:
- Satisfies 90% of OCR use cases: "What text is in this image?"
- Zero serialization overhead for Web Worker communication
- Easier to test and debug
- Can upgrade to structured format later without breaking the interface contract (engines can internally return structured data, just extract
.textproperty for now)
Production Upgrade Path:
// When needed, change to:
interface OCRResult {
text: string;
confidence?: number; // 0-100 quality score
language?: string; // ISO 639-1 code (e.g., 'en')
}
// Return type becomes Promise<OCRResult>Decision Needed: Which specific OCR libraries are we implementing?
Primary Engine (Fast option):
- Tesseract.js (WASM port of Tesseract)
- Custom lightweight solution
- Other: ________________
Secondary Engine (High accuracy option):
- Transformers.js with TrOCR model (post-MVP)
- Custom ONNX model (specify which): ________________
- Other: ________________
MVP Decision: ✅ Start with Tesseract.js only
Rationale:
- Proven stability: 10M+ weekly npm downloads, battle-tested in production
- No WebGPU dependency: Pure WASM, works on all modern browsers including older Safari
- Comprehensive documentation: Extensive examples and community support
- Language support: 100+ languages pre-trained (English for MVP, easy to expand)
- Reasonable performance: 2-5 seconds for typical documents on modern hardware
- Small initial bundle: Core library ~2MB, English traineddata ~4MB (acceptable for web)
Implementation Details:
- Package:
tesseract.jsv5.x - Worker setup: Use built-in worker support (
createWorker()) - Model loading: English (
eng.traineddata) from CDN with IndexedDB caching - Initialization: Call
worker.loadLanguage('eng')thenworker.initialize('eng')
Post-MVP: Add Transformers.js (TrOCR) as second engine to validate Strategy Pattern. This will require WebGPU fallback logic.
Integration Analysis (researched 2026-01-15):
After evaluating modern alternatives, Tesseract.js remains the best MVP choice:
| Library | npm Package | Integration | Bundle Size | Status |
|---|---|---|---|---|
| Tesseract.js | ✅ tesseract.js |
⭐ Easiest | 4-6MB | 431K DL/week |
| tesseract-wasm | ✅ tesseract-wasm |
⭐ Easy | 2.1MB | Alternative |
| Transformers.js | ✅ @huggingface/transformers |
⭐ Easy | Variable | Second engine |
| Scribe.js | ✅ scribe.js-ocr |
⭐ Easy | Larger | If need PDF |
| client-side-ocr | client-side-ocr |
Medium | Sparse docs | |
| ocrs | ❌ Manual build | ❌ Complex | Small | Rust - no npm |
| docTR | ❌ No package | ❌ Demo only | Large | Not usable |
Why Tesseract.js + Transformers.js:
- Both are true npm libraries with
npm installsupport - Well-documented, production-ready APIs
- Fit perfectly into Strategy Pattern (both can be wrapped in
IOCREngine) - Large communities and active maintenance
- Clear upgrade path from traditional (Tesseract) to modern (Transformers)
Decision Needed: What browsers MUST work for MVP?
| Browser | Version | Required Features | MVP Support? |
|---|---|---|---|
| Chrome/Edge | 90+ | WASM, Web Workers, IndexedDB | [x] Yes [ ] No |
| Firefox | 88+ | WASM, Web Workers, IndexedDB | [x] Yes [ ] No |
| Safari | 15+ | WASM, Web Workers, IndexedDB | [x] Yes [ ] No |
| Mobile Chrome | 90+ | Same as desktop | [ ] Yes [x] No (defer) |
| Mobile Safari | 15+ | Same as desktop | [ ] Yes [x] No (defer) |
WebGPU Requirement for MVP:
- Required (blocks older devices)
- Optional (CPU fallback acceptable)
- Not needed for MVP (Tesseract.js uses WASM only)
MVP Decision: ✅ Desktop browsers only: Chrome 90+, Firefox 88+, Safari 15+
Rationale:
- These versions have stable WASM, Web Workers, and IndexedDB support (released ~2021)
- Covers 95%+ of desktop users on evergreen browsers
- Mobile deferred to post-MVP because:
- Mobile OCR has different UX constraints (camera integration, smaller screens)
- Memory management is more critical (lower RAM limits)
- Performance testing needed on mid-range devices
- Requires separate testing matrix
Feature Detection:
// Add to app initialization
const isSupported =
typeof WebAssembly !== 'undefined' &&
typeof Worker !== 'undefined' &&
typeof indexedDB !== 'undefined';
if (!isSupported) {
showError('Browser not supported. Please use Chrome 90+, Firefox 88+, or Safari 15+');
}Post-MVP: Test and optimize for mobile browsers (memory management, responsive UI, camera capture)
Decision Needed: What gets cached for MVP?
- Model files only (essential for performance) ← Recommended MVP
- Model files + OCR results (adds complexity)
- Nothing (re-download every time - not viable)
Cache Key Strategy:
- Simple:
${engineId}-${modelName} - Versioned:
${engineId}-${modelName}-v${version}
MVP Decision: ✅ Cache model files only, with versioned keys
Rationale:
- Critical for UX: English traineddata is ~4MB. Without caching, users wait 2-10 seconds on every page load.
- Storage is cheap: 4MB is negligible (typical quota: 50MB-1GB per origin)
- Versioned keys prevent stale data: When Tesseract.js updates models, new version auto-downloads
- Privacy-friendly: No user data persisted, only public model files
- Simple to implement: Tesseract.js has built-in caching support via
cacheMethod: 'refresh'
Implementation:
// Tesseract.js automatically uses IndexedDB when available
const worker = await createWorker('eng', 1, {
cacheMethod: 'refresh', // Use cached if available, refresh if stale
cachePath: '.', // Default IndexedDB location
});Cache Key Format: tesseract-cache-eng-v5.0.0 (handled automatically by library)
Manual Cache Clear (for testing):
// Add a "Clear Cache" button that calls:
indexedDB.deleteDatabase('tesseract-cache');Post-MVP: Consider caching OCR results for duplicate image detection (requires privacy analysis)
Decision Needed: How do we inform users during the load() phase?
MVP Requirement: At minimum, show a binary state:
- Simple boolean callback:
onLoadingChange(isLoading: boolean) - Progress percentage:
onProgress(percent: number, message: string)(post-MVP)
MVP Decision: ✅ Simple boolean loading state
Rationale:
- Sufficient for MVP: Binary state covers the essential feedback: "System is working vs. ready"
- Simple to implement: No complex progress tracking logic needed
- Tesseract.js emits detailed progress events, but interpreting them requires domain knowledge (what does "recognizing text at 45%" mean to users?)
- Good UI patterns exist: Spinner + "Loading OCR engine..." message is clear and familiar
MVP Implementation:
interface IOCREngine {
id: string;
load(): Promise<void>;
process(data: ImageData): Promise<string>;
destroy(): Promise<void>;
isLoading: boolean; // Add this property
}
// In UI:
if (engine.isLoading) {
showSpinner('Loading OCR engine...');
} else {
hideSpinner();
}Post-MVP Enhancement: Add granular progress for better UX during 4MB model download:
interface IOCREngine {
onProgress?: (progress: { percent: number; status: string }) => void;
}
// Shows: "Downloading model... 67%" → "Initializing engine..." → "Ready"Decision Needed: Is runtime engine switching required for MVP?
Scenario: User loads page, processes images with Engine A, then wants to try Engine B.
| Approach | MVP Complexity | User Impact |
|---|---|---|
| Page reload required | Low (no switching logic) | Acceptable for MVP |
| Hot-swap (destroy A, load B) | Medium (implement cleanup) | Better UX, more complexity |
| Pre-load both engines | High (memory management) | Not MVP scope |
MVP Decision: ✅ Not applicable - single engine only. Post-MVP: Page reload acceptable initially.
Rationale:
- MVP is Tesseract.js only: No engine switching exists in MVP scope
- When adding 2nd engine (post-MVP): Page reload is acceptable because:
- Engine selection is typically a one-time choice per session
- Users don't frequently alternate between engines on the same document
- Avoids complex memory management and cleanup logic
- Can be upgraded to hot-swapping later once multi-engine architecture is validated
MVP Implementation:
// Single engine, no switching needed
const manager = new OCRManager();
await manager.setEngine('tesseract'); // Called once at app initPost-MVP (Multi-Engine) Implementation:
<!-- Simple dropdown that reloads page -->
<select onchange="location.href = '?engine=' + this.value">
<option value="tesseract">Fast (Tesseract)</option>
<option value="transformers">High Accuracy (Transformers.js)</option>
</select>
<script>
const params = new URLSearchParams(location.search);
const engine = params.get('engine') || 'tesseract';
await manager.setEngine(engine);
</script>Future Enhancement: Once multi-engine works via reload, implement hot-swapping:
// Proper cleanup ensures memory doesn't leak
await manager.setEngine('transformers'); // Automatically destroys Tesseract firstThese are important for reliability and scalability but can be deferred for initial proof-of-concept.
Problem: Large images or models can exhaust WASM heap (2GB-4GB limit).
Production Requirement: Define behavior when load() or process() fails with OOM.
Options:
- Fail gracefully with error message: "Image too large. Try a smaller image."
- Automatic image downscaling retry: Resize to max dimension (e.g., 2000px) and retry
- Chunk processing: Split large images into tiles (complex)
Decision: ________________ Timeline: ________________
Problem: User uploads non-image file or corrupt data.
Production Requirement: Validate input before expensive processing.
Strategy:
- Pre-flight check: Validate image can be decoded to canvas
- Engine-level try/catch with user-friendly error
- Graceful degradation: Skip invalid images in batch processing
Decision: ________________ Timeline: ________________
Problem: WebGPU support is inconsistent (Safari limited, Firefox behind flag).
Production Requirement: Define fallback chain.
Proposed Chain:
WebGPU (if available)
→ WebGL backend (via ONNX Runtime Web)
→ CPU WASM (slow but universal)
Decision:
- Implement full chain
- WebGPU → CPU WASM only (skip WebGL)
- Detect capability and block unsupported browsers with message
Timeline: ________________
Problem: What happens if user uploads 10 images at once?
Production Options:
| Model | Implementation | Use Case |
|---|---|---|
| Single-threaded queue | Queue images, process serially | Simple, predictable memory usage |
| Worker pool (N=2-4) | Process N images in parallel | Faster, higher memory usage |
| Adaptive | Start serial, scale to pool if system allows | Complex but optimal |
Decision: ________________ Timeline: ________________
Problem: User clicks "Switch to Transformers.js" while Tesseract is processing image #3 of 10.
Production Behavior:
- Block switch until current job finishes
- Abort current job, switch immediately (requires cancellation mechanism)
- Queue switch for after current batch
Decision: ________________ Timeline: ________________
Problem: Browser throttles Web Workers when tab is not visible.
Production Strategy:
- Accept slowdown (no action needed)
- Warn user: "Processing may slow down in background tabs"
- Use Shared Workers to maintain performance (complex, rare use case)
Decision: ________________ Timeline: ________________
Problem: Model files update. How do we force re-download?
Production Approaches:
- Version-based: Cache key includes version
tesseract-eng-v4.0.0 - Time-based: Evict cache after N days
- Manual: User clears cache via settings panel
- Hybrid: Version-based + manual clear option
Decision: ________________ Timeline: ________________
Problem: User has limited disk space. IndexedDB write fails.
Production Behavior:
- Fail with error: "Cannot cache model. Clear browser storage."
- Fallback to in-memory only (re-download each session)
- LRU eviction: Delete oldest cached model to make space
Decision: ________________ Timeline: ________________
Problem: Should we cache OCR results (text output) or only models?
Trade-offs:
| Caching Results | Pros | Cons |
|---|---|---|
| Yes | Instant re-process of same image | Privacy concern, storage bloat |
| No | Simpler, user privacy-friendly | Re-process costs time |
Decision: ________________ Timeline: ________________
Decision Needed: How to handle non-English text?
Complexity: Each language may require separate model files (50MB+ per language).
Approach:
- Auto-detect language (requires detection model)
- User selects language from dropdown
- English-only for MVP, expand later
Timeline: ________________
Decision Needed: Should the system auto-enhance images?
Common Enhancements:
- Grayscale conversion
- Contrast adjustment
- Deskew (rotation correction)
- Noise removal
Options:
- Built into each engine adapter
- Shared preprocessing utility
- User-controlled (advanced mode)
- None (expect clean input)
Decision: ________________ Timeline: ________________
Decision Needed: Support PDF or multi-image uploads?
Complexity: Adds file parsing, page management UI, progress tracking per page.
Decision:
- MVP: Single images only
- Post-MVP: Add batch support
Timeline: ________________
Decision Needed: What metrics matter for production debugging?
Suggested Metrics:
- Engine load time (by browser/device)
- Processing time per image (by size/engine)
- Error rates by category
- Cache hit rate
Implementation:
- Console logging only (MVP)
- Structured logging to service
- Browser-local analytics dashboard
Decision: ________________ Timeline: ________________
When resolving an item, fill out:
### [Item Number] [Title]
**Date Decided**: YYYY-MM-DD
**Decided By**: [Name/Team]
**Decision**: [Clear statement]
**Rationale**: [Why this choice over alternatives]
**Implementation Notes**: [Gotchas, references, etc.]
**Status**: ✅ Implemented | 🚧 In Progress | 📋 Planned| Feature | MVP ✅ | Production 🔧 |
|---|---|---|
| Input format | ImageData only |
+ Blob support |
| Output format | Plain string |
Structured with confidence |
| Engines | Tesseract.js only | + Transformers.js (TrOCR) |
| Browser support | Desktop: Chrome 90+, Firefox 88+, Safari 15+ | + Mobile browsers |
| Error handling | Basic try/catch, show error message | Graceful degradation, retries |
| Concurrency | Single image at a time | Queue or worker pool |
| Caching | Model files with versioned keys (IndexedDB) | + quota management, result caching |
| WebGPU | Not needed (WASM only) | Required for Transformers.js, fallback chain |
| Loading UX | Boolean isLoading state |
Progress percentage with status messages |
| Engine switching | Not applicable (single engine) | Page reload → hot-swap later |
| Language support | English only | Multi-language detection |
| Preprocessing | None (expect clean input) | Auto-enhance pipeline |
| Multi-page | Single images only | PDF/batch support |
| Telemetry | Console logs | Structured metrics |
All MVP blockers have been resolved. You can now proceed with implementation using these decisions:
- Engine: Tesseract.js v5.x (WASM, no WebGPU needed)
- Input:
ImageDataonly - Output:
Promise<string>(plain text) - Browser Support: Desktop only (Chrome 90+, Firefox 88+, Safari 15+)
- Loading State: Boolean
isLoadingproperty - Caching: Model files only, automatic via Tesseract.js IndexedDB caching
interface IOCREngine {
id: string;
load(): Promise<void>;
process(data: ImageData): Promise<string>;
destroy(): Promise<void>;
isLoading: boolean;
}- Install
tesseract.jspackage - Create
TesseractEngineclass implementingIOCREngine - Create
OCRManagerfactory class - Implement feature detection (WASM, Workers, IndexedDB)
- Build simple UI: file upload → canvas → ImageData → OCR
- Add loading spinner with "Loading OCR engine..." message
- Test on Chrome, Firefox, Safari (desktop)
- Validate model caching works (check IndexedDB in DevTools)
MVP is complete when:
- User can upload an image in supported browsers
- Text is extracted and displayed (2-5 seconds for typical document)
- Second page load uses cached model (sub-second init)
- Memory is released properly (worker terminates on page close)
- ✅ Review Session: All MVP blockers resolved (2026-01-15)
- ✅ Update Specification:
SPECIFICATION.mdupdated with concrete implementations - Begin Implementation: Follow MVP checklist in
SPECIFICATION.mdSection 8 - Prototype: Build Tesseract.js engine to validate architecture
- Validate: Test on target browsers, verify caching and cleanup
- Expand: Add Transformers.js as second engine to validate Strategy Pattern
Decision: Ship phased write-back improvements with selectable quality presets (fast, balanced, high-quality).
Rationale:
- Earlier write-back used rectangular fill + center text rendering, which produced visible artifacts.
- Geometry-aware line rendering and baseline placement improve natural layout retention.
- Optional OpenCV inpainting provides better background restoration when runtime is available, with stable fallback behavior when not.
- WCAG-based contrast selection, RTL direction handling, and optional halo stroke improve multilingual readability.
Status: ✅ Implemented across src/utils/image-writeback.ts, src/translation-controller.ts, and related tests.
Decision: Implement persistent caching for eSearch-OCR models using ModelCache (IndexedDB).
Rationale:
- Unlike Tesseract.js,
esearch-ocrimplementation does not automatically handle persistent caching of model files. - Switching languages or re-initializing the engine triggered redundant network requests.
- Integrated the existing
ModelCacheutility to store.onnxand dictionary files in IndexedDB, preventing re-downloads.
Status: ✅ Implemented in src/engines/esearch-engine.ts.
Decision: Host model files locally in public/models/esearch/ with manual download
Rationale:
- Local hosting preferred for MVP: Simplest setup, no CORS issues, fastest load times
- Manual download acceptable: ~7 MB one-time download is reasonable for developers
- Git-ignored: Model files (*.onnx, *.txt) excluded from version control to keep repo small
- Documentation-first: Clear README with download instructions ensures reproducibility
Model Files (from eSearch-OCR v4.0.0):
| File | Size | Purpose |
|---|---|---|
det.onnx |
~2.3 MB | Text detection (DBNet) |
rec.onnx |
~4.5 MB | Text recognition (CRNN) |
ppocr_keys_v1.txt |
~30 KB | Character dictionary (6,623 chars) |
Alternative Hosting Options (documented for production):
- jsDelivr CDN:
https://cdn.jsdelivr.net/gh/xushengfeng/eSearch-OCR@4.0.0/- Pro: Free, reliable, cached globally
- Con: CORS may need verification, dependent on upstream repo
- Self-hosted S3/CloudFront: Upload to own infrastructure
- Pro: Full control, no external dependencies
- Con: Cost, maintenance overhead
- GitHub Releases direct: Direct links to release assets
- Pro: Simple, free, versioned
- Con: May have rate limits for high traffic
Implementation:
- Directory structure:
public/models/esearch/ - Model paths in code:
/models/esearch/{det.onnx,rec.onnx,ppocr_keys_v1.txt} - Documentation:
public/models/esearch/README.md
Decisions Made:
- ✅ Input format:
ImageDataonly - ✅ Output format: Plain
Promise<string> - ✅ Engine: Tesseract.js v5.x (npm:
tesseract.js) - ✅ Browser support: Desktop only (Chrome 90+, Firefox 88+, Safari 15+)
- ✅ Caching: Model files with versioned keys (automatic via Tesseract.js)
- ✅ Loading state: Boolean
isLoadingproperty - ✅ Engine switching: N/A for MVP (single engine only)
Research Completed:
- Evaluated 7 modern OCR alternatives to Tesseract.js
- Confirmed Tesseract.js best for MVP (proven, easy integration)
- Selected Transformers.js for second engine (modern, WebGPU, future-proof)
Documentation Updated:
SPECIFICATION.md: Added concrete implementations, feature detection, checklistsDECISION_LOG.md: Added integration analysis, MVP vs Production comparison
Status: ✅ Ready to implement. See SPECIFICATION.md Section 8 for checklist.
Document Owner: ________________ Review Cadence: Weekly during MVP development, then monthly for production items