Decision Log: Browser OCR Service

Status: Draft - Awaiting Decisions Last Updated: 2026-01-15

🚨 MVP BLOCKERS (Must Resolve Before First Implementation)

These decisions are essential to write any functional code. Without answers, engineers cannot proceed.

1. Core Data Contracts

1.1 Input Format Specification

Decision Needed: What image formats will the process() method accept?

Option	Pros	Cons	Recommendation
`ImageData` only	Simple, canvas-native	Requires caller to convert first	⭐ Start here for MVP
`Blob + ImageData`	Flexible, handles file uploads	More adapter logic needed	Production upgrade
`ArrayBuffer + ImageData + Blob`	Maximum flexibility	Complex type guards needed	Overkill for MVP

MVP Decision: ✅ ImageData only

Rationale:

Canvas-native type, no conversion needed within OCR engine
Forces clean separation: UI layer handles file upload → canvas → ImageData, OCR layer focuses purely on processing
Tesseract.js and Transformers.js both accept ImageData natively
Can easily extend interface later: process(data: ImageData | Blob) without breaking existing code

1.2 Output Format Specification

Decision Needed: What does process() return?

Option	Structure	Use Case	Recommendation
Plain text string	`Promise<string>`	Simple text extraction	⭐ MVP sufficient
Structured result	`Promise<OCRResult>` with `{ text, confidence, language }`	Quality metrics needed	Consider for MVP+
Full bounding boxes	`Promise<OCRResult>` with `{ text, words: Array<{text, bbox, confidence}> }`	Complex layout analysis	Production feature

MVP Decision: ✅ Plain text string (Promise<string>)

Rationale:

Satisfies 90% of OCR use cases: "What text is in this image?"
Zero serialization overhead for Web Worker communication
Easier to test and debug
Can upgrade to structured format later without breaking the interface contract (engines can internally return structured data, just extract .text property for now)

Production Upgrade Path:

// When needed, change to:
interface OCRResult {
  text: string;
  confidence?: number;      // 0-100 quality score
  language?: string;         // ISO 639-1 code (e.g., 'en')
}
// Return type becomes Promise<OCRResult>

1.3 Engine Selection

Decision Needed: Which specific OCR libraries are we implementing?

Primary Engine (Fast option):

Tesseract.js (WASM port of Tesseract)
Custom lightweight solution
Other: ________________

Secondary Engine (High accuracy option):

Transformers.js with TrOCR model (post-MVP)
Custom ONNX model (specify which): ________________
Other: ________________

MVP Decision: ✅ Start with Tesseract.js only

Rationale:

Proven stability: 10M+ weekly npm downloads, battle-tested in production
No WebGPU dependency: Pure WASM, works on all modern browsers including older Safari
Comprehensive documentation: Extensive examples and community support
Language support: 100+ languages pre-trained (English for MVP, easy to expand)
Reasonable performance: 2-5 seconds for typical documents on modern hardware
Small initial bundle: Core library ~2MB, English traineddata ~4MB (acceptable for web)

Implementation Details:

Package: tesseract.js v5.x
Worker setup: Use built-in worker support (createWorker())
Model loading: English (eng.traineddata) from CDN with IndexedDB caching
Initialization: Call worker.loadLanguage('eng') then worker.initialize('eng')

Post-MVP: Add Transformers.js (TrOCR) as second engine to validate Strategy Pattern. This will require WebGPU fallback logic.

Integration Analysis (researched 2026-01-15):

After evaluating modern alternatives, Tesseract.js remains the best MVP choice:

Library	npm Package	Integration	Bundle Size	Status
Tesseract.js	✅ `tesseract.js`	⭐ Easiest	4-6MB	431K DL/week
tesseract-wasm	✅ `tesseract-wasm`	⭐ Easy	2.1MB	Alternative
Transformers.js	✅ `@huggingface/transformers`	⭐ Easy	Variable	Second engine
Scribe.js	✅ `scribe.js-ocr`	⭐ Easy	Larger	If need PDF
client-side-ocr	⚠️ `client-side-ocr`	⚠️ Unclear	Medium	Sparse docs
ocrs	❌ Manual build	❌ Complex	Small	Rust - no npm
docTR	❌ No package	❌ Demo only	Large	Not usable

Why Tesseract.js + Transformers.js:

Both are true npm libraries with npm install support
Well-documented, production-ready APIs
Fit perfectly into Strategy Pattern (both can be wrapped in IOCREngine)
Large communities and active maintenance
Clear upgrade path from traditional (Tesseract) to modern (Transformers)

1.4 Minimum Browser Support

Decision Needed: What browsers MUST work for MVP?

Browser	Version	Required Features	MVP Support?
Chrome/Edge	90+	WASM, Web Workers, IndexedDB	[x] Yes [ ] No
Firefox	88+	WASM, Web Workers, IndexedDB	[x] Yes [ ] No
Safari	15+	WASM, Web Workers, IndexedDB	[x] Yes [ ] No
Mobile Chrome	90+	Same as desktop	[ ] Yes [x] No (defer)
Mobile Safari	15+	Same as desktop	[ ] Yes [x] No (defer)

WebGPU Requirement for MVP:

Required (blocks older devices)
Optional (CPU fallback acceptable)
Not needed for MVP (Tesseract.js uses WASM only)

MVP Decision: ✅ Desktop browsers only: Chrome 90+, Firefox 88+, Safari 15+

Rationale:

These versions have stable WASM, Web Workers, and IndexedDB support (released ~2021)
Covers 95%+ of desktop users on evergreen browsers
Mobile deferred to post-MVP because:
- Mobile OCR has different UX constraints (camera integration, smaller screens)
- Memory management is more critical (lower RAM limits)
- Performance testing needed on mid-range devices
- Requires separate testing matrix

Feature Detection:

// Add to app initialization
const isSupported =
  typeof WebAssembly !== 'undefined' &&
  typeof Worker !== 'undefined' &&
  typeof indexedDB !== 'undefined';

if (!isSupported) {
  showError('Browser not supported. Please use Chrome 90+, Firefox 88+, or Safari 15+');
}

Post-MVP: Test and optimize for mobile browsers (memory management, responsive UI, camera capture)

1.5 Storage Strategy (Minimum Viable)

Decision Needed: What gets cached for MVP?

Model files only (essential for performance) ← Recommended MVP
Model files + OCR results (adds complexity)
Nothing (re-download every time - not viable)

Cache Key Strategy:

Simple: ${engineId}-${modelName}
Versioned: ${engineId}-${modelName}-v${version}

MVP Decision: ✅ Cache model files only, with versioned keys

Rationale:

Critical for UX: English traineddata is ~4MB. Without caching, users wait 2-10 seconds on every page load.
Storage is cheap: 4MB is negligible (typical quota: 50MB-1GB per origin)
Versioned keys prevent stale data: When Tesseract.js updates models, new version auto-downloads
Privacy-friendly: No user data persisted, only public model files
Simple to implement: Tesseract.js has built-in caching support via cacheMethod: 'refresh'

Implementation:

// Tesseract.js automatically uses IndexedDB when available
const worker = await createWorker('eng', 1, {
  cacheMethod: 'refresh',  // Use cached if available, refresh if stale
  cachePath: '.',          // Default IndexedDB location
});

Cache Key Format: tesseract-cache-eng-v5.0.0 (handled automatically by library)

Manual Cache Clear (for testing):

// Add a "Clear Cache" button that calls:
indexedDB.deleteDatabase('tesseract-cache');

Post-MVP: Consider caching OCR results for duplicate image detection (requires privacy analysis)

2. User Experience (MVP Minimums)

2.1 Loading State Communication

Decision Needed: How do we inform users during the load() phase?

MVP Requirement: At minimum, show a binary state:

Simple boolean callback: onLoadingChange(isLoading: boolean)
Progress percentage: onProgress(percent: number, message: string) (post-MVP)

MVP Decision: ✅ Simple boolean loading state

Rationale:

Sufficient for MVP: Binary state covers the essential feedback: "System is working vs. ready"
Simple to implement: No complex progress tracking logic needed
Tesseract.js emits detailed progress events, but interpreting them requires domain knowledge (what does "recognizing text at 45%" mean to users?)
Good UI patterns exist: Spinner + "Loading OCR engine..." message is clear and familiar

MVP Implementation:

interface IOCREngine {
  id: string;
  load(): Promise<void>;
  process(data: ImageData): Promise<string>;
  destroy(): Promise<void>;
  isLoading: boolean;  // Add this property
}

// In UI:
if (engine.isLoading) {
  showSpinner('Loading OCR engine...');
} else {
  hideSpinner();
}

Post-MVP Enhancement: Add granular progress for better UX during 4MB model download:

interface IOCREngine {
  onProgress?: (progress: { percent: number; status: string }) => void;
}

// Shows: "Downloading model... 67%" → "Initializing engine..." → "Ready"

2.2 Engine Switching Behavior (MVP Scope)

Decision Needed: Is runtime engine switching required for MVP?

Scenario: User loads page, processes images with Engine A, then wants to try Engine B.

Approach	MVP Complexity	User Impact
Page reload required	Low (no switching logic)	Acceptable for MVP
Hot-swap (destroy A, load B)	Medium (implement cleanup)	Better UX, more complexity
Pre-load both engines	High (memory management)	Not MVP scope

MVP Decision: ✅ Not applicable - single engine only. Post-MVP: Page reload acceptable initially.

Rationale:

MVP is Tesseract.js only: No engine switching exists in MVP scope
When adding 2nd engine (post-MVP): Page reload is acceptable because:
- Engine selection is typically a one-time choice per session
- Users don't frequently alternate between engines on the same document
- Avoids complex memory management and cleanup logic
- Can be upgraded to hot-swapping later once multi-engine architecture is validated

MVP Implementation:

// Single engine, no switching needed
const manager = new OCRManager();
await manager.setEngine('tesseract');  // Called once at app init

Post-MVP (Multi-Engine) Implementation:

<!-- Simple dropdown that reloads page -->
<select onchange="location.href = '?engine=' + this.value">
  <option value="tesseract">Fast (Tesseract)</option>
  <option value="transformers">High Accuracy (Transformers.js)</option>
</select>

<script>
  const params = new URLSearchParams(location.search);
  const engine = params.get('engine') || 'tesseract';
  await manager.setEngine(engine);
</script>

Future Enhancement: Once multi-engine works via reload, implement hot-swapping:

// Proper cleanup ensures memory doesn't leak
await manager.setEngine('transformers'); // Automatically destroys Tesseract first

🔧 PRODUCTION-GRADE REQUIREMENTS (Post-MVP)

These are important for reliability and scalability but can be deferred for initial proof-of-concept.

3. Error Handling & Resilience

3.1 WASM Memory Exhaustion

Problem: Large images or models can exhaust WASM heap (2GB-4GB limit).

Production Requirement: Define behavior when load() or process() fails with OOM.

Options:

Fail gracefully with error message: "Image too large. Try a smaller image."
Automatic image downscaling retry: Resize to max dimension (e.g., 2000px) and retry
Chunk processing: Split large images into tiles (complex)

Decision: ________________ Timeline: ________________

3.2 Invalid/Corrupt Image Handling

Problem: User uploads non-image file or corrupt data.

Production Requirement: Validate input before expensive processing.

Strategy:

Pre-flight check: Validate image can be decoded to canvas
Engine-level try/catch with user-friendly error
Graceful degradation: Skip invalid images in batch processing

Decision: ________________ Timeline: ________________

3.3 WebGPU Unavailability Fallback

Problem: WebGPU support is inconsistent (Safari limited, Firefox behind flag).

Production Requirement: Define fallback chain.

Proposed Chain:

WebGPU (if available)
  → WebGL backend (via ONNX Runtime Web)
    → CPU WASM (slow but universal)

Decision:

Implement full chain
WebGPU → CPU WASM only (skip WebGL)
Detect capability and block unsupported browsers with message

Timeline: ________________

4. Concurrency & Performance

4.1 Concurrency Model

Problem: What happens if user uploads 10 images at once?

Production Options:

Model	Implementation	Use Case
Single-threaded queue	Queue images, process serially	Simple, predictable memory usage
Worker pool (N=2-4)	Process N images in parallel	Faster, higher memory usage
Adaptive	Start serial, scale to pool if system allows	Complex but optimal

Decision: ________________ Timeline: ________________

4.2 Mid-Processing Engine Switch

Problem: User clicks "Switch to Transformers.js" while Tesseract is processing image #3 of 10.

Production Behavior:

Block switch until current job finishes
Abort current job, switch immediately (requires cancellation mechanism)
Queue switch for after current batch

Decision: ________________ Timeline: ________________

4.3 Tab Backgrounding

Problem: Browser throttles Web Workers when tab is not visible.

Production Strategy:

Accept slowdown (no action needed)
Warn user: "Processing may slow down in background tabs"
Use Shared Workers to maintain performance (complex, rare use case)

Decision: ________________ Timeline: ________________

5. Storage & Caching (Production-Grade)

5.1 Cache Invalidation Strategy

Problem: Model files update. How do we force re-download?

Production Approaches:

Version-based: Cache key includes version tesseract-eng-v4.0.0
Time-based: Evict cache after N days
Manual: User clears cache via settings panel
Hybrid: Version-based + manual clear option

Decision: ________________ Timeline: ________________

5.2 Storage Quota Exceeded

Problem: User has limited disk space. IndexedDB write fails.

Production Behavior:

Fail with error: "Cannot cache model. Clear browser storage."
Fallback to in-memory only (re-download each session)
LRU eviction: Delete oldest cached model to make space

Decision: ________________ Timeline: ________________

5.3 Cache Result Storage

Problem: Should we cache OCR results (text output) or only models?

Trade-offs:

Caching Results	Pros	Cons
Yes	Instant re-process of same image	Privacy concern, storage bloat
No	Simpler, user privacy-friendly	Re-process costs time

Decision: ________________ Timeline: ________________

6. Advanced Features (Post-MVP Roadmap)

6.1 Multi-Language Support

Decision Needed: How to handle non-English text?

Complexity: Each language may require separate model files (50MB+ per language).

Approach:

Auto-detect language (requires detection model)
User selects language from dropdown
English-only for MVP, expand later

Timeline: ________________

6.2 Preprocessing Pipeline

Decision Needed: Should the system auto-enhance images?

Common Enhancements:

Grayscale conversion
Contrast adjustment
Deskew (rotation correction)
Noise removal

Options:

Built into each engine adapter
Shared preprocessing utility
User-controlled (advanced mode)
None (expect clean input)

Decision: ________________ Timeline: ________________

6.3 Batch/Multi-Page Documents

Decision Needed: Support PDF or multi-image uploads?

Complexity: Adds file parsing, page management UI, progress tracking per page.

Decision:

MVP: Single images only
Post-MVP: Add batch support

Timeline: ________________

6.4 Telemetry & Observability

Decision Needed: What metrics matter for production debugging?

Suggested Metrics:

Engine load time (by browser/device)
Processing time per image (by size/engine)
Error rates by category
Cache hit rate

Implementation:

Console logging only (MVP)
Structured logging to service
Browser-local analytics dashboard

Decision: ________________ Timeline: ________________

📝 Decision Template

When resolving an item, fill out:

### [Item Number] [Title]
**Date Decided**: YYYY-MM-DD
**Decided By**: [Name/Team]
**Decision**: [Clear statement]
**Rationale**: [Why this choice over alternatives]
**Implementation Notes**: [Gotchas, references, etc.]
**Status**: ✅ Implemented | 🚧 In Progress | 📋 Planned

Quick Reference: MVP vs Production

Feature	MVP ✅	Production 🔧
Input format	`ImageData` only	+ `Blob` support
Output format	Plain `string`	Structured with confidence
Engines	Tesseract.js only	+ Transformers.js (TrOCR)
Browser support	Desktop: Chrome 90+, Firefox 88+, Safari 15+	+ Mobile browsers
Error handling	Basic try/catch, show error message	Graceful degradation, retries
Concurrency	Single image at a time	Queue or worker pool
Caching	Model files with versioned keys (IndexedDB)	+ quota management, result caching
WebGPU	Not needed (WASM only)	Required for Transformers.js, fallback chain
Loading UX	Boolean `isLoading` state	Progress percentage with status messages
Engine switching	Not applicable (single engine)	Page reload → hot-swap later
Language support	English only	Multi-language detection
Preprocessing	None (expect clean input)	Auto-enhance pipeline
Multi-page	Single images only	PDF/batch support
Telemetry	Console logs	Structured metrics

📋 MVP Decisions Summary (Ready to Implement)

All MVP blockers have been resolved. You can now proceed with implementation using these decisions:

Core Architecture

Engine: Tesseract.js v5.x (WASM, no WebGPU needed)
Input: ImageData only
Output: Promise<string> (plain text)
Browser Support: Desktop only (Chrome 90+, Firefox 88+, Safari 15+)
Loading State: Boolean isLoading property
Caching: Model files only, automatic via Tesseract.js IndexedDB caching

Interface Contract

interface IOCREngine {
  id: string;
  load(): Promise<void>;
  process(data: ImageData): Promise<string>;
  destroy(): Promise<void>;
  isLoading: boolean;
}

Implementation Checklist

Install tesseract.js package
Create TesseractEngine class implementing IOCREngine
Create OCRManager factory class
Implement feature detection (WASM, Workers, IndexedDB)
Build simple UI: file upload → canvas → ImageData → OCR
Add loading spinner with "Loading OCR engine..." message
Test on Chrome, Firefox, Safari (desktop)
Validate model caching works (check IndexedDB in DevTools)

Validation Criteria

MVP is complete when:

User can upload an image in supported browsers
Text is extracted and displayed (2-5 seconds for typical document)
Second page load uses cached model (sub-second init)
Memory is released properly (worker terminates on page close)

Next Steps

✅ Review Session: All MVP blockers resolved (2026-01-15)
✅ Update Specification: SPECIFICATION.md updated with concrete implementations
Begin Implementation: Follow MVP checklist in SPECIFICATION.md Section 8
Prototype: Build Tesseract.js engine to validate architecture
Validate: Test on target browsers, verify caching and cleanup
Expand: Add Transformers.js as second engine to validate Strategy Pattern

Changes Log

2026-02-06: Translated Image Write-Back Quality Pipeline

Decision: Ship phased write-back improvements with selectable quality presets (fast, balanced, high-quality).

Rationale:

Earlier write-back used rectangular fill + center text rendering, which produced visible artifacts.
Geometry-aware line rendering and baseline placement improve natural layout retention.
Optional OpenCV inpainting provides better background restoration when runtime is available, with stable fallback behavior when not.
WCAG-based contrast selection, RTL direction handling, and optional halo stroke improve multilingual readability.

Status: ✅ Implemented across src/utils/image-writeback.ts, src/translation-controller.ts, and related tests.

2026-01-17: eSearch-OCR Model Caching

Decision: Implement persistent caching for eSearch-OCR models using ModelCache (IndexedDB).

Rationale:

Unlike Tesseract.js, esearch-ocr implementation does not automatically handle persistent caching of model files.
Switching languages or re-initializing the engine triggered redundant network requests.
Integrated the existing ModelCache utility to store .onnx and dictionary files in IndexedDB, preventing re-downloads.

Status: ✅ Implemented in src/engines/esearch-engine.ts.

2026-01-16: eSearch-OCR Model Hosting Strategy

Decision: Host model files locally in public/models/esearch/ with manual download

Rationale:

Local hosting preferred for MVP: Simplest setup, no CORS issues, fastest load times
Manual download acceptable: ~7 MB one-time download is reasonable for developers
Git-ignored: Model files (*.onnx, *.txt) excluded from version control to keep repo small
Documentation-first: Clear README with download instructions ensures reproducibility

Model Files (from eSearch-OCR v4.0.0):

File	Size	Purpose
`det.onnx`	~2.3 MB	Text detection (DBNet)
`rec.onnx`	~4.5 MB	Text recognition (CRNN)
`ppocr_keys_v1.txt`	~30 KB	Character dictionary (6,623 chars)

Alternative Hosting Options (documented for production):

jsDelivr CDN: https://cdn.jsdelivr.net/gh/xushengfeng/eSearch-OCR@4.0.0/
- Pro: Free, reliable, cached globally
- Con: CORS may need verification, dependent on upstream repo
Self-hosted S3/CloudFront: Upload to own infrastructure
- Pro: Full control, no external dependencies
- Con: Cost, maintenance overhead
GitHub Releases direct: Direct links to release assets
- Pro: Simple, free, versioned
- Con: May have rate limits for high traffic

Implementation:

Directory structure: public/models/esearch/
Model paths in code: /models/esearch/{det.onnx,rec.onnx,ppocr_keys_v1.txt}
Documentation: public/models/esearch/README.md

2026-01-15: All MVP Blockers Resolved

Decisions Made:

✅ Input format: ImageData only
✅ Output format: Plain Promise<string>
✅ Engine: Tesseract.js v5.x (npm: tesseract.js)
✅ Browser support: Desktop only (Chrome 90+, Firefox 88+, Safari 15+)
✅ Caching: Model files with versioned keys (automatic via Tesseract.js)
✅ Loading state: Boolean isLoading property
✅ Engine switching: N/A for MVP (single engine only)

Research Completed:

Evaluated 7 modern OCR alternatives to Tesseract.js
Confirmed Tesseract.js best for MVP (proven, easy integration)
Selected Transformers.js for second engine (modern, WebGPU, future-proof)

Documentation Updated:

SPECIFICATION.md: Added concrete implementations, feature detection, checklists
DECISION_LOG.md: Added integration analysis, MVP vs Production comparison

Status: ✅ Ready to implement. See SPECIFICATION.md Section 8 for checklist.

Document Owner: ________________ Review Cadence: Weekly during MVP development, then monthly for production items

FilesExpand file tree

DECISION_LOG.md

Latest commit

History

DECISION_LOG.md

File metadata and controls

Decision Log: Browser OCR Service

🚨 MVP BLOCKERS (Must Resolve Before First Implementation)

1. Core Data Contracts

1.1 Input Format Specification

1.2 Output Format Specification

1.3 Engine Selection

1.4 Minimum Browser Support

1.5 Storage Strategy (Minimum Viable)

2. User Experience (MVP Minimums)

2.1 Loading State Communication

2.2 Engine Switching Behavior (MVP Scope)

🔧 PRODUCTION-GRADE REQUIREMENTS (Post-MVP)

3. Error Handling & Resilience

3.1 WASM Memory Exhaustion

3.2 Invalid/Corrupt Image Handling

3.3 WebGPU Unavailability Fallback

4. Concurrency & Performance

4.1 Concurrency Model

4.2 Mid-Processing Engine Switch

4.3 Tab Backgrounding

5. Storage & Caching (Production-Grade)

5.1 Cache Invalidation Strategy

5.2 Storage Quota Exceeded

5.3 Cache Result Storage

6. Advanced Features (Post-MVP Roadmap)

6.1 Multi-Language Support

6.2 Preprocessing Pipeline

6.3 Batch/Multi-Page Documents

6.4 Telemetry & Observability

📝 Decision Template

Quick Reference: MVP vs Production

📋 MVP Decisions Summary (Ready to Implement)

Core Architecture

Interface Contract

Implementation Checklist

Validation Criteria

Next Steps

Changes Log

2026-02-06: Translated Image Write-Back Quality Pipeline

2026-01-17: eSearch-OCR Model Caching

2026-01-16: eSearch-OCR Model Hosting Strategy

2026-01-15: All MVP Blockers Resolved