scribe.js/docs/API.md at master · scribeocr/scribe.js

init
- Parameters
extractText
- Parameters
writeDebugImages
- Parameters
clear
terminate
exportData
- Parameters
download
- Parameters
SortedInputFiles
- Properties
importFiles
- Parameters
recognize
- Parameters

init

Initialize the program and optionally pre-load resources.

Parameters

params Object?
- params.pdf boolean Load PDF renderer. (optional, default false)
- params.ocr boolean Load OCR engine. (optional, default false)
- params.font boolean Load built-in fonts. (optional, default false)

The PDF renderer and OCR engine are loaded automatically when needed, so the only reason to set pdf or ocr to true is to pre-load them.
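A minimal sketch of pre-loading with init follows. The helper name `preloadEngines` and its `expectPdf`/`expectOcr` flags are hypothetical, and `scribe` is assumed to be the object exported by the library, passed in by the caller.

```javascript
// Hypothetical helper: pre-load only the resources the caller expects to need.
// `scribe` is assumed to be the library's exported object (not imported here).
async function preloadEngines(scribe, { expectPdf = false, expectOcr = false } = {}) {
  // Fonts are always requested in this sketch; pdf and ocr are optional pre-loads,
  // since the library loads them on demand anyway.
  const params = { pdf: expectPdf, ocr: expectOcr, font: true };
  await scribe.init(params);
  return params;
}
```

Calling `preloadEngines(scribe, { expectOcr: true })` before the first recognition request would move the engine start-up cost out of that request.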

extractText

Extract text from image and PDF files with a single function call. By default, existing text content is extracted from text-native PDF files; otherwise, text is extracted using OCR. To control how text from PDF files is handled, set the options in the opt.usePDFText object. For more control, use init, importFiles, recognize, and exportData separately.

Parameters

files
langs Array<string> (optional, default ['eng'])
outputFormat (optional, default 'txt')
options (optional, default {})
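The parameter order above can be sketched as a thin wrapper. The name `extractEnglishText` is hypothetical, `scribe` is assumed to be the library's exported object, and the return value is passed through unchanged because this section does not document its type.

```javascript
// Hypothetical wrapper around the single-call API.
// `scribe` is assumed to be the library's exported object (not imported here).
async function extractEnglishText(scribe, files) {
  // Arguments follow the documented order: files, langs, outputFormat, options.
  // The values given are the documented defaults, written out for clarity.
  return scribe.extractText(files, ['eng'], 'txt', {});
}
```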

writeDebugImages

Parameters

ctx
compDebugArrArr Array<Array<CompDebugNode>>
filePath string

clear

Clears all document-specific data.

terminate

Terminates the program and releases resources.

exportData

Export the active OCR data to the specified format.

Parameters

format ("pdf" | "hocr" | "docx" | "xlsx" | "txt" | "text") (optional, default 'txt')
minPage number First page to export. (optional, default 0)
maxPage number Last page to export (inclusive). -1 exports through the last page. (optional, default -1)

Returns Promise<(string | ArrayBuffer)>
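The minPage and maxPage semantics can be illustrated with a small standalone function. This is not the library's code: `resolvePageRange` is a hypothetical name, and zero-based page indices are an assumption suggested by the default of 0 but not stated in this section.

```javascript
// Hypothetical illustration of the page-range rule described above.
// Assumes zero-based page indices; maxPage is inclusive and -1 means "last page".
function resolvePageRange(pageCount, minPage = 0, maxPage = -1) {
  const last = maxPage === -1 ? pageCount - 1 : maxPage;
  const pages = [];
  for (let i = minPage; i <= last; i += 1) {
    pages.push(i);
  }
  return pages;
}
```

Under these assumptions, a call such as `exportData('hocr', 1, 2)` on a five-page document would cover the pages listed by `resolvePageRange(5, 1, 2)`.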

download

Runs exportData and saves the result as a download (browser) or local file (Node.js).

Parameters

format ("pdf" | "hocr" | "docx" | "xlsx" | "txt" | "text")
fileName string
minPage number First page to export. (optional, default 0)
maxPage number Last page to export (inclusive). -1 exports through the last page. (optional, default -1)
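As a sketch of how download might be used to save one document in several formats, the helper below calls it once per format. The name `saveAll` and the choice to use the format string as the file extension are assumptions; `scribe` is assumed to be the library's exported object.

```javascript
// Hypothetical helper: save the active document once per requested format.
// `scribe` is assumed to be the library's exported object (not imported here).
async function saveAll(scribe, baseName, formats = ['txt', 'hocr']) {
  const written = [];
  for (const format of formats) {
    // Assumption: the format name doubles as a sensible file extension.
    const fileName = `${baseName}.${format}`;
    // minPage and maxPage are omitted, so the documented defaults (0 and -1) apply.
    await scribe.download(format, fileName);
    written.push(fileName);
  }
  return written;
}
```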

SortedInputFiles

An object with this shape can be used to provide input to the importFiles function, without needing that function to figure out the file types. This is required when using ArrayBuffer inputs.

Type: Object

Properties

pdfFiles (Array<File> | Array<string> | Array<ArrayBuffer>)?
imageFiles (Array<File> | Array<string> | Array<ArrayBuffer>)?
ocrFiles (Array<File> | Array<string> | Array<ArrayBuffer>)?
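Since ArrayBuffer inputs carry no file name, the caller has to state their type through this shape. A minimal sketch of building such an object follows; `sortedInputFromBuffers` is a hypothetical name, and the type check is an extra safeguard of this example rather than documented library behavior.

```javascript
// Hypothetical helper: build a SortedInputFiles-shaped object from ArrayBuffers.
function sortedInputFromBuffers({ pdf = [], image = [], ocr = [] } = {}) {
  for (const buf of [...pdf, ...image, ...ocr]) {
    // Guard added for this sketch only; the library's own validation is not documented here.
    if (!(buf instanceof ArrayBuffer)) {
      throw new TypeError('Expected every input to be an ArrayBuffer');
    }
  }
  return { pdfFiles: pdf, imageFiles: image, ocrFiles: ocr };
}
```

The returned object has the three properties listed above and could be passed to importFiles.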

importFiles

Import files for processing. An object with pdfFiles, imageFiles, and ocrFiles arrays can be provided to import multiple types of files. Alternatively, for File objects (browser) and file paths (Node.js), a single array can be provided, which is sorted based on extension.

Parameters

files (Array<File> | FileList | Array<string> | SortedInputFiles)
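For callers who prefer to sort file paths themselves before importing, the idea of sorting by extension can be sketched as below. This is an illustration only, not the library's logic: `sortPathsByExtension` is a hypothetical name and the extension lists are assumptions, not the set the library recognizes.

```javascript
// Assumed extension lists for illustration; the library's actual mapping may differ.
const PDF_EXT = new Set(['pdf']);
const IMAGE_EXT = new Set(['png', 'jpg', 'jpeg']);
const OCR_EXT = new Set(['hocr', 'xml']);

// Hypothetical helper: group file paths into a SortedInputFiles-shaped object.
function sortPathsByExtension(paths) {
  const sorted = { pdfFiles: [], imageFiles: [], ocrFiles: [] };
  for (const path of paths) {
    const ext = path.slice(path.lastIndexOf('.') + 1).toLowerCase();
    if (PDF_EXT.has(ext)) sorted.pdfFiles.push(path);
    else if (IMAGE_EXT.has(ext)) sorted.imageFiles.push(path);
    else if (OCR_EXT.has(ext)) sorted.ocrFiles.push(path);
    // Paths with other extensions are skipped in this sketch.
  }
  return sorted;
}
```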

recognize

Recognize all pages in the active document. Files for recognition should already be imported using importFiles before calling this function. The results of recognition can be exported by calling exportData after this function.

Parameters

options Object (optional, default {})
- options.mode ("speed" | "quality") Recognition mode. (optional, default 'quality')
- options.langs Array<string> Language(s) in document. (optional, default ['eng'])
- options.modeAdv ("lstm" | "legacy" | "combined") Alternative method of setting recognition mode. (optional, default 'combined')
- options.combineMode ("conf" | "data" | "none") Method of combining OCR results. Used if OCR data already exists. (optional, default 'data')
- options.vanillaMode boolean Whether to use the vanilla Tesseract.js model. (optional, default false)
- options.config Object<string, string> Config params to pass to Tesseract.js. (optional, default {})
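The call order described in this document (init, importFiles, recognize, exportData, terminate) can be sketched as one function. The name `recognizeToText` is hypothetical, `scribe` is assumed to be the library's exported object, and wrapping the work in try/finally is a design choice of this example, not a documented requirement.

```javascript
// Hypothetical end-to-end helper using the step-by-step API.
// `scribe` is assumed to be the library's exported object (not imported here).
async function recognizeToText(scribe, files, langs = ['eng']) {
  // Pre-load the OCR engine and fonts; both would otherwise load on demand.
  await scribe.init({ ocr: true, font: true });
  try {
    await scribe.importFiles(files);
    await scribe.recognize({ mode: 'quality', langs });
    return await scribe.exportData('txt');
  } finally {
    // Release resources even if a step above fails.
    await scribe.terminate();
  }
}
```

When several documents are processed in one session, calling clear between documents instead of terminate would avoid reloading the engine each time.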
