Tesseract OCR

Content Type: Module

Categories: Tracing

Overview

Tesseract OCR extracts text from images using tesseract.js, an open-source OCR library. It supports multiple languages and is most accurate on images with standard fonts and clear backgrounds.

Documentation


Prerequisites:
Community Commons

Implementation:
1) Use 'SUB_GetBase64Image' to convert the image to a base64 string; it uses a base64 Java action.
2) In the 'JS_TesseractOCR' JavaScript action, select the values for Load Language and Initialize Language.
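
The two steps above can be sketched in plain JavaScript. This is an illustrative sketch only, not the module's actual code: the internals of 'SUB_GetBase64Image' and 'JS_TesseractOCR' are not shown in this document. The function names toDataUrl and extractText are hypothetical, and the worker calls assume a tesseract.js v2-style API (load, loadLanguage, initialize, recognize, terminate); later tesseract.js releases changed these calls. The worker factory is passed in as a parameter so the sketch does not assume how the library is loaded.

```javascript
// Hypothetical helper: wrap a base64 string in a data URL, a form that
// tesseract.js accepts as an image source.
function toDataUrl(base64, mimeType = "image/png") {
  // Remove whitespace and line breaks that some base64 encoders insert.
  const clean = base64.replace(/\s+/g, "");
  return `data:${mimeType};base64,${clean}`;
}

// Hypothetical helper: run OCR on a base64 image with a v2-style worker.
// `createWorker` is assumed to be the tesseract.js worker factory.
async function extractText(createWorker, base64, language = "eng", mimeType = "image/png") {
  const worker = await createWorker();
  try {
    await worker.load();
    await worker.loadLanguage(language); // corresponds to "Load Language"
    await worker.initialize(language);   // corresponds to "Initialize Language"
    const { data } = await worker.recognize(toDataUrl(base64, mimeType));
    return data.text; // extracted text as a string
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

With tesseract.js loaded, a call might look like extractText(createWorker, base64String, "eng"), where "eng" is the language code for English.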

Result:

The text extracted from the image, returned as a String.

Supported file formats:
1. JPG
2. PNG
3. GIF
4. PNM
5. TIFF

Features:
1) It supports multiple languages. The supported languages are listed in the tesseract.js documentation.
2) Accuracy is high for images with standard fonts and a clear background.

Limitation:
Accuracy drops on images with noisy backgrounds and on custom or script-style fonts.

Releases

Version: 1.0.0

Framework Version: 8.18.22

Release Notes: Tesseract OCR extracts text from images using tesseract.js, an open-source OCR library. It supports multiple languages and is most accurate on images with standard fonts and clear backgrounds.