Tesseract OCR
Overview
The Tesseract OCR is used to extract text from image using tesseract.js library which is an open source library. It supports multiple languages with higher accuracy.
Documentation
The Tesseract OCR is used to extract text from image using tesseract.js library which is an open source library.
It supports multiple languages with higher accuracy.
Prequisites :
Community commons
Implementation:
1) Use 'SUB_GetBase64Image' to convert image to base64 where base64 java action is used.
2) Select Load Language and Initialize Language in 'JS_TesseractOCR' javascript action
Result :
Fetch text from image as string type
Supported File:
1. JPG
2. PNG
3. GIF
4. PNM
5. TIFF
Features :
1) It supports multiple languages. Please check HERE for supported languages.
2) The accuracy is pretty high with normal fonts and clear background
Limitation:
Accuracy will be low with noisy backgrounds and custom scripted fonts.