OCR Extraction

Content Type: Module
Categories: Utility, Data

Overview

This component provides a ready-to-use OCR (Optical Character Recognition) solution for Mendix applications using the OCR.Space API. It allows applications to extract readable text from uploaded images in a reliable and configurable manner.

The module accepts images as Base64 input and performs automatic preprocessing to improve recognition quality. It includes blur detection to identify low-quality images before OCR execution and returns the extracted text along with a calculated confidence score to indicate result quality.
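The module's actual blur-detection method is not documented here. As an illustration only, one common approach is the variance of a Laplacian filter over the grayscale image: sharp images have strong edges and a high variance, blurry ones a low variance. In the sketch below, the class name `BlurCheck` and both method names are hypothetical, and any cut-off applied to the result would have to be tuned per use case.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the module's published API.
final class BlurCheck {

    /** Decodes a Base64 string into an image; returns null if ImageIO has no reader for the format. */
    static BufferedImage decode(String base64) throws IOException {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return ImageIO.read(new ByteArrayInputStream(bytes));
    }

    /** Variance of a 4-neighbour Laplacian over the grayscale image; lower values suggest more blur. */
    static double laplacianVariance(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        if (w < 3 || h < 3) {
            return 0.0; // too small to have interior pixels
        }
        // Convert to grayscale using the usual luma weights.
        double[][] gray = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                gray[y][x] = 0.299 * r + 0.587 * g + 0.114 * b;
            }
        }
        // Apply the Laplacian to interior pixels and accumulate mean and mean of squares.
        double sum = 0.0;
        double sumSq = 0.0;
        int n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                double lap = gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1]
                        - 4.0 * gray[y][x];
                sum += lap;
                sumSq += lap * lap;
                n++;
            }
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }
}
```

A caller could compare the returned value against an experimentally chosen threshold and skip the OCR request when the image looks too blurry.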

The component is designed for multi-user and multi-environment usage. No API key is bundled with the module. Users must supply their own OCR.Space API key through Mendix App Settings, enabling secure per-environment configuration (Local, Test, Acceptance, Production).
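To illustrate per-environment key handling, the sketch below resolves the key from an environment variable first and falls back to a value passed in by the caller, such as a Mendix constant. The variable name `OCR_SPACE_API_KEY` and the class `ApiKeyResolver` are assumptions made for this example; the module's own lookup may differ.

```java
import java.util.Map;

// Hypothetical helper, not part of the module's published API.
final class ApiKeyResolver {

    // Assumed environment variable name; choose whatever fits your deployment.
    static final String ENV_NAME = "OCR_SPACE_API_KEY";

    /**
     * Returns the key from the environment if present, otherwise the fallback
     * (for example a constant value). Fails fast when neither is configured.
     */
    static String resolve(Map<String, String> env, String fallback) {
        String fromEnv = env.get(ENV_NAME);
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv.trim();
        }
        if (fallback != null && !fallback.isBlank()) {
            return fallback.trim();
        }
        throw new IllegalStateException("No OCR.Space API key configured");
    }
}
```

In a Java action this could be called as `ApiKeyResolver.resolve(System.getenv(), constantValue)`, where `constantValue` is whatever the microflow passes in.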

This module is suitable for use cases such as document digitization, ID and invoice scanning, form processing, and image-based data extraction. It is optimized for English-language text and works best with clear, well-aligned images using standard fonts.

The solution is implemented as a Java action and can be integrated into existing Mendix microflows without additional dependencies.

Documentation

Typical usage scenario

This module allows Mendix applications to extract readable text from images using the OCR.Space API.

Typical use cases include:

  • Uploading ID cards, invoices, bills, receipts, certificates, or forms
  • Extracting text from scanned documents or photos
  • Automatically converting image content into searchable and editable text
  • Showing extracted text and confidence score in a popup or page
  • Using OCR results for validation, automation, or further processing

This solution is useful in:

  • Document digitization systems
  • Invoice or receipt processing apps
  • Any Mendix app requiring image-to-text conversion

Features and limitations

Features

  • Extracts text from images using OCR.Space API
  • Supports Base64 image input
  • Automatic image preprocessing (resize, format handling)
  • Returns:
    • Extracted text
    • Estimated OCR confidence score (%)
    • Confidence level (Low / Medium / High)
  • Designed for multi-user Mendix applications
  • API key is configurable per environment
  • Clean Java Action implementation
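The thresholds the module uses to turn the confidence percentage into a Low / Medium / High level are not stated. The sketch below shows one way such a mapping could look; the cut-offs of 50 and 80 and the class name `ConfidenceLevel` are purely illustrative assumptions.

```java
// Hypothetical mapping; the module's real thresholds are not documented.
final class ConfidenceLevel {

    // Assumed cut-offs, chosen only for illustration.
    static final double MEDIUM_MIN = 50.0;
    static final double HIGH_MIN = 80.0;

    /** Maps a confidence percentage in [0, 100] to "Low", "Medium" or "High". */
    static String of(double percent) {
        if (Double.isNaN(percent) || percent < 0.0 || percent > 100.0) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        if (percent >= HIGH_MIN) {
            return "High";
        }
        if (percent >= MEDIUM_MIN) {
            return "Medium";
        }
        return "Low";
    }
}
```

A microflow could use the returned level to decide whether to accept the text automatically or ask the user to re-upload the image.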

Limitations

  • OCR accuracy depends on:
    • Image quality
    • Lighting
    • Text clarity and font
  • Confidence score is an estimated quality indicator, not an exact measure of accuracy
  • WEBP images are not supported, because Java ImageIO has no WEBP reader by default
  • Requires an active internet connection (cloud OCR)
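Because of the WEBP limitation, it can help to reject WEBP uploads with a clear message before attempting to decode them. A WEBP file starts with the ASCII bytes `RIFF`, followed by a 4-byte size field and then `WEBP`, so a minimal check (the class name `WebpCheck` is hypothetical) might look like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the module's published API.
final class WebpCheck {

    /** Returns true if the bytes carry the RIFF....WEBP container signature. */
    static boolean isWebp(byte[] data) {
        if (data == null || data.length < 12) {
            return false;
        }
        String riff = new String(data, 0, 4, StandardCharsets.US_ASCII);
        String webp = new String(data, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && webp.equals("WEBP");
    }

    /** Convenience overload for the Base64 input the module works with. */
    static boolean isWebpBase64(String base64) {
        return isWebp(Base64.getDecoder().decode(base64));
    }
}
```

If the check returns true, the application can ask the user for a JPEG or PNG instead of sending an image that cannot be preprocessed.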

Dependencies

  • OCR.Space API key (free or paid)
  • Mendix Studio Pro 9.x or above
  • Java Runtime Environment (default Mendix runtime)
  • Internet access from Mendix runtime

No third-party Mendix modules are required.

Installation

  1. Download the module from the Mendix Marketplace
  2. Import the module into your Mendix project
  3. Resolve any consistency errors
  4. Add the module to your project dependencies
  5. Deploy the application once to initialize the module

Configuration

  1. Obtain an API key from OCR.Space
  2. Store the API key in one of the following:
    • An app constant, set per environment in App Settings
    • An environment variable (recommended for production)
  3. Configure your image upload entity so that it provides:
    • The uploaded file document
    • The image content as Base64 data
  4. Call the provided Java action from a microflow, triggered for example by:
    • A button click
    • A popup action
  5. Map the output:
    • Extracted text
    • Confidence percentage
    • Confidence level
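For readers curious about what happens behind the Java action, the sketch below builds a request to the OCR.Space `parse/image` endpoint with the key in the `apikey` header and the image in the `base64Image` form field as a data URI. It is not the module's source code: the class `OcrSpaceRequest` and its method names are hypothetical, the form-encoded body is one of several ways to submit the image, and it assumes Java 11 or later for `java.net.http`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the module's actual implementation.
final class OcrSpaceRequest {

    static final String ENDPOINT = "https://api.ocr.space/parse/image";

    /** Builds the URL-encoded form body: the image as a data URI plus the language code. */
    static String formBody(String base64, String mimeType, String language) {
        String dataUri = "data:" + mimeType + ";base64," + base64;
        return "base64Image=" + URLEncoder.encode(dataUri, StandardCharsets.UTF_8)
                + "&language=" + URLEncoder.encode(language, StandardCharsets.UTF_8);
    }

    /** Builds a POST request for English text; the API key travels in the "apikey" header. */
    static HttpRequest build(String apiKey, String base64, String mimeType) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("apikey", apiKey)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(base64, mimeType, "eng")))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The JSON response carries the recognized text under `ParsedResults[0].ParsedText`, which a JSON library of your choice can extract.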

Known bugs

  • WEBP image format is not supported by default Java ImageIO
  • Very low-quality images may result in:
    • Low confidence score
    • Partial text extraction
  • Large images may increase response time

Releases