CSV Splitter

Content Type: Module
Categories: Utility

Overview

The CSV Splitter module for Mendix splits large CSV files into smaller chunks so that each chunk can be processed separately in a Mendix application. Users define the number of rows per chunk. Each chunk is stored as a FileDocument, which makes chunks easy to retrieve and reduces memory usage during processing.

The module also includes an Excel Splitter for large Excel files. It supports two splitting methods, selected through user-defined parameters:

  • Row-Based Splitting: Divides the Excel file into files containing a fixed number of rows.
  • Thread-Based Splitting: Splits the file into a specified number of chunks, distributing rows evenly across them.

The Excel Splitter retains the header row in the output files (if enabled) and writes chunks in .xlsx format. It also logs processing times so that performance can be tracked.

Key Features:

  • Splitting CSV and Excel files into manageable pieces.
  • Flexible splitting methods: by rows or threads.
  • Efficient memory management using FileDocument storage.
  • Supports headers in output files.
  • Comprehensive error handling and logging for robust operation.

Additional Notes:

The implementation preserves custom user code between regeneration cycles in Mendix Studio Pro, so manual changes to the Java code are not overwritten when the project is regenerated. It also supports special characters in comments and uses current Java language features for maintainability.

Documentation

The CSV and Excel Splitter Module is a utility designed for Mendix applications to handle large datasets efficiently. It allows users to split CSV and Excel files into smaller, manageable chunks, simplifying data processing and improving performance. The module supports various splitting methods, retains headers, and saves output files as FileDocument objects for easy management.

Features

The module includes the following core functionalities:

  1. CSV Splitting

    • Splits large CSV files into smaller chunks based on a user-defined number of rows per chunk.
    • Each chunk is stored as a FileDocument for streamlined retrieval and reduced memory overhead.
  2. Excel Splitting

    • Supports splitting Excel files in two ways:
      • Row-Based Splitting: Divides the file into smaller files containing a fixed number of rows.
      • Thread-Based Splitting: Distributes rows evenly into a specified number of chunks.
    • Output files retain the header row if enabled by the user.
    • Chunks are generated in .xlsx format for compatibility with modern spreadsheet tools.
  3. Error Handling

    • Validates input parameters such as file content, split quantity, and method to ensure correct operation.
    • Logs detailed error messages for troubleshooting issues.
  4. Performance Monitoring

    • Logs processing details, including chunk generation times, to facilitate performance tracking.
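
The validation described under Error Handling can be sketched as follows. This is a minimal illustration, not the module's actual code: the class name, the method signature, and the method names "ROWS" and "THREADS" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of input validation before splitting.
// All names here are illustrative and not taken from the module.
final class SplitInputValidationSketch {

    // Returns a list of human-readable problems; an empty list means the input is valid.
    static List<String> validate(byte[] fileContent, int splitQuantity, String method) {
        List<String> errors = new ArrayList<>();

        // The uploaded file must contain data.
        if (fileContent == null || fileContent.length == 0) {
            errors.add("File content is empty.");
        }

        // Rows per chunk (row-based) or number of chunks (thread-based) must be positive.
        if (splitQuantity <= 0) {
            errors.add("Split quantity must be greater than 0.");
        }

        // Only the two documented splitting methods are accepted (assumed spellings).
        boolean knownMethod = method != null
                && (method.equalsIgnoreCase("ROWS") || method.equalsIgnoreCase("THREADS"));
        if (!knownMethod) {
            errors.add("Unknown split method: " + method);
        }

        return errors;
    }

    public static void main(String[] args) {
        // Three invalid inputs produce three messages.
        validate(null, 0, "columns").forEach(System.out::println);
    }
}
```

Collecting all problems before returning, rather than failing on the first one, lets a single log entry describe everything that needs to be corrected.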

Installation

  1. Import the module into your Mendix project from the Mendix Marketplace (formerly the Mendix App Store) or as a downloaded package.
  2. Add the required dependencies for FileDocument and system proxies if not already available in your project.
  3. Configure the required user roles and permissions to ensure access to the splitter functionalities.

Usage Instructions

CSV Splitting

  1. Provide the CSV file as a FileDocument in Mendix.
  2. Specify the number of rows per chunk.
  3. Call the splitter logic, which outputs individual chunks as separate FileDocument objects.
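
The steps above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the module's implementation: chunks are returned as strings instead of being stored as FileDocument objects, the class and method names are hypothetical, and each CSV record is assumed to occupy exactly one line (quoted fields containing line breaks are not handled).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split CSV text into chunks of at most rowsPerChunk lines.
final class CsvChunkSketch {

    static List<String> split(Reader csv, int rowsPerChunk) {
        if (rowsPerChunk <= 0) {
            throw new IllegalArgumentException("rowsPerChunk must be greater than 0");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int rowsInCurrent = 0;

        // Read line by line so the total row count does not need to be known up front.
        try (BufferedReader reader = new BufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                rowsInCurrent++;
                if (rowsInCurrent == rowsPerChunk) {
                    // Chunk is full: emit it and start a new one.
                    chunks.add(current.toString());
                    current.setLength(0);
                    rowsInCurrent = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Emit the final, partially filled chunk if there is one.
        if (rowsInCurrent > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = split(new StringReader("a,1\nb,2\nc,3\nd,4\ne,5\n"), 2);
        System.out.println(chunks.size() + " chunks");
    }
}
```

With five rows and two rows per chunk, the sketch yields three chunks, the last holding the single remaining row. In a Mendix Java action, each chunk would be written to its own FileDocument as soon as it is full rather than being collected in a list.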

Excel Splitting

  1. Upload an Excel file as a FileDocument in Mendix.
  2. Choose the splitting method:
    • Row-Based Splitting: Specify the number of rows per chunk.
    • Thread-Based Splitting: Define the number of chunks the file should be split into.
  3. Enable or disable header retention based on your requirements.
  4. Call the splitting logic, which returns the resulting chunks as .xlsx files saved in Mendix.
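
The difference between the two methods comes down to how data rows are assigned to chunks. The sketch below computes only the row ranges; reading and writing .xlsx files is not shown. The class and method names are hypothetical. Giving the first chunks one extra row when the rows do not divide evenly is an assumption about what "distributing rows evenly" means, as is producing no more chunks than there are rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plan which data rows go into which chunk.
// Each range is {startInclusive, endExclusive} over data rows (header excluded).
final class ExcelSplitPlanSketch {

    // Row-based splitting: every chunk holds rowsPerChunk rows, except possibly the last.
    static List<int[]> planByRows(int dataRows, int rowsPerChunk) {
        if (dataRows < 0 || rowsPerChunk <= 0) {
            throw new IllegalArgumentException("dataRows must be >= 0 and rowsPerChunk > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < dataRows; start += rowsPerChunk) {
            ranges.add(new int[] {start, Math.min(start + rowsPerChunk, dataRows)});
        }
        return ranges;
    }

    // Thread-based splitting: a fixed number of chunks with sizes differing by at most one row.
    static List<int[]> planByChunkCount(int dataRows, int chunkCount) {
        if (dataRows < 0 || chunkCount <= 0) {
            throw new IllegalArgumentException("dataRows must be >= 0 and chunkCount > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        int chunks = Math.min(chunkCount, dataRows); // assumption: no empty chunks
        if (chunks == 0) {
            return ranges;
        }
        int base = dataRows / chunks;   // rows every chunk receives
        int extra = dataRows % chunks;  // first 'extra' chunks receive one more row
        int start = 0;
        for (int i = 0; i < chunks; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : planByChunkCount(10, 3)) {
            System.out.println(r[0] + " to " + r[1]);
        }
    }
}
```

For 10 data rows, row-based splitting with 4 rows per chunk gives chunks of 4, 4, and 2 rows, while thread-based splitting into 3 chunks gives chunks of 4, 3, and 3 rows. When header retention is enabled, the header row would be written at the top of each output file before the rows in its range.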

 

Releases

Version: 3.0.0
Framework Version: 10.0.0
Release Notes: Add Count Excel file.

Version: 2.0.4
Framework Version: 10.0.0
Release Notes: Changed log level from error to trace.

Version: 2.0.3
Framework Version: 10.0.0
Release Notes: Fixed blank data.

Version: 2.0.2
Framework Version: 10.0.0
Release Notes: Fixed the error raised when getting a STRING value from a NUMERIC cell.

Version: 2.0.1
Framework Version: 10.0.0
Release Notes: Corrected the logic for reading the Excel file row by row to handle empty or malformed rows properly.

Version: 2.0.0
Framework Version: 10.0.0
Release Notes: The initial release of the CSV and Excel Splitter module introduces powerful tools for handling large datasets in Mendix applications.

This version includes functionality to split CSV files into smaller, manageable chunks by specifying the number of rows per split. Each resulting chunk is automatically saved as a FileDocument, simplifying data retrieval and reducing memory usage.

Additionally, the module offers advanced functionality for splitting Excel files. Users can choose between row-based splitting, where each file contains a fixed number of rows, or thread-based splitting, which divides the data evenly into a specified number of chunks. Headers can be retained in the output files, and all chunks are saved in .xlsx format, ensuring compatibility with modern Excel tools.

The module features robust performance monitoring, including detailed logging of processing times and file generation events. It also incorporates extensive error handling, validating input parameters to prevent common issues and offering detailed logs to simplify troubleshooting.

This release adheres to Mendix Studio Pro’s code generation rules, ensuring custom user code is preserved during project regeneration. The implementation supports special characters in comments for enhanced readability and usability. Optimized file handling mechanisms reduce memory usage during processing, making it suitable for large datasets.

Known issues in this release include longer processing times for extremely large files, which may depend on server resources, and file size limits that align with the Mendix platform’s storage constraints.

Future updates will aim to expand the module's functionality by adding support for additional file formats, introducing content-based splitting methods, and improving error messages for a better user experience.

Version: 1.0.0
Framework Version: 10.0.0
Release Notes: CSV Splitter for Mendix v1.0.0

This first version of the CSV Splitter for Mendix introduces core functionality for efficiently handling large CSV files by splitting them into smaller, specified chunks. It offers flexible row-based splitting, header management, and seamless integration with Mendix's FileDocument storage, making it a valuable tool for applications that process extensive datasets.

Functional Features

  • Dynamic CSV Splitting: Reads and splits large CSV files line by line, without requiring a predefined total row count, optimizing memory efficiency.
  • Configurable Row Limit: Allows users to specify the number of rows per split file (rowsPerFile), ensuring flexibility and control over the chunk sizes.
  • Integrated FileDocument Storage: Each chunk is saved as a Mendix FileDocument, facilitating straightforward management, retrieval, and integration within the Mendix environment.

Header Management

  • Header Inclusion in First File: By default, if the CSV contains a header, it is included only in the first split file. Additional customization may be added if headers are required across multiple files.

Known Limitations

  • Header Handling: The header only appears in the first file. If header duplication across chunks is needed, this can be customized in the code.
  • File Size Considerations: For very large files, consider adjusting the rowsPerFile setting to manage memory usage effectively.

Planned Enhancements

  • Improved header management to support custom duplication settings.
  • Potential for adding automated file naming options for easier sorting.