Usage Tutorial

This tutorial walks through the basic application features, from processing a PVT report to mapping and saving the data.

Before You Begin

Ensure you have:

Completed the steps in Installation
The application running (streamlit run main.py)
A sample report ready for processing

Accessing the Application

Open your browser to http://localhost:8501
You should see the Streamlit application interface with a sidebar and the Fluidsdata logo
Log in with your Fluidsdata credentials
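If the page does not load, it can help to confirm that something is actually answering at the address above before troubleshooting the browser. The following is a minimal sketch using only the Python standard library; the helper name `is_app_reachable` is hypothetical and is not part of the application.

```python
import urllib.error
import urllib.request


def is_app_reachable(url: str = "http://localhost:8501", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` completes with a 2xx or 3xx status.

    Hypothetical helper for illustration; it only checks that a server
    responds at the address, not that login or processing will work.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Calling `is_app_reachable()` with no arguments checks the default address used in this tutorial. A `False` result usually means `streamlit run main.py` is not running or is listening on a different port.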

Navigation

Dashboard: A work-in-progress page that currently displays an overview of application memory usage. It is intended to hold application statistics and user management in the future.
File Upload: Opens the navigation box for file upload pages, defaulting to the File Selection page where image and PDF files can be uploaded and digitized.
Process Reports: Initiates the report processing workflow, allowing users to select files for analysis.
Manage Configuration: Opens the configuration screen where users can view all current report configurations and data mappings.
Test Data: Allows users to merge, validate, and upload test data to the API.
Select Tenant: Allows users to select a tenant to work under. Tenants are simple directories for organizing reports and data. For development work, you can use Digitization Dev.
Select Folder: Folders are subdirectories within a tenant, allowing for further organization.
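Since tenants are described as directories and folders as subdirectories within them, one way to picture where a report lives is as a three-part path. The sketch below is purely illustrative: the function name `report_path`, the `tenant/folder/filename` layout, and the folder name in the usage note are assumptions, not the application's documented storage scheme.

```python
from pathlib import PurePosixPath


def report_path(tenant: str, folder: str, filename: str) -> str:
    """Build a 'tenant/folder/filename' path (assumed layout, for illustration)."""
    for part in (tenant, folder, filename):
        # Reject empty parts and anything that would escape its level.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return str(PurePosixPath(tenant, folder, filename))
```

For example, `report_path("Digitization Dev", "samples", "report.pdf")` yields `Digitization Dev/samples/report.pdf`, where `samples` is a made-up folder name.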

File Upload and OCR

The File Upload page is the starting point for uploading and digitizing reports.

The navigation box contains the following pages:

File Selection: This is the default page where you can upload and process PDF and image files.

Excel File Selection: This page allows you to upload Excel files for processing.

Process File: Deprecated. This page can still be used to debug OCR results if needed.

Manual Selection: Fully deprecated. Its functionality has been integrated into the mapping process.

File Selection Page

The File Selection page lets you upload files and run them through the OCR process.

File Upload: This button opens a file dialog to select files for upload. You can select multiple files at once. Supported formats include PDF, PNG, JPG, and JPEG.
Uploaded files are automatically saved to the current tenant and folder in Azure Blob Storage, as well as to the application’s local cache.
Grid Mode: Not fully implemented. This mode is intended to let users view and select files in a grid layout rather than a list.
Uploaded File List: Displays all uploaded files in the current tenant and folder that have yet to be processed. You can select one or more files from this list for processing.
- Detect Tables: This toggle enables or disables the table detection step of the OCR process. When enabled, the application attempts to identify and extract tables from the uploaded files and stores only the table data. This should usually be enabled.
- Process Selected Files: This button initiates the OCR process on all selected files in the adjacent list. The application will process the files sequentially and save the results behind the scenes.
- Delete Selected Files: This button deletes the selected files from the current tenant and folder. Be cautious: this action cannot be undone, and it removes the files from both Azure Blob Storage and the local cache.
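Because only PDF, PNG, JPG, and JPEG files are accepted on this page, it can be useful to sort a batch by extension before uploading. The following is a minimal sketch of such a check; `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names, and the application's own validation may work differently.

```python
from pathlib import Path

# Formats listed as supported on the File Selection page.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def split_by_support(filenames):
    """Split filenames into (supported, rejected) by case-insensitive extension."""
    supported, rejected = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            rejected.append(name)
    return supported, rejected
```

Files that land in the rejected list, such as Excel workbooks, belong on the Excel File Selection page instead, or need converting to a supported format first.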

Processing a report opens a progress bar and a log window that show the status of the operation.

The log window displays the progress of the OCR process, including any errors or warnings encountered during processing.
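The sequential processing and progress log described above can be pictured with a short loop. This is a sketch under stated assumptions, not the application's implementation: `process_sequentially`, `process_one`, and `report` are hypothetical names, and `process_one` stands in for whatever OCR step is run on each file.

```python
def process_sequentially(files, process_one, report=print):
    """Run `process_one` on each file in order, logging one line per file.

    Returns (results, errors): two dicts keyed by file name. A failure on
    one file is recorded and logged, and the loop moves on to the next.
    """
    files = list(files)
    results, errors = {}, {}
    for index, name in enumerate(files, start=1):
        try:
            results[name] = process_one(name)
            status = "ok"
        except Exception as exc:  # keep the batch going after a failure
            errors[name] = str(exc)
            status = f"error: {exc}"
        report(f"[{index}/{len(files)}] {name}: {status}")
    return results, errors
```

Catching the exception per file, rather than around the whole loop, is what lets a log window show errors for individual files without aborting the rest of the batch.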

Warning

Processing time can vary widely depending on file or batch size and the performance of the host machine. Running the application on a machine with GPU acceleration is highly recommended when processing large files or batches of files (see Installation for details).

Warning

Processing large files can be memory-intensive and may cause the application to freeze if the host machine does not have sufficient RAM. 16 GB or more should be enough for smooth processing.

Warning

The current web host is not optimized for large file processing, and processing may time out or crash if the files are too large or too numerous. Even if it doesn’t crash, processing will be very slow. Ideally, the cloud compute should be upgraded in the near future.

Once processing is complete, the work on this page is done. The processed files appear at the bottom of the page in the Processed Files section, where you can delete them or move them back for reprocessing if needed.