Usage Tutorial

This tutorial walks you through the application's basic features, from processing a PVT report to mapping and saving the data.

Before You Begin

Ensure you have:

  • Completed the steps in Installation

  • The application running (streamlit run main.py)

  • A sample report ready for processing

Accessing the Application

  1. Open your browser to http://localhost:8501

  2. You should see the Streamlit application interface with a sidebar and the Fluidsdata logo

  3. Log in with your Fluidsdata credentials
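If the page does not load, it can help to confirm that a server is actually answering at that address before troubleshooting the login. The following is a minimal sketch using only the Python standard library. The function name `is_app_reachable` is hypothetical and not part of the application, and the default URL assumes the address given in step 1.

```python
import urllib.request


def is_app_reachable(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers connection refused, timeouts, DNS failures, and HTTP error
        # statuses (urllib.error.URLError and HTTPError derive from OSError).
        return False
```

Calling `is_app_reachable()` with no arguments checks the default address. A `False` result usually means the `streamlit run main.py` process is not running or is listening on a different port.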

File Upload and OCR

The File Upload page is the starting point for uploading and digitizing reports.

The navigation box (figure: File Upload Navigation) lists the following pages:

File Selection: This is the default page where you can upload and process PDF and image files.

Excel File Selection: This page allows you to upload Excel files for processing.

Process File: Currently deprecated. This page can still be used for debugging OCR results if needed.

Manual Selection: Entirely deprecated; its functionality has been integrated into the mapping process.

File Selection Page

The File Selection page lets you upload files and run them through the OCR process.

File Upload
  1. File Upload: This button opens a file dialog to select files for upload. You can select multiple files at once. Supported formats include PDF, PNG, JPG, and JPEG.

    Uploaded files are saved automatically to the current tenant and folder in Azure Blob Storage and to the application’s local cache.

  2. Grid Mode: Not fully implemented. This mode is intended to let users view and select files in a grid layout rather than a list.

  3. Uploaded File List: Displays all uploaded files in the current tenant and folder that have yet to be processed. You can select one or more files from this list for processing.

    • Detect Tables: This toggle enables or disables the table detection step of the OCR process. When enabled, the application attempts to identify and extract tables from the uploaded files and stores only the table data. This should usually be enabled.

    • Process Selected Files: This button initiates the OCR process on all selected files in the adjacent list. The application will process the files sequentially and save the results behind the scenes.

    • Delete Selected Files: This button deletes the selected files from the current tenant and folder. Be cautious: this action cannot be undone and removes the files from both Azure Blob Storage and the local cache.
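When preparing a batch, it may be convenient to check file types before opening the upload dialog. The sketch below splits a list of filenames by the formats listed for this page (PDF, PNG, JPG, JPEG). The helper name `split_by_support` is hypothetical, and matching extensions case-insensitively is an assumption about how the application behaves.

```python
from pathlib import Path

# Formats listed in this tutorial for the File Selection page.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def split_by_support(filenames):
    """Split filenames into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for name in filenames:
        # Compare in lowercase so "REPORT.PDF" and "report.pdf" are treated alike.
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

Excel workbooks land in the unsupported list here because they belong on the Excel File Selection page rather than this one.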

Processing a report opens a progress bar and log window (figure: File Processing) that show the status of the operation.

This window will display the progress of the OCR process, including any errors or warnings encountered during processing.
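The sequential behaviour described above can be pictured with a small sketch. It is not the application's implementation: `process_sequentially`, `ocr_step`, and `on_progress` are hypothetical names, and continuing with the next file after an error is an assumption made for illustration.

```python
from typing import Callable, Iterable


def process_sequentially(
    files: Iterable[str],
    ocr_step: Callable[[str], object],
    on_progress: Callable[[int, int, str], None] = lambda done, total, msg: None,
):
    """Run `ocr_step` on each file in order and return (results, errors)."""
    files = list(files)
    results, errors = {}, {}
    for index, name in enumerate(files, start=1):
        try:
            results[name] = ocr_step(name)
            message = f"processed {name}"
        except Exception as exc:
            # Assumption: one bad file is logged and the batch continues.
            errors[name] = str(exc)
            message = f"failed {name}: {exc}"
        # Report (files done, total files, log line) after every file.
        on_progress(index, len(files), message)
    return results, errors
```

In a Streamlit page, the `on_progress` callback could update a progress bar created with `st.progress` and append the message to a log area. Here it is left as a plain callable so the sketch runs without Streamlit installed.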

Warning

Processing time varies widely with file and batch size and with the performance of the host machine. Running the application on a machine with GPU acceleration is highly recommended for large files or batches (see Installation for details).

Warning

Processing large files can be memory-intensive and may cause the application to freeze if the host machine does not have sufficient RAM. 16 GB or more should be enough for smooth processing.
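To compare a host against the 16 GB guideline before starting a large batch, a standard-library check along these lines may help. It is only a sketch: `total_ram_gib` and `meets_recommendation` are hypothetical helpers, and the `os.sysconf` lookup works on Linux and most Unix systems but not on Windows, where the function returns `None`.

```python
import os

RECOMMENDED_RAM_GIB = 16  # guideline taken from the warning above


def total_ram_gib():
    """Return total physical memory in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows, and these names are not
        # defined on every Unix variant.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 2**30


def meets_recommendation(ram_gib, minimum=RECOMMENDED_RAM_GIB):
    """Return True or False when RAM is known, None when it is not."""
    return None if ram_gib is None else ram_gib >= minimum
```

Machines sold as 16 GB typically report slightly less than 16 GiB of usable memory, so a near miss is worth treating as a pass rather than a hard failure.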

Warning

The current web host is not optimized for large file processing. Large or numerous files may cause timeouts or crashes, and processing is very slow even when it succeeds. Ideally, the cloud compute should be upgraded in the near future.

Once the processing is complete, the work on this page is done. You can find the processed files at the bottom of the page in the Processed Files section, where you can delete or move them for reprocessing if needed.