Architecture & Design

This section gives an overview of the system architecture, application structure, and key design decisions of the Fluidsdata Digitization and OCR application.

Overview

  1. Frontend Layer (App Structure):
    • Streamlit multi-page application

    • Session management for state persistence

    • Real-time processing feedback

  2. Processing Layer (System Architecture):
    • Document processing pipeline

    • OCR and table reconstruction using DocTR and YOLO models

    • Data mapping and merging capabilities

  3. Backend Layer (Azure Backend):
    • Custom Azure-based API for data handling

    • Integration with Azure Blob Storage for file management

    • Database interactions for structured data storage

  4. State Management (Session Management):
    • AppSession for processing workflows

    • DigitizationSession for analysis workflows

    • Persistent state across user interactions
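As a minimal sketch of how state can persist across user interactions, the snippet below stores a session object in a dict-like store and returns the same object on every later access. The fields on AppSession and DigitizationSession and the get_or_create helper are hypothetical, not taken from the codebase. A plain dict stands in for Streamlit's st.session_state, which supports the same membership check and item assignment.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, field
from typing import Any, Callable, TypeVar

T = TypeVar("T")


@dataclass
class AppSession:
    """Hypothetical stand-in for the processing-workflow state."""
    uploaded_files: list[str] = field(default_factory=list)
    current_step: str = "upload"


@dataclass
class DigitizationSession:
    """Hypothetical stand-in for the analysis-workflow state."""
    selected_tables: list[str] = field(default_factory=list)


def get_or_create(state: MutableMapping[str, Any], key: str,
                  factory: Callable[[], T]) -> T:
    """Return the object stored under `key`, creating it on first access."""
    if key not in state:
        state[key] = factory()
    return state[key]


# A plain dict stands in for st.session_state so the sketch runs
# without Streamlit installed.
state: dict[str, Any] = {}

# First "rerun": the session is created and then mutated.
session = get_or_create(state, "app_session", AppSession)
session.uploaded_files.append("report.pdf")

# Second "rerun": the same object comes back with its state intact.
same_session = get_or_create(state, "app_session", AppSession)
```

In a Streamlit page, the script reruns on each interaction, so passing st.session_state as `state` would let each page pick up the session where the previous interaction left it.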

Design Principles (somewhat adhered to)

  • Modularity: Each component is designed to be independent and reusable.

  • Separation of Concerns: Clear distinction between UI, processing, and data layers.

  • Industry Focus: Tailored for oil & gas PVT report digitization.
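The modularity and separation-of-concerns principles can be illustrated with a small sketch in which each processing stage is a plain function that knows nothing about the UI or storage. The Page type, the stage functions, and run_pipeline are hypothetical names used only for illustration. The stage bodies are placeholders, not the real OCR or table reconstruction logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    """Hypothetical unit of work passed between stages."""
    name: str
    text: str = ""
    tables: int = 0


# A stage takes a Page and returns a new Page; it has no UI or storage access.
Stage = Callable[[Page], Page]


def run_ocr(page: Page) -> Page:
    # Placeholder for the OCR step.
    return Page(page.name, text=f"text of {page.name}", tables=page.tables)


def reconstruct_tables(page: Page) -> Page:
    # Placeholder for the table reconstruction step.
    return Page(page.name, text=page.text, tables=1)


def run_pipeline(page: Page, stages: list[Stage]) -> Page:
    """Apply each stage in order and return the final result."""
    for stage in stages:
        page = stage(page)
    return page


result = run_pipeline(Page("page-1"), [run_ocr, reconstruct_tables])
```

Because the stages share one signature, they can be reordered, replaced, or tested in isolation, and the UI layer only needs to call run_pipeline and display the result.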

Note

The two state management classes (AppSession and DigitizationSession) are a relic of earlier development, when two separate apps (OCR and Digitization) were merged into one. They should be consolidated in a future refactor to simplify state management and improve code clarity.
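One possible direction for that refactor, sketched under assumptions, is a single session object that groups the two workflows as sub-states. UnifiedSession, ProcessingState, AnalysisState, and their fields are all hypothetical. This is not a description of the current code or of a planned design.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingState:
    """Hypothetical fields currently held by AppSession."""
    uploaded_files: list[str] = field(default_factory=list)


@dataclass
class AnalysisState:
    """Hypothetical fields currently held by DigitizationSession."""
    selected_tables: list[str] = field(default_factory=list)


@dataclass
class UnifiedSession:
    """Single entry point for all per-user state."""
    processing: ProcessingState = field(default_factory=ProcessingState)
    analysis: AnalysisState = field(default_factory=AnalysisState)

    def reset_analysis(self) -> None:
        # Clear analysis state without touching processing state.
        self.analysis = AnalysisState()


session = UnifiedSession()
session.processing.uploaded_files.append("report.pdf")
session.analysis.selected_tables.append("table-1")
session.reset_analysis()
```

Keeping the two sub-states separate inside one object preserves the existing split between workflows while giving pages a single object to store and retrieve.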