Architecture & Design
=====================

This section provides a comprehensive overview of the system architecture, application structure, and key design decisions for the Fluidsdata Digitization and OCR application.

.. toctree::
   :maxdepth: 2
   :caption: Architecture Contents:

   app_structure
   upload_pipeline/index

Overview
--------

1. **Frontend Layer** (:doc:`app_structure`):

   * Streamlit multi-page application
   * Session management for state persistence
   * Real-time processing feedback

2. **Processing Layer** (:doc:`system_architecture`):

   * Document processing pipeline
   * OCR and table reconstruction using DocTR and YOLO models
   * Data mapping and merging capabilities

3. **Backend Layer** (:doc:`azure_backend`):

   * Custom Azure-based API for data handling
   * Integration with Azure Blob Storage for file management
   * Database interactions for structured data storage

4. **State Management** (:doc:`session_management`):

   * AppSession for processing workflows
   * DigitizationSession for analysis workflows
   * Persistent state across user interactions

Design Principles (somewhat adhered to)
---------------------------------------

* **Modularity**: Each component is designed to be independent and reusable.
* **Separation of Concerns**: Clear distinction between UI, processing, and data layers.
* **Industry Focus**: Tailored for oil & gas PVT report digitization.

.. note::

   The existence of two state management classes (AppSession and DigitizationSession) is a relic of earlier development where two separate apps (OCR and Digitization) were merged together. This should be refactored in the future to simplify state management and improve code clarity.