Architecture & Design
This section provides a comprehensive overview of the system architecture, application structure, and key design decisions for the Fluidsdata Digitization and OCR application.
Overview
- Frontend Layer (App Structure):
  - Streamlit multi-page application
  - Session management for state persistence
  - Real-time processing feedback
- Processing Layer (System Architecture):
  - Document processing pipeline
  - OCR and table reconstruction using DocTR and YOLO models
  - Data mapping and merging capabilities
- Backend Layer (Azure Backend):
  - Custom Azure-based API for data handling
  - Integration with Azure Blob Storage for file management
  - Database interactions for structured data storage
- State Management (Session Management):
  - AppSession for processing workflows
  - DigitizationSession for analysis workflows
  - Persistent state across user interactions
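The state management items above can be illustrated with a minimal sketch of the get-or-create pattern commonly used to keep an object alive across Streamlit reruns. The fields on `AppSession` and `DigitizationSession`, the `get_or_create` helper, and the store keys are all hypothetical and are not taken from the application's code. A plain dict stands in for `st.session_state`, which offers the same dict-like access.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, MutableMapping, Optional, TypeVar

T = TypeVar("T")


@dataclass
class AppSession:
    """Hypothetical stand-in for the processing-workflow state."""
    uploaded_files: List[str] = field(default_factory=list)
    current_step: str = "upload"


@dataclass
class DigitizationSession:
    """Hypothetical stand-in for the analysis-workflow state."""
    selected_report: Optional[str] = None
    mapped_tables: Dict[str, Any] = field(default_factory=dict)


def get_or_create(store: MutableMapping[str, Any], key: str,
                  factory: Callable[[], T]) -> T:
    """Return the object stored under `key`, creating it on first access."""
    if key not in store:
        store[key] = factory()
    return store[key]


# In a Streamlit page, `store` would be `st.session_state` (assumption about
# how the app wires this up); a plain dict behaves the same way here.
store: Dict[str, Any] = {}

app = get_or_create(store, "app_session", AppSession)
app.uploaded_files.append("report.pdf")

# A later script rerun asks for the session again and gets the same object,
# so the uploaded file list is still there.
same = get_or_create(store, "app_session", AppSession)
```

Because both session classes go through the same helper, each workflow's state is created once and only when that workflow is first used.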
Design Principles (somewhat adhered to)
- Modularity: Each component is designed to be independent and reusable.
- Separation of Concerns: Clear distinction between UI, processing, and data layers.
- Industry Focus: Tailored for oil & gas PVT report digitization.
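As a rough illustration of the Separation of Concerns principle, the sketch below keeps the three layers apart: a processing function that knows nothing about the UI or storage, a storage interface that could be backed by Azure Blob Storage, and a thin entry point that a UI page would call. Every name here (`Storage`, `InMemoryStorage`, `digitize`, `handle_upload`) is hypothetical, and the whitespace-splitting "digitization" is a placeholder for the real OCR and table reconstruction pipeline.

```python
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Data layer interface (hypothetical); a real backend might wrap blob storage."""

    def save(self, name: str, payload: bytes) -> None: ...


class InMemoryStorage:
    """Test double that keeps saved payloads in a dict."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def save(self, name: str, payload: bytes) -> None:
        self.blobs[name] = payload


def digitize(raw_text: str) -> List[List[str]]:
    """Processing layer placeholder: split whitespace-delimited text into rows.

    Blank lines are skipped. The real pipeline would run OCR and table
    reconstruction here instead.
    """
    return [line.split() for line in raw_text.splitlines() if line.strip()]


def handle_upload(name: str, raw_text: str, storage: Storage) -> int:
    """UI-facing entry point: run processing, persist the result, return the row count."""
    rows = digitize(raw_text)
    csv_text = "\n".join(",".join(row) for row in rows)
    storage.save(f"{name}.csv", csv_text.encode("utf-8"))
    return len(rows)


storage = InMemoryStorage()
row_count = handle_upload("pvt", "P T\n100 200\n\n", storage)
```

Since `handle_upload` depends only on the `Storage` interface, the processing step can be exercised without a UI or a cloud account, which is the practical payoff of keeping the layers separate.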
Note
The two state management classes (AppSession and DigitizationSession) are a relic of earlier development, when two separate apps (OCR and Digitization) were merged into one. They should be refactored in the future to simplify state management and improve code clarity.