Overview
=================

What the Application Does
--------------------------------

This app provides a complete pipeline for digitizing and processing PVT reports:

* **PDF/Image/Excel Import**: Upload report files in various formats.
* **OCR Processing**: Extract textual data from documents using Optical Character Recognition (OCR).
* **Table Reconstruction**: Identify and reconstruct tables from the extracted data.
* **Data Mapping**: Map the extracted data to standard formats for analysis.
* **Data Merging**: Combine data from multiple sources into a unified format for analysis.

Key Features
--------------------------------

* Automated document processing pipeline
* Advanced table detection and reconstruction
* Custom categorization for PVT report data
* Bulk processing capabilities
* API integration for seamless data handling

High-Level Workflow
--------------------------------

.. image:: ../../../media/
   :alt: High-Level Workflow Diagram
   :align: center

.. code-block:: text

   PDF Reports → Digitization Pipeline → Structured Data → Analysis Tools → Export
        ↓                  ↓                    ↓                ↓            ↓
     [Input]       [OCR + Structure]       [Database]     [Streamlit UI]  [Output]

Technology Stack
----------------

* **Frontend**: Streamlit (multi-page application)
* **OCR Engine**: DocTR (Document Text Recognition)
* **Computer Vision**: YOLO models for structure detection
* **Backend**: Custom Azure-based API
* **Data Processing**: Pandas, NumPy
* **Testing**: Pytest

Next Steps
----------

* :doc:`installation` - Set up your development environment
* :doc:`quickstart` - Guided walkthrough of the application features
* :doc:`/architecture/index` - Detailed system architecture overview
* :doc:`/workflows/index` - Deep dive into the digitization process
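
Illustrative Mapping and Merging Sketch
---------------------------------------

The **Data Mapping** and **Data Merging** stages listed under *What the Application Does* can be pictured with a minimal, standard-library-only sketch. Nothing here is taken from the application's code: ``COLUMN_MAP``, ``map_columns``, ``merge_tables``, the column headers, and the numeric values are all hypothetical placeholders chosen for illustration, and the real pipeline (which uses Pandas) may work quite differently.

.. code-block:: python

   """Hypothetical sketch of the mapping and merging stages (not the app's actual code)."""
   from typing import Dict, List

   Row = Dict[str, float]

   # Assumed mapping from report-specific column headers to standard names.
   COLUMN_MAP: Dict[str, str] = {
       "Pressure (psia)": "pressure_psia",
       "Press.": "pressure_psia",
       "Bo": "oil_fvf",
       "Oil FVF": "oil_fvf",
   }


   def map_columns(rows: List[Row], column_map: Dict[str, str]) -> List[Row]:
       """Rename known headers to their standard names and drop unknown headers."""
       return [
           {column_map[key]: value for key, value in row.items() if key in column_map}
           for row in rows
       ]


   def merge_tables(tables: List[List[Row]]) -> List[Row]:
       """Concatenate mapped tables and sort rows by pressure (rows without it go last)."""
       merged = [row for table in tables for row in table]
       return sorted(merged, key=lambda row: row.get("pressure_psia", float("inf")))


   # Placeholder rows standing in for tables extracted from two different reports.
   report_a = [{"Pressure (psia)": 3000.0, "Bo": 1.35}]
   report_b = [{"Press.": 2000.0, "Oil FVF": 1.28, "Comment": 0.0}]

   unified = merge_tables([map_columns(t, COLUMN_MAP) for t in (report_a, report_b)])
   print(unified)

Both placeholder reports end up with the same standard column names, so downstream analysis can treat them as one table; the unmapped ``Comment`` column is dropped in this sketch, although a real pipeline might instead flag such columns for review.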