Overview
========

What the Application Does
-------------------------

The application provides an end-to-end pipeline for digitizing and processing PVT (pressure-volume-temperature) reports:

* **PDF/Image/Excel Import**: Upload report files as PDFs, images, or Excel workbooks.
* **OCR Processing**: Extract text from the uploaded documents using Optical Character Recognition (OCR).
* **Table Reconstruction**: Detect tables in the OCR output and rebuild their rows and columns.
* **Data Mapping**: Map the extracted values to standard formats so that reports can be compared.
* **Data Merging**: Combine data from multiple sources into a single unified dataset for analysis.
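
The mapping and merging stages listed above can be illustrated with a minimal sketch. It is not the application's actual implementation: the alias table, the standard column names (``pressure_psia``, ``oil_fvf``), and the helper functions ``map_columns`` and ``merge_tables`` are hypothetical names chosen for illustration, and plain dictionaries stand in for whatever table structure the pipeline really uses.

.. code-block:: python

    from typing import Dict, List

    Row = Dict[str, float]

    # Hypothetical alias table: header text as it might appear in a report,
    # mapped to an illustrative standard column name.
    COLUMN_ALIASES: Dict[str, str] = {
        "pressure (psia)": "pressure_psia",
        "press.": "pressure_psia",
        "bo": "oil_fvf",
        "oil fvf": "oil_fvf",
    }


    def map_columns(rows: List[Row],
                    aliases: Dict[str, str] = COLUMN_ALIASES) -> List[Row]:
        """Rename recognised headers to standard names; keep others unchanged."""
        mapped = []
        for row in rows:
            mapped.append({aliases.get(key.strip().lower(), key): value
                           for key, value in row.items()})
        return mapped


    def merge_tables(*tables: List[Row]) -> List[Row]:
        """Concatenate mapped tables into one list ordered by pressure."""
        merged = [row for table in tables for row in table]
        return sorted(merged, key=lambda row: row["pressure_psia"])


    # Two reports that label the same quantities differently.
    report_a = [{"Pressure (psia)": 3000.0, "Bo": 1.25}]
    report_b = [{"Press.": 2000.0, "Oil FVF": 1.20}]

    merged = merge_tables(map_columns(report_a), map_columns(report_b))
    print(merged)

After mapping, both reports share the same column names, so the merged result is a single list of rows sorted by pressure. Headers that are not in the alias table are kept unchanged rather than dropped, which is one possible design choice among several.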

Key Features
------------

* Automated document processing pipeline
* Advanced table detection and reconstruction
* Custom categorization for PVT report data
* Bulk processing capabilities
* API integration for seamless data handling
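
As a rough illustration of what bulk processing can look like, the sketch below runs a per-file function over a list of paths and collects failures instead of stopping at the first one. Everything in it is an assumption made for illustration: ``process_batch``, ``digitize_stub``, and the ``SUPPORTED_SUFFIXES`` set are hypothetical and do not describe the application's real API or its actual list of accepted file types.

.. code-block:: python

    from pathlib import Path
    from typing import Callable, Dict, Iterable, List, Tuple

    # Illustrative set of accepted extensions; the real list may differ.
    SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".xlsx"}


    def digitize_stub(path: str) -> str:
        """Stand-in for one pipeline run; it only validates the file type."""
        suffix = Path(path).suffix.lower()
        if suffix not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported file type: {suffix or '(none)'}")
        return f"digitized:{Path(path).name}"


    def process_batch(
        paths: Iterable[str],
        process_one: Callable[[str], str],
    ) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
        """Apply process_one to every path, recording errors per file."""
        results: Dict[str, str] = {}
        failures: List[Tuple[str, str]] = []
        for path in paths:
            try:
                results[path] = process_one(path)
            except Exception as exc:  # keep going so one bad file does not halt the batch
                failures.append((path, str(exc)))
        return results, failures


    results, failures = process_batch(
        ["well_a.pdf", "scan_01.png", "notes.txt"], digitize_stub
    )
    print(results)
    print(failures)

In this sketch the two supported files end up in ``results`` and the ``.txt`` file is reported in ``failures`` along with its error message, so a single unsupported input does not abort the whole batch.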

High-Level Workflow
-------------------

.. image:: ../../../media/
    :alt: High-Level Workflow Diagram
    :align: center


.. code-block:: text

    PDF Reports → Digitization Pipeline → Structured Data → Analysis Tools → Export
         ↓              ↓                      ↓                ↓              ↓
    [Input]      [OCR + Structure]      [Database]      [Streamlit UI]   [Output]

Technology Stack
----------------

* **Frontend**: Streamlit (multi-page application)
* **OCR Engine**: DocTR (Document Text Recognition)
* **Computer Vision**: YOLO models for structure detection
* **Backend**: Custom Azure-based API
* **Data Processing**: Pandas, NumPy
* **Testing**: Pytest

Next Steps
----------

* :doc:`installation` - Set up your development environment
* :doc:`quickstart` - Guided walkthrough of the application features
* :doc:`/architecture/index` - Detailed system architecture overview
* :doc:`/workflows/index` - Deep dive into the digitization process