Installation
============

**Minimum Requirements:**

* Python 3.10 or higher (3.12 recommended)
* 8GB RAM (16GB+ recommended for processing large PDFs)
* 10GB free disk space
* Windows 10/11, macOS 10.15+, or Ubuntu 20.04+
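
The Python version and disk space requirements above can be checked before going further. The following is a minimal sketch, not part of the repository: the function names and the choice to measure free space in the current directory are assumptions made for illustration.

.. code-block:: python

   import shutil
   import sys

   MIN_PYTHON = (3, 10)     # minimum version from the list above
   MIN_FREE_DISK_GB = 10    # minimum free disk space from the list above


   def check_python(min_version=MIN_PYTHON):
       """Return True if the running interpreter meets the minimum version."""
       return sys.version_info[:2] >= min_version


   def free_disk_gb(path="."):
       """Return free disk space at ``path`` in binary gigabytes (GiB)."""
       return shutil.disk_usage(path).free / 1024**3


   if __name__ == "__main__":
       version = f"{sys.version_info.major}.{sys.version_info.minor}"
       print(f"Python {version}:", "OK" if check_python() else "too old")
       free = free_disk_gb()
       status = "OK" if free >= MIN_FREE_DISK_GB else "below minimum"
       print(f"Free disk space: {free:.1f} GB:", status)

Run it with the interpreter you intend to use for the virtual environment, from the drive where the repository will be cloned.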

**GPU Requirements (Optional but Recommended):**

* NVIDIA GPU with CUDA support for faster vision model inference (list of supported GPUs: https://developer.nvidia.com/cuda-gpus)
* CUDA 11.0 or higher

Alternatively, on macOS:

* Apple silicon (M1 or later) for accelerated processing

Prerequisites
-------------

Before installation, ensure you have:

1. Python 3.10+ installed
2. Git for cloning the repository
3. Access to the Azure backend
4. (Optional) CUDA toolkit for GPU acceleration

Step 1: Clone the Repository
----------------------------

.. code-block:: bash

   git clone https://fluidsdata@dev.azure.com/fluidsdata/FluidsData/_git/fluidsdata.ocr
   cd fluidsdata.ocr

Step 2: Create Virtual Environment
----------------------------------

**Windows:**

.. code-block:: bash

   python -m venv venv
   venv\Scripts\activate

**macOS/Linux:**

.. code-block:: bash

   python3 -m venv venv
   source venv/bin/activate
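
To confirm that the environment is active before installing packages, you can compare the interpreter's prefixes. This is a small sketch using only the standard library; the helper name ``in_virtualenv`` is hypothetical.

.. code-block:: python

   import sys


   def in_virtualenv():
       """Return True when the interpreter is running inside a venv."""
       # Inside a venv, sys.prefix points at the environment directory,
       # while sys.base_prefix points at the Python it was created from.
       return sys.prefix != sys.base_prefix


   if __name__ == "__main__":
       print("Virtual environment active:", in_virtualenv())
       print("Interpreter prefix:", sys.prefix)

If this reports ``False``, rerun the activation command for your platform in the same terminal session.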

Step 3: Install Dependencies
----------------------------

**Basic Installation:**

.. code-block:: bash

   pip install -r requirements.txt

**Development Installation (includes testing and documentation tools):**

.. code-block:: bash

   pip install -r requirements-dev.txt

**External Dependencies:**

* **LibreOffice**: Required for Excel file processing. Download from https://www.libreoffice.org/download/download/.
* **Microsoft C/C++ Build Tools** (Windows): Required for compiling some Python packages. Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
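
A quick way to see whether LibreOffice is reachable from the command line is to search ``PATH`` for its executable. The sketch below assumes the executable is named ``soffice`` or ``libreoffice``, which are the usual names. Whether the application locates LibreOffice through ``PATH`` at all is also an assumption, and on Windows and macOS the installer may not add it to ``PATH`` by default.

.. code-block:: python

   import shutil


   def find_libreoffice(candidates=("soffice", "libreoffice")):
       """Return the path of the first LibreOffice executable on PATH, or None."""
       for name in candidates:
           path = shutil.which(name)
           if path:
               return path
       return None


   if __name__ == "__main__":
       location = find_libreoffice()
       print("LibreOffice:", location or "not found on PATH")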

**GPU Support (if applicable):**

.. code-block:: bash

   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

.. note::

   Replace ``cu118`` with the tag that matches your CUDA version (for example, ``cu121`` for CUDA 12.1). See the PyTorch installation guide for details: https://pytorch.org/get-started/locally/.
   
To check which version of the CUDA toolkit is installed, run the following command in a terminal:

.. code-block:: bash

   nvcc --version
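
``nvcc`` only reports on the toolkit, so it can also help to ask PyTorch which accelerator it sees. The sketch below is illustrative: the ``detect_device`` helper is hypothetical, and how the application itself selects a device is not covered here. It returns a plain string and handles the case where ``torch`` is not installed.

.. code-block:: python

   import importlib.util


   def detect_device():
       """Return 'cuda', 'mps', 'cpu', or 'torch-not-installed'."""
       if importlib.util.find_spec("torch") is None:
           return "torch-not-installed"
       import torch

       if torch.cuda.is_available():
           return "cuda"
       # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
       mps = getattr(torch.backends, "mps", None)
       if mps is not None and mps.is_available():
           return "mps"
       return "cpu"


   if __name__ == "__main__":
       print("PyTorch device:", detect_device())

A result of ``cpu`` on a machine with an NVIDIA GPU usually means the installed PyTorch wheel does not match the CUDA version.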

Step 4: Launch the Application
------------------------------

.. code-block:: bash

   streamlit run main.py

VS Code launch configuration (optional), saved as ``.vscode/launch.json``:

.. code-block:: json

   {
      "version": "0.2.0",
      "configurations": [
         {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal"
         },
         {
            "name": "Python: main",
            "type": "debugpy",
            "request": "launch",
            "module": "streamlit",
            "env": {
               "STREAMLIT_APP": "app.py",
               "STREAMLIT_ENV": "development",
               "AWS_PROFILE": "mega_root",
               "PYTHONPATH": "${workspaceFolder}/src"
            },
            "args": [
               "run",
               "app/main.py"
            ]
         }
      ]
   }

Next Steps
----------

* :doc:`overview` - Understand the application features and workflow
* :doc:`quickstart` - Rundown of the application's capabilities