Installation

Minimum Requirements:

  • Python 3.10 or higher (3.12 recommended)

  • 8GB RAM (16GB+ recommended for processing large PDFs)

  • 10GB free disk space

  • Windows 10/11, macOS 10.15+, or Ubuntu 20.04+
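The Python and disk-space minimums above can be checked with a short standard-library script. This is a minimal sketch, not part of the application: the function name `check_minimums` and its parameters are illustrative. RAM is not checked because the standard library has no portable way to query it.

```python
import shutil
import sys


def check_minimums(min_python=(3, 10), min_free_gb=10, path="."):
    """Return a list of human-readable problems; an empty list means OK.

    The defaults mirror the minimums listed above; the helper itself is
    hypothetical and not shipped with the application.
    """
    problems = []

    # Compare only (major, minor) against the required version.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")

    # Free space on the drive that holds `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"{min_free_gb} GB free disk space required, found {free_gb:.1f} GB"
        )

    return problems


if __name__ == "__main__":
    for problem in check_minimums():
        print("WARNING:", problem)
```

Run it from the directory where you plan to clone the repository so that the disk check looks at the right drive.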

GPU Requirements (Optional but Recommended):

  • NVIDIA GPU with CUDA support

Or

  • Apple M1 chip or later for accelerated processing on macOS

Prerequisites

Before installation, ensure you have:

  1. Python 3.10+ installed

  2. Git for cloning the repository

  3. Access to the Azure backend

  4. (Optional) CUDA toolkit for GPU acceleration
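The command-line prerequisites above can be checked before starting. The sketch below only looks for executables on `PATH`; the function name `missing_tools` is illustrative, and the assumption that the CUDA toolkit exposes `nvcc` on `PATH` may not hold for every installation. Access to the Azure backend cannot be checked this way.

```python
import shutil


def missing_tools(required=("git",), optional=("nvcc",)):
    """Return (missing_required, missing_optional) for tools not on PATH.

    `git` is needed to clone the repository; `nvcc` ships with the CUDA
    toolkit and only matters for GPU acceleration.
    """
    missing_required = [tool for tool in required if shutil.which(tool) is None]
    missing_optional = [tool for tool in optional if shutil.which(tool) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = missing_tools()
    for tool in required:
        print(f"ERROR: '{tool}' was not found on PATH")
    for tool in optional:
        print(f"NOTE: optional tool '{tool}' was not found on PATH")
```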

Step 1: Clone the Repository

git clone https://fluidsdata@dev.azure.com/fluidsdata/FluidsData/_git/fluidsdata.ocr
cd fluidsdata.ocr

Step 2: Create Virtual Environment

Windows:

python -m venv venv
venv\Scripts\activate

macOS/Linux:

python3 -m venv venv
source venv/bin/activate
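To confirm that the environment is active before installing anything, you can ask the interpreter itself. This is a small sketch with an illustrative function name. It relies on `sys.prefix` differing from `sys.base_prefix`, which holds for environments created with `python -m venv` but does not detect conda environments.

```python
import sys


def in_virtualenv():
    """Return True when running inside an environment created by `venv`."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment detected; activate venv first.")
```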

Step 3: Install Dependencies

Basic Installation:

pip install -r requirements.txt

Development Installation (includes testing and documentation tools):

pip install -r requirements-dev.txt

External Dependencies:

GPU Support (if applicable):

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Note

Replace cu118 with the appropriate version for your CUDA installation. See the PyTorch installation guide for more details: https://pytorch.org/get-started/locally/.

You can verify your CUDA installation by running the following command in your system terminal:

nvcc --version
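As a rough aid for choosing the index URL suffix, the release number printed by `nvcc --version` can be turned into a tag such as `cu118`. The sketch below is illustrative only: both function names are hypothetical, and PyTorch publishes wheels for a limited set of CUDA versions, so a derived tag may not correspond to an available index. Confirm against the PyTorch installation guide. The second helper reports which accelerator PyTorch can see once it is installed.

```python
import re


def cuda_wheel_tag(nvcc_output):
    """Derive a tag like 'cu118' from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


def describe_accelerator():
    """Report which backend PyTorch can use: 'cuda', 'mps', or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple silicon) is absent from older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.8, V11.8.89"
    print(cuda_wheel_tag(sample))  # cu118
    print(describe_accelerator())
```

If `describe_accelerator()` returns `cpu` on a machine with a supported GPU, the installed PyTorch build most likely does not match the local CUDA version.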

Step 4: Launch the Application

streamlit run main.py

VSCode Launch Configuration (optional):

{
   "version": "0.2.0",
   "configurations": [
      {
         "name": "Python Debugger: Current File",
         "type": "debugpy",
         "request": "launch",
         "program": "${file}",
         "console": "integratedTerminal"
      },
      {
         "name": "Python: main",
         "type": "debugpy",
         "request": "launch",
         "module": "streamlit",
         "env": {
            "STREAMLIT_APP": "app.py",
            "STREAMLIT_ENV": "development",
            "AWS_PROFILE": "mega_root",
            "PYTHONPATH": "${workspaceRoot}/src"
         },
         "args": [
            "run",
            "app/main.py"
         ]
      }
   ]
}

Next Steps

  • Overview - Understand the application features and workflow

  • Usage Tutorial - Rundown of the application’s capabilities