Installation
Minimum Requirements:
Python 3.10 or higher (3.12 recommended)
8GB RAM (16GB+ recommended for processing large PDFs)
10GB free disk space
Windows 10/11, macOS 10.15+, or Ubuntu 20.04+
GPU Requirements (Optional but Recommended):
NVIDIA GPU with CUDA support for faster vision model inference (list of supported GPUs: https://developer.nvidia.com/cuda-gpus)
CUDA 11.0 or higher
Or
Apple M1 chip or later for accelerated processing on macOS
Prerequisites
Before installation, ensure you have:
Python 3.10+ installed
Git for cloning the repository
Access to the Azure backend
(Optional) CUDA toolkit for GPU acceleration
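The checklist above can be verified with a short script before continuing. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only checks what can be detected locally (Python version, `git`, and the optional `nvcc` from the CUDA toolkit). Access to the Azure backend cannot be checked this way.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a dict mapping each locally detectable prerequisite to True/False."""
    return {
        # Compare (major, minor) of the running interpreter with the minimum.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        # nvcc ships with the CUDA toolkit; it is optional.
        "nvcc (optional)": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" result for `nvcc` only means GPU acceleration through CUDA is unavailable; the rest of the installation can still proceed.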
Step 1: Clone the Repository
git clone https://fluidsdata@dev.azure.com/fluidsdata/FluidsData/_git/fluidsdata.ocr
cd fluidsdata.ocr
Step 2: Create Virtual Environment
Windows:
python -m venv venv
venv\Scripts\activate
macOS/Linux:
python3 -m venv venv
source venv/bin/activate
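To confirm that the environment is active before installing dependencies, the interpreter itself can be asked. The sketch below relies on standard `venv` behaviour, where `sys.prefix` differs from `sys.base_prefix` inside a virtual environment. The function name is illustrative only.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no environment is active, rerun the activation command for your platform in the same terminal session.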
Step 3: Install Dependencies
Basic Installation:
pip install -r requirements.txt
Development Installation (includes testing and documentation tools):
pip install -r requirements-dev.txt
External Dependencies:
LibreOffice: Required for Excel file processing. Download from https://www.libreoffice.org/download/download/.
Microsoft C/C++ Build Tools (Windows): Required for compiling some Python packages. Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
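Whether LibreOffice is reachable from the command line can be checked with a small lookup. This is a sketch under assumptions: the executable names `soffice` and `libreoffice` are the common ones, but the Windows and macOS installers do not always add them to `PATH`, and how the application itself locates LibreOffice is not documented here.

```python
import shutil

# Common LibreOffice executable names (an assumption; adjust for your install).
DEFAULT_CANDIDATES = ("soffice", "libreoffice")


def find_libreoffice(candidates=DEFAULT_CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    print(location if location else "LibreOffice not found on PATH")
```

A `None` result does not prove LibreOffice is absent, only that it is not on `PATH`.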
GPU Support (if applicable):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Note
Replace cu118 with the appropriate version for your CUDA installation. See the PyTorch installation guide for more details: https://pytorch.org/get-started/locally/.
You can verify your CUDA installation by running the following command in your system terminal:
nvcc --version
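`nvcc` confirms the toolkit, but not that PyTorch can use the GPU. The following sketch checks that from Python, using `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. The function `pick_device` is hypothetical and falls back to `"cpu"` when PyTorch is not installed. It is not necessarily how the application selects its device.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no accelerator can be used.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch wheel likely does not match the local CUDA version.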
Step 4: Launch the Application
streamlit run main.py
VS Code Launch Configuration (optional, saved as .vscode/launch.json):
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal"
        },
        {
            "name": "Python: main",
            "type": "debugpy",
            "request": "launch",
            "module": "streamlit",
            "env": {
                "STREAMLIT_APP": "app.py",
                "STREAMLIT_ENV": "development",
                "AWS_PROFILE": "mega_root",
                "PYTHONPATH": "${workspaceRoot}/src"
            },
            "args": [
                "run",
                "app/main.py"
            ]
        }
    ]
}
Next Steps
Overview - Understand the application features and workflow
Usage Tutorial - Rundown of the application’s capabilities