Installation
=================

**Minimum Requirements:**

* Python 3.10 or higher (3.12 recommended)
* 8GB RAM (16GB+ recommended for processing large PDFs)
* 10GB free disk space
* Windows 10/11, macOS 10.15+, or Ubuntu 20.04+

**GPU Requirements (Optional but Recommended):**

* NVIDIA GPU with CUDA support for faster vision model inference (list of supported GPUs: https://developer.nvidia.com/cuda-gpus)
* CUDA 11.0 or higher

Or

* Apple M1 chip or later for accelerated processing on macOS

Prerequisites
-------------

Before installation, ensure you have:

1. Python 3.10+ installed
2. Git for cloning the repository
3. Access to the Azure backend
4. (Optional) CUDA toolkit for GPU acceleration

Step 1: Clone the Repository
----------------------------

.. code-block:: bash

   git clone https://fluidsdata@dev.azure.com/fluidsdata/FluidsData/_git/fluidsdata.ocr
   cd fluidsdata.ocr

Step 2: Create Virtual Environment
----------------------------------

**Windows:**

.. code-block:: bash

   python -m venv venv
   venv\Scripts\activate

**macOS/Linux:**

.. code-block:: bash

   python3 -m venv venv
   source venv/bin/activate

Step 3: Install Dependencies
----------------------------

**Basic Installation:**

.. code-block:: bash

   pip install -r requirements.txt

**Development Installation (includes testing and documentation tools):**

.. code-block:: bash

   pip install -r requirements-dev.txt

**External Dependencies:**

* **LibreOffice**: Required for Excel file processing. Download from https://www.libreoffice.org/download/download/.
* **Microsoft C/C++ Build Tools** (Windows): Required for compiling some Python packages. Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/.

**GPU Support (if applicable):**

.. code-block:: bash

   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

.. note::
   Replace ``cu118`` with the tag that matches your installed CUDA version. See the PyTorch installation guide for details: https://pytorch.org/get-started/locally/.

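
After installing PyTorch, it can be useful to confirm which accelerator it can actually use. The following is a minimal sketch rather than part of the repository: the function name ``detect_device`` is hypothetical, and it relies only on the public ``torch.cuda.is_available()`` and ``torch.backends.mps.is_available()`` calls. It falls back to ``"cpu"`` when PyTorch is not installed.

.. code-block:: python

   def detect_device() -> str:
       """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
       try:
           import torch
       except ImportError:
           # PyTorch is not installed in this environment, so assume CPU only.
           return "cpu"

       # NVIDIA GPU with a working CUDA runtime.
       if torch.cuda.is_available():
           return "cuda"

       # The MPS backend (Apple silicon) exists only in newer PyTorch
       # releases, so look it up defensively.
       mps = getattr(torch.backends, "mps", None)
       if mps is not None and mps.is_available():
           return "mps"

       return "cpu"


   if __name__ == "__main__":
       print(f"Selected device: {detect_device()}")

If this reports ``cpu`` on a machine with a supported GPU, the installed PyTorch wheel probably does not match the local CUDA version.
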

To verify your CUDA toolkit installation, run the following command in your system terminal:

.. code-block:: bash

   nvcc --version

Step 4: Launch the Application
---------------------------------

.. code-block:: bash

   streamlit run main.py

VS Code Launch Configuration (optional):

.. code-block:: json

   {
       "version": "0.2.0",
       "configurations": [
           {
               "name": "Python Debugger: Current File",
               "type": "debugpy",
               "request": "launch",
               "program": "${file}",
               "console": "integratedTerminal"
           },
           {
               "name": "Python: main",
               "type": "debugpy",
               "request": "launch",
               "module": "streamlit",
               "env": {
                   "STREAMLIT_APP": "app.py",
                   "STREAMLIT_ENV": "development",
                   "AWS_PROFILE": "mega_root",
                   "PYTHONPATH": "${workspaceRoot}/src"
               },
               "args": [
                   "run",
                   "app/main.py"
               ]
           }
       ]
   }

Next Steps
----------

* :doc:`overview` - Understand the application features and workflow
* :doc:`quickstart` - Rundown of the application's capabilities