# Docker & Infrastructure
The Multimodal Video RAG system is built on a containerized architecture to ensure environment parity and simplify the orchestration of its heterogeneous components (GPU-accelerated ASR, Vector DB, and Vision LLMs).
## Infrastructure Overview
The system utilizes Docker Compose to manage five primary services:
- Backend: FastAPI server handling the LangGraph orchestration, ingestion logic, and PII anonymization.
- Frontend: Next.js 15 application providing the dashboard, search interface, and real-time ingestion monitoring.
- ChromaDB: Vector database for storing and querying multimodal embeddings.
- Ollama: Local inference server for running vision models (LLaVA) and LLMs (Llama 3.1).
- Faster-Whisper: Dedicated worker for CUDA-accelerated audio transcription.
## Prerequisites
To run the local inference stack, the host machine requires:
- Docker & Docker Compose
- NVIDIA GPU with at least 8GB VRAM (for concurrent Vision and ASR tasks).
- NVIDIA Container Toolkit installed and configured as the default Docker runtime.
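As a quick sanity check that the NVIDIA Container Toolkit is wired up, a CUDA base image should be able to see the GPU from inside a container (the image tag below is illustrative; any CUDA image that ships `nvidia-smi` works):

```shell
# Verify that containers can access the host GPU via the NVIDIA runtime.
# If this prints the GPU table, the toolkit is configured correctly.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

If this fails with a runtime error, re-check that the toolkit is registered as the default Docker runtime before starting the stack.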
## Deployment Profiles
The project provides optimized configurations for different environments via specific compose files.
### Development Environment
Used for local testing and feature development. It includes hot-reloading for the frontend and backend.
```shell
docker compose -f docker/docker-compose.dev.yml up -d
```
### Production Environment
Optimized for performance, using pre-built images and stripped-down dependencies.
```shell
docker compose -f docker/docker-compose.yml up -d
```
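After either command, the state of the stack can be checked with standard Compose tooling. The `backend` service name here assumes the names defined in the compose file:

```shell
# List all services with their status and health state
docker compose -f docker/docker-compose.yml ps

# Follow the logs of a single service while debugging startup
docker compose -f docker/docker-compose.yml logs -f backend
```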
## Volume Management & Persistence
The system manages data across three persistent volumes to ensure that indexed videos and embeddings survive container restarts:
| Volume Name | Mount Point | Purpose |
| :--- | :--- | :--- |
| chroma_data | /index/data | Stores the vector database collections and metadata. |
| video_storage | /app/data/videos | Stores processed video files, extracted frames, and thumbnails. |
| ollama_models | /root/.ollama | Persists downloaded LLM and Vision models (e.g., LLaVA, Llama). |
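Because these are named volumes, they can be inspected and backed up with plain Docker commands. A minimal backup sketch, using a throwaway container (the archive filename is arbitrary):

```shell
# Show where Docker stores the vector database volume on the host
docker volume inspect chroma_data

# Archive the vector store into the current directory
docker run --rm -v chroma_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/chroma_data.tar.gz -C /data .
```

The same pattern applies to `video_storage`; `ollama_models` is usually cheaper to re-pull than to back up.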
## Configuration (Environment Variables)
Infrastructure behavior is controlled via a `.env` file in the project root. Key variables include:
```dotenv
# Inference Steering
LLM_PROVIDER=ollama          # Options: 'ollama' or 'openrouter'
OPENROUTER_API_KEY=          # Required if LLM_PROVIDER=openrouter

# Resource Constraints
GPU_IDS=all                  # Which GPUs to expose to containers
MAX_CONCURRENT_INGESTIONS=2  # Limits parallel video processing to prevent OOM
```
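Variables exported in the shell take precedence over the `.env` file during Compose interpolation, which allows one-off overrides without editing the file, for example temporarily serializing ingestion on a small GPU:

```shell
# One-off override: process videos one at a time for this run only
MAX_CONCURRENT_INGESTIONS=1 docker compose -f docker/docker-compose.yml up -d
```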
## Initializing Inference Models
After the containers are healthy, pull the required model weights into the Ollama service; they are persisted in the `ollama_models` volume.
```shell
# Pull the Vision model for frame description
docker exec ollama ollama pull llava:7b

# Pull the Instruct model for the RAG agent
docker exec ollama ollama pull llama3.1:8b-instruct-q4_0
```
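Once the pulls complete, both models should appear in Ollama's local registry:

```shell
# Confirm the vision and instruct models are available
docker exec ollama ollama list
```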
## Health Monitoring
The Backend service exposes standard health check endpoints used by Docker to manage container lifecycle:
- `GET /health`: Basic connectivity and application status.
- `GET /ready`: Verification of database connections and model availability.
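With the backend published on port `8000`, these endpoints can be probed from the host; the exact response semantics are an assumption based on the descriptions above:

```shell
# Liveness: succeeds as soon as the application is up
curl -fsS http://localhost:8000/health

# Readiness: expected to fail until ChromaDB and the models are reachable
curl -fsS http://localhost:8000/ready
```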
## Network Topology
Services communicate over an internal bridge network:
- Backend: Port `8000` (Internal/External)
- Frontend: Port `3000` (Internal/External)
- ChromaDB: Port `8000` (Internal Only)
- Ollama: Port `11434` (Internal Only)
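Since ChromaDB and Ollama are not published to the host, connectivity checks have to run from inside the bridge network. A sketch, assuming the service hostnames match the container names above and that `curl` is available in the backend image:

```shell
# ChromaDB heartbeat, reached via its internal hostname
docker exec backend curl -fsS http://chromadb:8000/api/v1/heartbeat

# List loaded models through the Ollama HTTP API
docker exec backend curl -fsS http://ollama:11434/api/tags
```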