Environment Configuration
Overview
Multimodal Video RAG follows the 12-Factor App methodology for configuration. All system behavior is controlled via environment variables, allowing for seamless transitions between local development, Docker containers, and cloud environments.
Core LLM Configuration
The system is designed to be provider-agnostic. You can choose between running models locally for maximum privacy or using cloud providers for higher performance.
LLM Provider Selection
Set the LLM_PROVIDER variable to determine which backend the LangGraph agent and vision modules use.
| Provider | Value | Description |
| :--- | :--- | :--- |
| Ollama | ollama | (Default) Local inference. Requires an Ollama instance with GPU access. |
| OpenRouter | openrouter | Cloud-based inference. Provides access to high-end models (Llama 3.1 405B, Claude 3.5, etc.). |
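The provider switch can be sketched as a small factory that maps LLM_PROVIDER to the settings each backend needs. This is an illustrative sketch only — `resolve_llm_backend` is a hypothetical helper, not part of the codebase; the variable names match the settings documented below.

```python
import os

def resolve_llm_backend() -> dict:
    """Map LLM_PROVIDER to backend connection settings (hypothetical helper;
    the real agent wiring may differ)."""
    provider = os.getenv("LLM_PROVIDER", "ollama").lower()
    if provider == "ollama":
        return {
            "provider": "ollama",
            # Service-name default so the backend container can reach Ollama.
            "base_url": os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"),
        }
    if provider == "openrouter":
        key = os.getenv("OPENROUTER_API_KEY")
        if not key:
            # Fail fast: a missing key would otherwise surface as an opaque
            # HTTP error at request time.
            raise RuntimeError(
                "OPENROUTER_API_KEY is required when LLM_PROVIDER=openrouter"
            )
        return {"provider": "openrouter", "api_key": key}
    raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
```

Failing fast on a missing key keeps misconfiguration visible at startup rather than mid-request.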
Provider-Specific Settings
Ollama (Local)
If using Ollama, ensure the service is reachable from the backend container.
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434
Note: You must manually pull the required models inside the Ollama container as shown in the Quick Start guide.
OpenRouter (Cloud)
If using OpenRouter, an API key is required.
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-api-key-here
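A quick sanity check on the key format can catch copy/paste mistakes before any request is made. The check below is a heuristic sketch: it assumes keys follow the `sk-or-v1-` prefix shown in the example above, which may not hold for all OpenRouter key formats.

```python
def looks_like_openrouter_key(key: str) -> bool:
    """Heuristic format check: keys in this guide use the 'sk-or-v1-' prefix;
    anything else is a likely copy/paste mistake. Not an authoritative rule."""
    prefix = "sk-or-v1-"
    return key.startswith(prefix) and len(key) > len(prefix)
```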
Backend Infrastructure
These settings configure the FastAPI server and its connection to backing services like the vector database.
| Variable | Default | Description |
| :--- | :--- | :--- |
| APP_ENV | development | The environment mode (development, staging, production). |
| CHROMA_HOST | chromadb | The hostname for the ChromaDB vector store. |
| CHROMA_PORT | 8000 | The port for ChromaDB communication. |
| LOG_LEVEL | info | Logging verbosity (debug, info, warning, error). |
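The defaults in the table translate directly into environment lookups. A minimal sketch (the `backend_settings` function is hypothetical; only the variable names and defaults come from the table):

```python
import os

def backend_settings() -> dict:
    """Read the backend variables from the table above, applying their
    documented defaults when unset."""
    return {
        "app_env": os.getenv("APP_ENV", "development"),
        "chroma_host": os.getenv("CHROMA_HOST", "chromadb"),
        # Ports arrive as strings from the environment; coerce to int early.
        "chroma_port": int(os.getenv("CHROMA_PORT", "8000")),
        "log_level": os.getenv("LOG_LEVEL", "info"),
    }
```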
Ingestion & Processing
The ingestion pipeline uses specialized models for vision and speech-to-text.
ASR (Speech Recognition)
The system uses Faster-Whisper for high-speed transcription.
- Device Context: By default, the system attempts to use cuda. If no GPU is available, it will fall back to cpu (significantly slower).
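The cuda-to-cpu fallback can be expressed as a pure selection function. A sketch for illustration — the function name is hypothetical, and the real pipeline presumably queries the runtime (e.g., CTranslate2, which backs Faster-Whisper) for actual GPU availability:

```python
def select_asr_device(cuda_available: bool, requested: str = "cuda") -> str:
    """Pick the Faster-Whisper device: honor a cuda request only when a GPU
    is actually present; otherwise fall back to the (much slower) cpu path."""
    if requested == "cuda" and cuda_available:
        return "cuda"
    return "cpu"
```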
Vision LLM
The vision module generates descriptions for video frames.
- Local Model: Defaults to llava:7b.
- Visual Sampling: Controls how frequently frames are extracted for analysis.
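Fixed-interval sampling reduces to computing the timestamps at which frames are grabbed. A minimal sketch (illustrative only; the actual sampler, its interval variable, and its rounding behavior are not specified in this guide):

```python
def frame_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Timestamps (in seconds) at which frames would be extracted for the
    vision LLM, given a fixed sampling interval starting at t=0."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    ts, t = [], 0.0
    while t < duration_s:
        ts.append(round(t, 3))
        t += interval_s
    return ts
```

A larger interval means fewer frames, so ingestion cost trades off directly against visual coverage.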
Privacy & Security
To ensure data privacy during processing (especially when using cloud providers), the system integrates Microsoft Presidio for PII (Personally Identifiable Information) detection.
# Enable/Disable PII Anonymization
PII_DETECTION_ENABLED=true
# Entities to redact (e.g., PHONE_NUMBER, EMAIL_ADDRESS, PERSON)
PII_ENTITIES_TO_REDACT=["PHONE_NUMBER", "EMAIL_ADDRESS", "LOCATION"]
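Since PII_ENTITIES_TO_REDACT is written as a JSON list, it can be parsed with the standard library. A sketch, assuming a hypothetical `pii_config` helper (the entity names shown are standard Presidio recognizer types; how the codebase actually parses this variable is not specified here):

```python
import json
import os

def pii_config() -> tuple[bool, list[str]]:
    """Parse the privacy variables: a boolean enable flag plus a JSON-encoded
    list of Presidio entity names to redact."""
    enabled = os.getenv("PII_DETECTION_ENABLED", "true").lower() == "true"
    raw = os.getenv(
        "PII_ENTITIES_TO_REDACT",
        '["PHONE_NUMBER", "EMAIL_ADDRESS", "LOCATION"]',
    )
    return enabled, json.loads(raw)
```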
Environment Template (.env)
Create a .env file in the project root. You can use the template below for a standard local setup:
# --- Application Settings ---
APP_NAME="Multimodal Video RAG"
APP_ENV=development
DEBUG=true
# --- LLM Provider ---
# Options: ollama, openrouter
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OPENROUTER_API_KEY=
# --- Vector Store ---
CHROMA_HOST=localhost
CHROMA_PORT=8000
# --- Privacy ---
PII_DETECTION_ENABLED=true
# --- Frontend ---
NEXT_PUBLIC_API_URL=http://localhost:8000
Troubleshooting Configuration
- Connection Refused: If the backend cannot connect to Ollama or ChromaDB while running in Docker, ensure you are using the service name (e.g., http://ollama:11434) instead of localhost.
- Model Not Found: If using Ollama, ensure you have executed ollama pull for both the vision model (llava) and the instruct model (llama3.1).
- GPU Not Detected: Ensure the nvidia-container-toolkit is installed on your host machine and that the deploy.resources.reservations.devices section is present in your docker-compose file.