REST API Reference
The Multimodal Video RAG API follows RESTful principles and returns JSON-encoded responses. The base URL for all v1 endpoints is /api/v1.
Video Management
Endpoints for ingesting, listing, and managing video content.
Ingest a Video
POST /videos/ingest
Starts an asynchronous pipeline to download, transcribe, and index a video from a URL.
Request Body
| Field | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| url | string | Yes | The YouTube URL of the video to ingest. |
Response (VideoIngestResponse)
{
"job_id": "job_01HXYZ123ABC",
"video_id": "vid_01HXYZ123ABC",
"status": "pending",
"message": "Ingestion job created"
}
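A minimal client-side sketch of this call, using only the Python standard library. The host `http://localhost:8000` is a hypothetical placeholder; only the `/api/v1` prefix and the request/response fields come from this reference. Building the request is kept separate from sending it, so the payload can be checked without a running server.

```python
import json
import urllib.request
from dataclasses import dataclass

# Hypothetical host; only the /api/v1 prefix comes from this reference.
BASE_URL = "http://localhost:8000/api/v1"


@dataclass
class VideoIngestResponse:
    job_id: str
    video_id: str
    status: str
    message: str


def build_ingest_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /videos/ingest request."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/videos/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_ingest_response(raw: bytes) -> VideoIngestResponse:
    """Decode a VideoIngestResponse JSON body into a dataclass."""
    return VideoIngestResponse(**json.loads(raw))


def ingest_video(url: str) -> VideoIngestResponse:
    """Send the request and parse the reply (requires a running server)."""
    with urllib.request.urlopen(build_ingest_request(url)) as resp:
        return parse_ingest_response(resp.read())
```

The returned `job_id` is the value you would use to follow the job's progress over the WebSocket endpoint.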
List Videos
GET /videos
Returns a list of all videos currently in the knowledge base.
Response (VideoListResponse)
{
"videos": [
{
"video_id": "vid_01HXYZ123ABC",
"title": "Introduction to Multimodal AI",
"duration_seconds": 360,
"source_url": "https://www.youtube.com/watch?v=...",
"thumbnail_url": "https://...",
"ingested_at": "2023-10-27T10:00:00Z",
"chunk_count": 42,
"searchable": true
}
],
"total": 1
}
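A sketch of decoding a VideoListResponse on the client, for instance to collect IDs to pass as `video_ids` to the search or chat endpoints. The field names come from the sample above. The exact meaning of `searchable: false` is not specified in this reference, so the filter below simply keeps entries flagged `true`.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VideoSummary:
    video_id: str
    title: str
    duration_seconds: int
    source_url: str
    thumbnail_url: str
    ingested_at: datetime
    chunk_count: int
    searchable: bool


def parse_video_list(raw: str) -> tuple[list[VideoSummary], int]:
    """Return the parsed videos and the reported total."""
    payload = json.loads(raw)
    videos = []
    for item in payload["videos"]:
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
        # so normalise it to an explicit UTC offset first.
        ingested = datetime.fromisoformat(item["ingested_at"].replace("Z", "+00:00"))
        videos.append(VideoSummary(**{**item, "ingested_at": ingested}))
    return videos, payload["total"]


def searchable_ids(videos: list[VideoSummary]) -> list[str]:
    """IDs of videos whose `searchable` flag is true."""
    return [v.video_id for v in videos if v.searchable]
```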
Delete Video
DELETE /videos/{video_id}
Removes a video and all its associated embeddings (visual and audio) from the vector store.
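A minimal sketch of issuing the delete from Python, again assuming the hypothetical host `http://localhost:8000`. This reference does not document a response body for this endpoint, so the helper only returns the HTTP status code.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # hypothetical host


def build_delete_request(video_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /videos/{video_id} request."""
    # Percent-encode the ID so characters such as "/" cannot alter the path.
    safe_id = urllib.parse.quote(video_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/videos/{safe_id}", method="DELETE")


def delete_video(video_id: str) -> int:
    """Send the DELETE and return the HTTP status code (requires a running server)."""
    with urllib.request.urlopen(build_delete_request(video_id)) as resp:
        return resp.status
```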
Search
Endpoints for semantic retrieval across the video corpus.
Semantic Search
POST /search
Searches for specific moments within videos using natural language queries.
Request Body
| Field | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| query | string | Required | The search query (e.g., "blue elephants"). |
| top_k | integer | 5 | Number of results to return. |
| video_ids | string[] | null | Optional list of video IDs to restrict the search to. |
| include_visual | boolean | true | Search across visual scene descriptions. |
| include_audio | boolean | true | Search across audio transcripts. |
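The request body can be modelled as a small dataclass whose defaults mirror the table above. This is a client-side sketch: leaving out `video_ids` when it is unset should be equivalent to sending `null`, given the documented default. The guard against disabling both modalities is a hypothetical client-side check; the API's own behaviour in that case is not documented here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SearchRequest:
    # Defaults mirror the request-body table for POST /search.
    query: str
    top_k: int = 5
    video_ids: Optional[list[str]] = None
    include_visual: bool = True
    include_audio: bool = True

    def to_json(self) -> str:
        if not (self.include_visual or self.include_audio):
            # Hypothetical client-side guard: with both modalities off,
            # there is nothing left to search.
            raise ValueError("enable at least one of include_visual / include_audio")
        body = asdict(self)
        if body["video_ids"] is None:
            del body["video_ids"]  # omit the filter to search the whole corpus
        return json.dumps(body)
```

For example, `SearchRequest(query="blue elephants", include_audio=False).to_json()` produces a body that searches only visual scene descriptions.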
Response (SearchResponse)
{
"query": "elephants",
"results": [
{
"video_id": "vid_123",
"chunk_id": "chk_456",
"chunk_type": "visual",
"timestamp_start": 45.5,
"timestamp_end": 50.0,
"content": "A large elephant walks across the savanna.",
"score": 0.89,
"preview_url": "https://..."
}
],
"total_results": 1,
"search_time_ms": 120
}
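A sketch of turning a SearchResponse into human-readable hits with `M:SS` timestamps. It assumes a higher `score` means a more relevant match (the sample's 0.89 suggests a similarity rather than a distance, but this reference does not say), and it sorts client-side because the server's ordering is not stated.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchHit:
    video_id: str
    chunk_id: str
    chunk_type: str
    timestamp_start: float
    timestamp_end: float
    content: str
    score: float
    preview_url: str


def fmt_ts(seconds: float) -> str:
    """Render seconds as M:SS, truncating fractions (45.5 -> '0:45')."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def parse_search_response(raw: str) -> list[SearchHit]:
    payload = json.loads(raw)
    hits = [SearchHit(**r) for r in payload["results"]]
    # Assumption: higher score = more relevant; sort defensively.
    return sorted(hits, key=lambda h: h.score, reverse=True)


def summarize(hits: list[SearchHit]) -> list[str]:
    """One line per hit: video, time range, chunk type, score, content."""
    return [
        f"{h.video_id} [{fmt_ts(h.timestamp_start)}-{fmt_ts(h.timestamp_end)}] "
        f"({h.chunk_type}, {h.score:.2f}) {h.content}"
        for h in hits
    ]
```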
Chat Assistant
Endpoints for conversational interaction with the video library.
Send Chat Message
POST /chat
Interact with an AI agent that reasons across your video content. A LangGraph-powered router decides whether to search the corpus or to answer from the context already available in the conversation.
Request Body
| Field | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| query | string | Yes | The user message or question. |
| video_ids | string[] | No | Target specific videos for the agent to consider. |
| session_id | string | No | Session ID used to maintain conversation history across requests. |
Response (ChatResponse)
{
"answer": "The speaker mentions the project timeline starting at [0:45].",
"sources": [
{
"video_id": "vid_123",
"timestamp": "0:45",
"timestamp_seconds": 45,
"transcript": "We are planning to launch the project in Q4.",
"visual_context": "Speaker pointing to a slide titled 'Timeline'."
}
],
"query": "When does the project start?"
}
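Because the ChatResponse does not echo a `session_id`, a client that wants multi-turn history has to send the same ID on every request. The sketch below assumes the server accepts any client-chosen string as `session_id` (this reference does not say how IDs are issued); the `sess_` prefix is purely hypothetical. It also extracts the cited moments from a response.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatSession:
    # Assumption: any client-generated string is accepted as session_id.
    session_id: str = field(default_factory=lambda: f"sess_{uuid.uuid4().hex}")
    video_ids: Optional[list[str]] = None

    def build_payload(self, query: str) -> str:
        """JSON body for POST /chat, reusing this session's ID every turn."""
        body = {"query": query, "session_id": self.session_id}
        if self.video_ids:
            body["video_ids"] = self.video_ids
        return json.dumps(body)


def cited_moments(raw_response: str) -> list[tuple[str, int]]:
    """Pull (video_id, timestamp_seconds) pairs from a ChatResponse body."""
    payload = json.loads(raw_response)
    return [(s["video_id"], s["timestamp_seconds"]) for s in payload["sources"]]
```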
Real-time Updates (WebSockets)
For long-running tasks like ingestion, the system provides real-time progress updates via WebSockets.
Ingestion Job Progress
WS /ws/jobs/{job_id}
Message Payload (ProgressEvent)
{
"job_id": "job_123",
"video_id": "vid_123",
"stage": "transcribing",
"stage_number": 3,
"total_stages": 5,
"progress_percent": 60,
"message": "Generating audio transcript...",
"timestamp": "2023-10-27T10:05:00Z"
}
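A sketch of a handler for ProgressEvent messages. It assumes `stage` takes the JobStatus values listed under Data Models (the sample's `"transcribing"` is one of them), so `completed` and `failed` signal that the client can stop listening. Opening the socket itself needs a WebSocket client library (for example the third-party `websockets` package) and is left out so the sketch stays stdlib-only.

```python
import json
from dataclasses import dataclass

# Terminal states taken from the JobStatus enum in this reference.
TERMINAL_STAGES = {"completed", "failed"}


@dataclass
class ProgressEvent:
    job_id: str
    video_id: str
    stage: str
    stage_number: int
    total_stages: int
    progress_percent: int
    message: str
    timestamp: str


def handle_message(raw: str) -> tuple[str, bool]:
    """Return a one-line status and whether the job has reached a terminal stage."""
    event = ProgressEvent(**json.loads(raw))
    line = (
        f"[{event.stage_number}/{event.total_stages}] {event.stage}: "
        f"{event.progress_percent}% - {event.message}"
    )
    return line, event.stage in TERMINAL_STAGES
```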
Data Models
JobStatus (Enum)
The status of a background ingestion job can be one of the following:
| Value | Description |
| :--- | :--- |
| pending | Job created in queue. |
| downloading | Fetching video file from source. |
| extracting | Splitting audio and frames. |
| transcribing | Running Whisper ASR. |
| segmenting | Determining scene boundaries. |
| describing | Running Vision LLM on frames. |
| embedding | Generating vectors for content. |
| indexing | Saving to ChromaDB. |
| completed | Video is searchable. |
| failed | Job stopped due to an error. |
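On the client, these values can be mirrored as a string-valued enum. This is a sketch, not the server's own definition; the `is_terminal` and `is_searchable` helpers are hypothetical conveniences derived from the descriptions above (`completed` and `failed` end a job, and only `completed` makes the video searchable).

```python
from enum import Enum


class JobStatus(str, Enum):
    """Client-side mirror of the JobStatus values listed above."""

    PENDING = "pending"
    DOWNLOADING = "downloading"
    EXTRACTING = "extracting"
    TRANSCRIBING = "transcribing"
    SEGMENTING = "segmenting"
    DESCRIBING = "describing"
    EMBEDDING = "embedding"
    INDEXING = "indexing"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_terminal(self) -> bool:
        # Hypothetical helper: the job will not progress further.
        return self in (JobStatus.COMPLETED, JobStatus.FAILED)

    @property
    def is_searchable(self) -> bool:
        # Per the table, only a completed job yields a searchable video.
        return self is JobStatus.COMPLETED
```

Subclassing `str` lets a raw value from a response be parsed with `JobStatus("transcribing")` and compared directly against plain strings.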