A multimodal insights engine for video understanding
Atlas - Multimodal Video Understanding
Atlas is an open-source multimodal insights engine for video understanding. Extract rich semantic insights from videos using AI, index them in a local vector store, chat with your video content from the terminal, or use the bundled Web UI served by the Atlas HTTP server.
https://github.com/user-attachments/assets/d28fb343-5c74-462f-996e-f0e5dc51cdf8
Features
- 🎬 Multimodal Analysis: Extract visual cues, interactions, contextual information, audio analysis, and transcripts from videos
- ⚡ Real-time Streaming: `extract` and `transcribe` stream results to the terminal as each segment completes, with no waiting for the full video
- 🔍 Semantic Search: Index videos and search through content semantically using a local vector store (powered by zvec)
- 💬 Video Chat: Ask questions about indexed videos; context is drawn from the vector store and prior conversation history
- 🌐 Companion Web UI: Browse indexed videos, queue jobs, inspect stats, and use Atlas from a browser at `/ui`
- 🧾 Persistent Run History: `transcribe`, `extract`, and `index` now persist every run, queued or direct, so outputs and benchmarks remain accessible later
- 🤖 Powered by Gemini: Uses Google's Gemini models for multimodal analysis and embeddings
- 🎙️ Groq Whisper Transcription: Fast, high-quality video transcription via the `transcribe` command
- 💻 CLI First: Clean, ergonomic command-line interface
- 🔒 Local by default: The vector index is stored on disk (`~/.atlas/index`) and stays on your machine; only the video segments and audio sent to the Gemini and Groq APIs for analysis leave it
Performance
| Function | Avg / call | Notes |
|---|---|---|
| Gemini multimodal analysis | ~5s | Processing time for a segment with multiple attributes |
| Groq Whisper (transcribe) | ~5s / video | Full video |
| ffmpeg clip | ~0.1s | Per chunk |
| zvec query | ms | Local HNSW, ~8× faster than Pinecone |
For a ~5 min video with 15s chunk_duration (~20 chunks), wall time for indexing is typically ~90s with default concurrency, as chunks are processed in parallel.
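The chunk count follows from how `chunk_duration` and `overlap` tile the timeline. A minimal sketch, assuming each window starts `chunk_duration` minus `overlap` seconds after the previous one and the last window is clipped to the end of the video (`chunk_windows` is a hypothetical helper, not part of Atlas):

```python
def chunk_windows(video_seconds: float, chunk_duration: float = 15.0,
                  overlap: float = 1.0) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds covering [0, video_seconds].

    Hypothetical helper: windows advance by chunk_duration - overlap and the
    final window is clipped to the end of the video.
    """
    if chunk_duration <= 0 or not 0 <= overlap < chunk_duration:
        raise ValueError("need chunk_duration > 0 and 0 <= overlap < chunk_duration")
    stride = chunk_duration - overlap
    windows: list[tuple[float, float]] = []
    start = 0.0
    while start < video_seconds:
        end = min(start + chunk_duration, video_seconds)
        windows.append((start, end))
        if end >= video_seconds:  # this window already reaches the end
            break
        start += stride
    return windows


windows = chunk_windows(300, chunk_duration=15, overlap=1)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions a 300 s video with 15 s chunks and 1 s overlap yields 22 windows (20 with no overlap), in line with the ~20 quoted above.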
Requirements
- ffmpeg: Required for video clipping.
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt install ffmpeg`
  - Windows: `winget install ffmpeg`
- Platform Support: Linux (x86_64, ARM64) and macOS (ARM64)
Installation
Requirements
- Python 3.12
- ffmpeg (for video processing)
Install from PyPI
pip install atlas-video
Install from Source
git clone https://github.com/nwaughachukwuma/atlas-video.git
cd atlas-video
pip install -e ".[dev]"
Docker
Zero-setup option — no Python, no ffmpeg, no dependencies. Just Docker and your API keys.
Pull the image
docker pull nwaughachukwuma/atlas-video
# or pin to a specific version
docker pull nwaughachukwuma/atlas-video:0.1.0
Quick one-liner usage
All configuration is passed via -e flags — fully 12-factor compliant.
# Transcribe (Groq Whisper | Uses a Task Queue)
docker run --rm -it \
-e GROQ_API_KEY="$GROQ_API_KEY" \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video transcribe /data/video.mp4
# Transcribe (streams to terminal)
docker run --rm -it \
-e GROQ_API_KEY="$GROQ_API_KEY" \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video transcribe /data/video.mp4 --no-queue
# Extract insights
docker run --rm -it \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video extract /data/video.mp4
# Index a video (persist the vector store with a named volume)
docker run --rm -it \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-v atlas-data:/home/atlas/.atlas \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video index /data/video.mp4 --benchmark
# Semantic search (mount the same volume so the existing index is visible)
docker run --rm -it \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-v atlas-data:/home/atlas/.atlas \
nwaughachukwuma/atlas-video search "machine learning demo"
# list all indexed videos
docker run --rm -it \
-v atlas-data:/home/atlas/.atlas \
nwaughachukwuma/atlas-video list-videos
# Chat with an indexed video
docker run --rm -it \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-v atlas-data:/home/atlas/.atlas \
nwaughachukwuma/atlas-video chat <video_id> "What topics are covered?"
# See all commands
docker run --rm nwaughachukwuma/atlas-video --help
# See version
docker run --rm nwaughachukwuma/atlas-video --version
# list queued tasks
docker run --rm -v atlas-data:/home/atlas/.atlas nwaughachukwuma/atlas-video queue list
# list persisted runs
docker run --rm -v atlas-data:/home/atlas/.atlas nwaughachukwuma/atlas-video runs list
Run as HTTP server (Docker)
The Docker image now starts the Atlas HTTP server and bundled Web UI by default.
# Start Atlas API server + Web UI on port 8000
docker run -p 8000:8000 -it \
-v atlas-data:/home/atlas/.atlas \
--env-file .env \
nwaughachukwuma/atlas-video
# Open the Web UI
# http://localhost:8000/ui
# Dashboard
# http://localhost:8000/ui/dashboard
# Health check
curl http://localhost:8000/health
If you prefer to pass the command explicitly, serve still works:
docker run --rm -d \
-p 8000:8000 \
--env-file .env \
-v atlas-data:/home/atlas/.atlas \
nwaughachukwuma/atlas-video serve -H 0.0.0.0 -p 8000
Or specify the API keys inline:
docker run --rm -d \
-p 8000:8000 \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-e GROQ_API_KEY="$GROQ_API_KEY" \
-v atlas-data:/home/atlas/.atlas \
nwaughachukwuma/atlas-video serve -H 0.0.0.0 -p 8000
Web UI workflow
After pulling the image, the shortest path to the browser experience is:
# 1. Create a .env file with your API keys
cat > .env <<'EOF'
GEMINI_API_KEY=your-key-here
GROQ_API_KEY=your-key-here
ENABLE_LOGGING=false
EOF
# 2. Start Atlas
docker run -p 8000:8000 -it \
-v atlas-data:/home/atlas/.atlas \
--env-file .env \
nwaughachukwuma/atlas-video
# 3. Open the UI
# http://localhost:8000/ui
From the Web UI you can:
- upload videos for `transcribe`, `extract`, and `index`
- browse indexed videos and open a per-video detail page
- chat with an indexed video
- inspect queue state and task details
- inspect persisted run history, outputs, and benchmarks for both queued and direct runs
- view dashboard health, collection stats, and storage paths
Run history workflow
transcribe, extract, and index now persist their outputs for both queued and --no-queue execution modes.
# inspect recent runs
atlas runs list
# inspect one run's metadata
atlas runs show --run-id <run_id>
# print stored output
atlas runs output --run-id <run_id>
# print stored benchmark, when available
atlas runs benchmark --run-id <run_id>
Queued tasks still use atlas queue list and atlas queue status --task-id <task_id> for lifecycle tracking, but the persisted output and benchmark lookup now lives under atlas runs ... for both queued and direct execution.
Environment variables
| Variable | Required for | Description |
|---|---|---|
| GEMINI_API_KEY | extract, index, search, chat | Gemini API key, from Google AI Studio |
| GROQ_API_KEY | transcribe, extract, index | Groq API key, from Groq Console |
| ENABLE_LOGGING | optional | Set to true for verbose logging (default: false) |
Persistent vector store
The container stores its index at /home/atlas/.atlas. Mount a named Docker volume to persist data across runs:
# Create the volume once
docker volume create atlas-data
# All subsequent runs share the same index
docker run --rm -it \
-e GEMINI_API_KEY="$GEMINI_API_KEY" \
-v atlas-data:/home/atlas/.atlas \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video index /data/my-video.mp4
Using a .env file
# .env
GEMINI_API_KEY=your-key-here
GROQ_API_KEY=your-key-here
ENABLE_LOGGING=false
docker run --rm -it \
--env-file .env \
-v atlas-data:/home/atlas/.atlas \
-v "$(pwd)/videos:/data" \
nwaughachukwuma/atlas-video extract /data/video.mp4
1. Set up API Keys
export GEMINI_API_KEY=your-gemini-api-key # required for extract, index, search, chat
export GROQ_API_KEY=your-groq-api-key # required only for `atlas transcribe`
export ENABLE_LOGGING=true
- Get a Gemini API key: Google AI Studio
- Get a Groq API key: Groq Console
2. Extract Multimodal Insights (streams in real-time)
atlas extract video.mp4
atlas extract video.mp4 --chunk-duration=15s --overlap=1s --format=json
3. Index a Video
atlas index video.mp4
# Prints a video_id on completion — save it for search and chat
4. Search Indexed Videos
# Search all indexed content
atlas search "people discussing machine learning"
# Restrict to a specific video (video_id as first positional arg)
atlas search abc123def456 "demo of the new feature"
5. Chat with a Video
atlas chat abc123def456 "What tools are demonstrated in this video?"
6. Get All Indexed Data for a Video
atlas get-video abc123def456
atlas get-video abc123def456 --output data.json
7. Transcribe a Video (streams in real-time)
atlas transcribe video.mp4
atlas transcribe video.mp4 --format=srt --output=transcript.srt
CLI Commands
atlas extract
Extract multimodal insights from a video without indexing. Tasks are queued by default; use --no-queue to run directly and stream results to the terminal in real time.
atlas extract VIDEO_PATH [OPTIONS]
Options:
-c, --chunk-duration DUR Duration of each chunk (e.g. 15s, 1m) [default: 15s]
-l, --overlap DUR Overlap between chunks [default: 1s]
-a, --attrs ATTR Attribute to extract; repeat for multiple
-o, --output FILE Save full output to this JSON file
-f, --format FMT Output format: json or text [default: text]
--include-summary BOOL Generate a per-segment summary: true or false (default: true)
--benchmark Print a timing breakdown after completion
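DUR values pair a number with a unit; `--chunk-duration=15s` corresponds to `chunk_duration=15` (seconds) in the Python API. The hypothetical helper below sketches that conversion for the two suffixes shown in this README, `s` and `m`; the real CLI may accept other forms.

```python
import re

# Hypothetical parser for CLI durations such as "15s" or "1m".
# Only the "s" (seconds) and "m" (minutes) suffixes shown in this README are handled.
_DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)([sm])$")


def parse_duration(value: str) -> float:
    """Convert a duration string to seconds, e.g. "15s" -> 15.0, "1m" -> 60.0."""
    match = _DURATION_RE.match(value.strip().lower())
    if match is None:
        raise ValueError(f"invalid duration {value!r}; expected e.g. '15s' or '1m'")
    number, unit = match.groups()
    return float(number) * (60 if unit == "m" else 1)


print(parse_duration("15s"), parse_duration("1m"))
```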
Available attributes (--attrs):
| Attribute | Description |
|---|---|
| visual_cues | Visual elements, entities, and their attributes |
| interactions | Movements, gestures, dynamics between entities |
| contextual_information | Production elements, setting, atmosphere |
| audio_analysis | Speech, music, sound effects, ambience |
| transcript | Verbatim spoken content (via Gemini within chunks) |
Note on `transcript` in `extract`: Within the chunked extract flow, all five attributes, including `transcript`, are handled concurrently by Gemini for maximum throughput. For a fast, high-quality transcript, use `atlas transcribe` (powered by Groq Whisper).
Examples:
# Stream text to terminal, default attrs
atlas extract video.mp4
# JSON output saved to file, custom chunks
atlas extract video.mp4 --chunk-duration=15s --overlap=1s --format=json --output=insights.json
# Only extract visual and audio
atlas extract video.mp4 --attrs visual_cues --attrs audio_analysis
# Disable summary, print benchmark timing
atlas extract video.mp4 --include-summary false --benchmark
atlas index
Index a video for semantic search. Prints a video_id on completion — use it to filter searches, start chats, or retrieve data with get-video. Tasks are queued by default; use --no-queue to run directly.
atlas index VIDEO_PATH [OPTIONS]
Options:
-c, --chunk-duration DUR Duration of each chunk [default: 15s]
-o, --overlap DUR Overlap between chunks [default: 1s]
-e, --embedding-dim N Embedding dimension: 768 or 3072 [default: 768] (Not Implemented)
-a, --attrs ATTR Attribute to extract; repeat for multiple
--include-summary BOOL Generate a per-segment summary: true or false (default: true)
--benchmark Print a timing breakdown after completion
Examples:
atlas index video.mp4
atlas index video.mp4 --chunk-duration=15s --overlap=1s
atlas search
Search all indexed videos semantically. Pass a video ID as the first argument to scope the search to a single video.
atlas search [VIDEO_ID] QUERY [OPTIONS]
Arguments:
VIDEO_ID (optional) Video ID to restrict search to — returned by 'atlas index'
QUERY Natural-language search query
Options:
-k, --top-k N Number of results to return [default: 10]
Examples:
# Search across all indexed videos
atlas search "machine learning demonstration"
# Search within a specific video
atlas search abc123def456 "the login screen"
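Conceptually, a search embeds the query and returns the `top_k` stored segments whose embeddings are nearest to it, optionally filtered to one video. zvec answers this with an approximate HNSW index; the sketch below is the exact brute-force equivalent, assuming cosine similarity as the metric and using toy vectors in place of Gemini embeddings (all names are illustrative, not Atlas APIs).

```python
import heapq
import math
from typing import Iterable, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          docs: Iterable[tuple[str, str, Sequence[float]]],
          k: int = 10,
          video_id: Optional[str] = None) -> list[tuple[float, str]]:
    """Exact top-k over (doc_id, video_id, embedding) tuples, best first.

    video_id plays the role of the optional VIDEO_ID filter.
    """
    scored = (
        (cosine(query_vec, vec), doc_id)
        for doc_id, vid, vec in docs
        if video_id is None or vid == video_id
    )
    return heapq.nlargest(k, scored)


docs = [
    ("seg-a", "v1", [1.0, 0.0]),
    ("seg-b", "v1", [0.0, 1.0]),
    ("seg-c", "v2", [1.0, 1.0]),
]
print(top_k([1.0, 0.0], docs, k=2))
print(top_k([1.0, 0.0], docs, k=2, video_id="v1"))
```

HNSW trades a small amount of recall for avoiding this full scan, which is what keeps queries in the millisecond range as the index grows.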
atlas transcribe
Extract a transcript from a video or audio file using Groq Whisper. As with extract and index, tasks are queued by default; with --no-queue, output streams to the terminal in real time.
atlas transcribe VIDEO_PATH [OPTIONS]
Options:
-f, --format FMT Output format: text, vtt, or srt [default: text]
-o, --output FILE Output file path
Examples:
atlas transcribe video.mp4
atlas transcribe video.mp4 --format=srt --output=transcript.srt
atlas transcribe audio.mp3 --format=vtt
atlas chat
Ask a question about a previously indexed video. Context is assembled from:
- Top-k semantic hits from the video index (multimodal insights)
- Recent chat history from the chat vector store (last 20 messages)
- Top-k semantic hits from prior chat turns (deduped against history)
atlas chat VIDEO_ID QUERY
Arguments:
VIDEO_ID Video ID returned by 'atlas index'
QUERY Your question about the video
Examples:
atlas chat abc123def456 "What is the main topic of this video?"
atlas chat abc123def456 "Who are the people speaking?"
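The three context sources listed above for `atlas chat` can be combined as in this sketch. The data shapes, the `ChatMessage` and `assemble_context` names, and deduplication by message ID are assumptions for illustration, not Atlas internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    """Hypothetical chat record; Atlas's real schema may differ."""
    id: str
    role: str  # "user" or "assistant"
    text: str


def assemble_context(video_hits: list[str],
                     history: list[ChatMessage],
                     chat_hits: list[ChatMessage],
                     history_limit: int = 20) -> dict:
    """Combine segment hits, recent history, and semantically related past turns.

    history is ordered oldest to newest; video_hits and chat_hits are assumed to be
    top-k results already returned by the vector store.
    """
    recent = history[-history_limit:]
    seen = {m.id for m in recent}
    related = []
    for m in chat_hits:
        # Skip turns already in the recent history (and repeated hits).
        if m.id not in seen:
            related.append(m)
            seen.add(m.id)
    return {"segments": list(video_hits), "history": recent, "related_turns": related}
```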
atlas get-video
Retrieve all indexed data for a video, returned in the same shape as the extract command. Useful for inspecting exactly what was stored during indexing.
atlas get-video VIDEO_ID [OPTIONS]
Arguments:
VIDEO_ID Video ID returned by 'atlas index'
Options:
-o, --output FILE Save JSON output to this file (default: print to stdout)
Examples:
atlas get-video abc123def456
atlas get-video abc123def456 --output data.json
atlas list-videos
List all videos that have been indexed in the local vector store.
atlas list-videos
atlas list-chat
Show the chat history for a given video.
atlas list-chat VIDEO_ID [OPTIONS]
Arguments:
VIDEO_ID Video ID to retrieve chat history for
Options:
-n, --last-n N Maximum number of messages to show [default: 20]
atlas stats
Show statistics about the local vector store (collection paths, document counts).
atlas stats
atlas queue
Manage the background task queue. Long-running commands (transcribe, extract, index) are queued by default; use --no-queue on any of them to run immediately.
atlas queue list # list all tasks
atlas queue status --task-id TASK_ID # check status of a specific task
Use --no-queue on any command to bypass the queue and run synchronously:
atlas index video.mp4 --no-queue
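For scripts that enqueue work, a polling loop over the task status is often handy. This sketch takes any `get_status(task_id)` callable, for example a wrapper around the `GET /queue/status/{task_id}` endpoint of `atlas serve`; the `status` field name and the terminal values `completed` and `failed` are assumptions, so check real task output before relying on them.

```python
import time
from typing import Callable

# Assumed terminal states; inspect `atlas queue status` output for the real values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(get_status: Callable[[str], dict], task_id: str,
                  poll_interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll get_status(task_id) until it reports a terminal state or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_status(task_id)
        if info.get("status") in TERMINAL_STATES:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {info.get('status')!r} after {timeout}s")
        time.sleep(poll_interval)
```

Injecting `get_status` keeps the loop independent of transport, so the same code works whether status comes from HTTP or a CLI wrapper.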
atlas serve
Start an HTTP API server that exposes all Atlas commands as REST endpoints. Useful for integrating Atlas into a backend service or running it behind a reverse proxy.
atlas serve [OPTIONS]
Options:
-H, --host HOST Host interface to bind [default: 0.0.0.0]
-p, --port PORT Port to listen on [default: 8000]
--env-file PATH Load environment variables from a .env file before starting
Examples:
# Start with defaults
atlas serve
# Bind to localhost only, custom port
atlas serve -H 127.0.0.1 -p 9000
# Load API keys from a .env file
atlas serve -H 0.0.0.0 -p 8000 --env-file .env
API endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /extract | Extract multimodal insights from a video |
| POST | /index | Index a video for semantic search |
| POST | /transcribe | Transcribe a video |
| POST | /search | Semantic search across indexed videos |
| POST | /chat | Chat with a video (SSE streaming response) |
| GET | /list-videos | List all indexed videos |
| GET | /list-chat/{video_id} | Get chat history for a video |
| GET | /stats | Vector store statistics |
| GET | /get-video/{video_id} | Retrieve all indexed data for a video |
| GET | /queue/list | List queued tasks (filter by ?status=) |
| GET | /queue/status/{task_id} | Get status and result of a specific task |

`/chat` returns a `StreamingResponse`.
Quick test:
# Health
curl http://localhost:8000/health
# Search
curl -s -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "machine learning demo", "top_k": 5}'
# Streaming chat
curl -sN -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"video_id": "<video_id>", "query": "What is this video about?"}'
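The same calls can be made from Python using only the standard library. In this sketch the request bodies mirror the curl examples; the assumption that each streamed `/chat` chunk arrives as an SSE `data:` line, printed as-is, is not confirmed by this README.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Union

BASE_URL = "http://localhost:8000"


def json_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an Atlas endpoint."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_sse_data(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield the value of each `data:` field in a Server-Sent Events stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE spec, a single leading space after the colon is dropped.
            yield value[1:] if value.startswith(" ") else value


def search(query: str, top_k: int = 5):
    """POST /search and return the decoded JSON body."""
    with urllib.request.urlopen(json_request("/search", {"query": query, "top_k": top_k})) as resp:
        return json.load(resp)


def stream_chat(video_id: str, query: str) -> None:
    """POST /chat and print streamed chunks as they arrive."""
    req = json_request("/chat", {"video_id": video_id, "query": query})
    with urllib.request.urlopen(req) as resp:
        # The HTTP response is iterable line by line, so output appears incrementally.
        for chunk in iter_sse_data(resp):
            print(chunk, end="", flush=True)
```

With the server running, `search("machine learning demo")` returns the parsed JSON body and `stream_chat("<video_id>", "What is this video about?")` prints the answer as it streams.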
API Keys Reference
| Command | GEMINI_API_KEY | GROQ_API_KEY |
|---|---|---|
| transcribe | ❌ Not needed | ✅ Required |
| extract | ✅ Required | ✅ Required |
| index | ✅ Required | ✅ Required |
| search | ✅ Required | ❌ Not needed |
| chat | ✅ Required | ❌ Not needed |
| get-video | ❌ Not needed | ❌ Not needed |
| list-videos | ❌ Not needed | ❌ Not needed |
| list-chat | ❌ Not needed | ❌ Not needed |
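A small preflight check can catch a missing key before a long job starts. This sketch encodes the API Keys Reference table as a dictionary; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the mapping has to be kept in sync with the table by hand.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping copied from the API Keys Reference table.
REQUIRED_KEYS = {
    "transcribe": {"GROQ_API_KEY"},
    "extract": {"GEMINI_API_KEY", "GROQ_API_KEY"},
    "index": {"GEMINI_API_KEY", "GROQ_API_KEY"},
    "search": {"GEMINI_API_KEY"},
    "chat": {"GEMINI_API_KEY"},
    "get-video": set(),
    "list-videos": set(),
    "list-chat": set(),
}


def missing_keys(command: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required API keys that are unset or empty for `command`."""
    env = os.environ if env is None else env
    if command not in REQUIRED_KEYS:
        raise ValueError(f"unknown command: {command!r}")
    return sorted(key for key in REQUIRED_KEYS[command] if not env.get(key))


missing = missing_keys("index")
if missing:
    print("Set these before running `atlas index`:", ", ".join(missing))
```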
Python API
import asyncio
from atlas import VideoProcessor, VideoProcessorConfig
async def main():
    # Extract insights
    config = VideoProcessorConfig(
        video_path="video.mp4",
        chunk_duration=15,
        overlap=1,
        description_attrs=["visual_cues", "contextual_information", "audio_analysis", "transcript"],
        include_summary=True,
    )
    async with VideoProcessor(config) as processor:
        result = await processor.process()
        print(f"Processed {len(result.video_descriptions)} segments")
    # Index for search: returns (video_id, indexed_count, result)
    from atlas.vector_store import index_video
    video_id, indexed_count, _ = await index_video("video.mp4")
    print(f"video_id: {video_id} docs: {indexed_count}")
    # Search all videos
    from atlas.vector_store import search_video
    results = await search_video("people discussing AI", top_k=5)
    for r in results:
        print(f"{r.score:.3f} [{r.video_id}] {r.content[:80]}")
    # Search within a specific video
    results = await search_video("login screen", top_k=5, video_id=video_id)
    # Chat
    from atlas.vector_store import chat_with_video
    answer = await chat_with_video(video_id, "What tools are shown?")
    print(answer)
asyncio.run(main())
Real-time Extract
Pass on_segment to receive results as each segment is processed:
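import asyncio
from atlas import VideoProcessor, VideoProcessorConfig
async def realtime_example():
    config = VideoProcessorConfig(video_path="video.mp4", chunk_duration=15)
    async with VideoProcessor(config) as processor:
        # on_segment is called as each segment finishes processing
        result = await processor.process(
            on_segment=lambda desc: print(f"{desc.start:.1f}s–{desc.end:.1f}s ready")
        )
    print(f"{len(result.video_descriptions)} segments total")
asyncio.run(realtime_example())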
Transcription
import asyncio
from atlas.video_processor import extract_transcript
# One-shot
transcript = asyncio.run(extract_transcript("video.mp4", format="srt"))
# Real-time callback
async def stream():
    await extract_transcript(
        "video.mp4",
        format="text",
        on_chunk=lambda chunk: print(chunk, end="", flush=True),
    )
asyncio.run(stream())
Vector Store Layout
~/.atlas/index/
├── video_index/ # zvec collection — multimodal insights per segment
└── video_chat/ # zvec collection — chat history per video
All data (indexed segments, chat history, video metadata) is stored directly in the zvec collections — no sidecar files or external registries.
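Because the collections are ordinary directories, standard filesystem tooling works on them. The sketch below reports the on-disk size of each collection; it assumes the default `~/.atlas/index` location (inside the Docker image the data lives under `/home/atlas/.atlas`) and complements rather than replaces `atlas stats`. `collection_sizes` is a hypothetical helper.

```python
from pathlib import Path
from typing import Union

DEFAULT_INDEX_ROOT = Path.home() / ".atlas" / "index"


def collection_sizes(index_root: Union[str, Path] = DEFAULT_INDEX_ROOT) -> dict[str, int]:
    """Map each collection directory under index_root to its total size in bytes."""
    root = Path(index_root)
    if not root.is_dir():
        return {}
    return {
        child.name: sum(f.stat().st_size for f in child.rglob("*") if f.is_file())
        for child in sorted(root.iterdir())
        if child.is_dir()
    }


for name, size in collection_sizes().items():
    print(f"{name}: {size / 1_000_000:.1f} MB")
```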
License
Apache License 2.0
Contributing
Contributions welcome — please open a PR.
Credits
Atlas was originally developed at VeedoAI and is now open-sourced for the community.
Download files
File details
Details for the file atlas_video-0.4.1.tar.gz.
File metadata
- Download URL: atlas_video-0.4.1.tar.gz
- Upload date:
- Size: 65.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 537b63089bf081b943e880b33af82f056453652d4c3a9270c86cf838d173170a |
| MD5 | f68b11d5ba922e36301b32163cf17a4a |
| BLAKE2b-256 | eda5ba4aab75626f26e22e25499a523cfc5ec8a9b75bdc68bc8a923033709ec7 |
Provenance
The following attestation bundles were made for atlas_video-0.4.1.tar.gz:
Publisher: publish.yml on nwaughachukwuma/atlas-video
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: atlas_video-0.4.1.tar.gz
- Subject digest: 537b63089bf081b943e880b33af82f056453652d4c3a9270c86cf838d173170a
- Sigstore transparency entry: 1115459757
- Sigstore integration time:
- Permalink: nwaughachukwuma/atlas-video@dfb419975a73c520f56bfb769258a62246e4fbfa
- Branch / Tag: refs/tags/v0.4.1b
- Owner: https://github.com/nwaughachukwuma
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dfb419975a73c520f56bfb769258a62246e4fbfa
- Trigger Event: push
File details
Details for the file atlas_video-0.4.1-py3-none-any.whl.
File metadata
- Download URL: atlas_video-0.4.1-py3-none-any.whl
- Upload date:
- Size: 82.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f1f5f98552567e857539f8523fb7942023ba20202cb4a3c2c972ab68b7c15aa |
| MD5 | 8be2301d1df9aaa0b91527079ef7579e |
| BLAKE2b-256 | f763e9e6d0282234b45ecd729179c738f335a31c56d98fcab1fbf01311d9b55e |
Provenance
The following attestation bundles were made for atlas_video-0.4.1-py3-none-any.whl:
Publisher: publish.yml on nwaughachukwuma/atlas-video
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: atlas_video-0.4.1-py3-none-any.whl
- Subject digest: 8f1f5f98552567e857539f8523fb7942023ba20202cb4a3c2c972ab68b7c15aa
- Sigstore transparency entry: 1115459772
- Sigstore integration time:
- Permalink: nwaughachukwuma/atlas-video@dfb419975a73c520f56bfb769258a62246e4fbfa
- Branch / Tag: refs/tags/v0.4.1b
- Owner: https://github.com/nwaughachukwuma
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dfb419975a73c520f56bfb769258a62246e4fbfa
- Trigger Event: push