Media Engine
AI-powered video metadata extraction API for small TV stations and content creators. Provides a "file in → JSON out" API that extracts metadata, transcripts, faces, scenes, objects, CLIP embeddings, OCR text, and camera motion from video files.
Installation
# Apple Silicon Mac (extras are quoted so zsh does not treat the brackets as a glob)
pip install "media-engine[mlx]"
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip
# NVIDIA GPU
pip install "media-engine[cuda]"
# CPU only
pip install "media-engine[cpu]"
# Speaker diarization (optional, run after platform install)
pip install pyannote-audio
pip install --upgrade torch torchaudio torchvision
# Start the server
meng-server
Requirements: Python 3.12+, ffmpeg
Note: Speaker diarization requires a HuggingFace token with access to the pyannote models. Set hf_token in ~/.config/polybos/config.json. pyannote-audio pins an older torch version, so the install above runs in two steps: the second command restores the newer torch after pyannote-audio has downgraded it.
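To confirm that the two-step install left the expected torch packages in place, the installed versions can be listed without importing them. This is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of media-engine.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchaudio", "torchvision", "pyannote-audio")):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it once after the platform install and again after installing pyannote-audio to see whether torch changed.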
Features
- Metadata extraction - Duration, resolution, codec, GPS, device info (modular per-manufacturer)
- Transcription - Whisper with speaker diarization
- Face detection - DeepFace with embedding clustering
- Scene detection - PySceneDetect content-aware boundaries
- Object detection - YOLO or Qwen VLM
- CLIP embeddings - Per-scene similarity search
- OCR - PaddleOCR text extraction
- Motion analysis - Camera pan/tilt/zoom detection
- Shot type - Aerial, interview, b-roll classification
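The CLIP bullet above mentions per-scene similarity search. As a rough illustration of the idea, the sketch below ranks scenes by cosine similarity against a query embedding. The scene IDs and three-dimensional vectors are toy values chosen for readability; this README does not document how the API returns embeddings, so the data shape here is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_scenes(query, scene_embeddings):
    """Return (scene_id, score) pairs sorted from most to least similar."""
    scored = [(scene_id, cosine_similarity(query, emb))
              for scene_id, emb in scene_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: real CLIP embeddings have hundreds of dimensions.
scenes = {
    "scene_0": [1.0, 0.0, 0.0],
    "scene_1": [0.0, 1.0, 0.0],
    "scene_2": [0.6, 0.8, 0.0],
}
ranking = rank_scenes([1.0, 0.0, 0.0], scenes)
```

For large libraries a vector index would replace the linear scan, but the scoring step stays the same.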
Quick Start
Docker (Recommended)
# Clone and run
git clone https://github.com/thetrainroom/media-engine.git
cd media-engine
# Start the server
docker compose up -d
# Test
curl http://localhost:8001/health
Mount your media folder:
MEDIA_PATH=/path/to/videos docker compose up -d
For NVIDIA GPU support (uses Dockerfile.cuda):
docker compose --profile gpu up -d
Or build manually:
docker build -f Dockerfile.cuda -t media-engine-gpu .
docker run -p 8001:8001 --gpus all -v /path/to/media:/media media-engine-gpu
Apple Silicon (Recommended: Native)
Docker on macOS runs in a Linux VM without Metal/MPS access. For GPU acceleration on Apple Silicon, run natively:
pip install "media-engine[mlx]"
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip
meng-server
A Dockerfile.mlx is provided for consistency, but it runs on the CPU inside Docker:
docker compose --profile mlx up -d
Development Installation
# Mac Apple Silicon
make install-mlx
# NVIDIA GPU
make install-cuda
# CPU only
make install-cpu
# Speaker diarization (optional)
make install-diarization
# Run server with hot reload
uvicorn media_engine.main:app --reload --port 8001
Or without Make:
pip install -e ".[mlx]" # or cuda, cpu
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip # mlx only
pip install pyannote-audio # optional: speaker diarization
pip install --upgrade torch torchaudio torchvision # restore torch version
API Usage
Extract metadata from a video
curl -X POST http://localhost:8001/extract \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/media/video.mp4",
    "enable_metadata": true,
    "enable_transcript": true,
    "enable_faces": false,
    "enable_scenes": true,
    "enable_objects": false,
    "enable_clip": false,
    "enable_ocr": false,
    "enable_motion": false
  }'
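The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: the URL, the `file` field, and the `enable_*` flags come from the example above, while the helper names `build_extract_request` and `extract` and the timeout value are illustrative choices. The response schema is not documented in this README, so the result is returned as plain parsed JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # port used throughout this README


def build_extract_request(file_path, base_url=BASE_URL, **flags):
    """Build the POST /extract request; flags are enable_* booleans as in the curl example."""
    payload = {"file": file_path, **flags}
    return urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(file_path, timeout=3600, **flags):
    """Send the request and return the parsed JSON body (schema not assumed here)."""
    request = build_extract_request(file_path, **flags)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires a running server):
# result = extract("/media/video.mp4", enable_metadata=True, enable_scenes=True)
```

The long default timeout is a guess on the assumption that transcription and vision extractors can take a while on long videos; tune it to your workload.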
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /extract | POST | Extract features from video |
| /extractors | GET | List available extractors |
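When the server is started from a script or a container, it can help to wait for /health before sending work. The following is a minimal sketch that treats an HTTP 200 from /health as "ready"; the body of the health response is not documented here, so it is ignored. The function names and the injectable `probe` parameter are illustrative, not part of media-engine.

```python
import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET on url answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False


def wait_until_healthy(base_url="http://localhost:8001", timeout=60.0, interval=1.0, probe=http_ok):
    """Poll GET /health until it succeeds or the timeout (in seconds) elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(f"{base_url}/health"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical call is `wait_until_healthy(timeout=120)` right after `docker compose up -d`, since model loading may delay startup.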
Supported Devices
The metadata extractor automatically detects camera/device type:
| Manufacturer | Models | Features |
|---|---|---|
| DJI | Mavic, Air, Mini, Pocket, Osmo, Action | GPS from SRT, color profiles |
| Sony | PXW, FX, Alpha, ZV series | XML sidecar, S-Log, GPS |
| Canon | Cinema EOS, EOS R | XML sidecar |
| Apple | iPhone, iPad | QuickTime metadata, GPS |
| Blackmagic | Pocket, URSA, BRAW | ProApps metadata, BRAW detection |
| RED | DSMC2, V-RAPTOR, KOMODO | R3D native support |
| ARRI | ALEXA, ALEXA Mini, AMIRA | ARRIRAW detection |
| Insta360 | X3, X4, ONE RS, GO 3 | 360 video detection |
| FFmpeg | OBS, Handbrake, etc. | Encoder detection |
To add a new manufacturer, create a module in media_engine/extractors/metadata/ and register it as described under Contributing.
Architecture
media_engine/
├── main.py              # FastAPI app
├── config.py            # Settings and platform detection
├── schemas.py           # Pydantic models
└── extractors/
    ├── metadata/        # Modular per-manufacturer
    │   ├── dji.py
    │   ├── sony.py
    │   ├── apple.py
    │   └── ...
    ├── transcribe.py    # Whisper (MLX/CUDA/CPU)
    ├── faces.py         # DeepFace + embeddings
    ├── scenes.py        # PySceneDetect
    ├── objects.py       # YOLO
    ├── objects_qwen.py  # Qwen VLM
    ├── clip.py          # CLIP embeddings
    ├── ocr.py           # PaddleOCR
    └── motion.py        # Optical flow analysis
Configuration
Settings are stored in ~/.config/polybos/config.json:
{
  "whisper_model": "large-v3",
  "fallback_language": "en",
  "hf_token": null,
  "face_sample_fps": 1.0,
  "object_sample_fps": 2.0,
  "ocr_languages": ["en", "no", "de", "fr", "es"]
}
Set hf_token to enable speaker diarization. This requires a HuggingFace token and accepting the license for the pyannote models.
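The file can be edited by hand, or updated from a script. The sketch below merges new values into the existing JSON and creates the file if it is missing. The path and key names come from the example above; the helper `update_config` is hypothetical, and whether a running server picks up changes without a restart is not stated in this README.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "polybos" / "config.json"


def update_config(updates, path=CONFIG_PATH):
    """Merge updates into the JSON config at path and return the resulting settings."""
    path = Path(path)
    # Start from the existing settings so unrelated keys are preserved.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.update(updates)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `update_config({"hf_token": "<your token>"})` sets the token while leaving the other settings untouched.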
Development
# Install dev dependencies
pip install -e ".[dev]"
# Lint
ruff check media_engine/
# Type check
pyright media_engine/
# Test
export TEST_VIDEO_PATH=/path/to/test.mp4
pytest tests/
Contributing
Contributions welcome! To add support for a new camera manufacturer:
- Create media_engine/extractors/metadata/yourmanufacturer.py
- Implement detect() and extract() methods
- Register with register_extractor("name", YourExtractor())
- Import in metadata/__init__.py
See existing modules for examples.
License
MIT