
AI-powered video extraction API for metadata, transcripts, faces, scenes, objects, and more


Media Engine

Media Engine is an AI-powered video metadata extraction API for small TV stations and content creators. It provides a "file in → JSON out" API that extracts metadata, transcripts, faces, scenes, objects, CLIP embeddings, OCR text, and camera motion from video files.

Installation

# Apple Silicon Mac (quotes keep zsh from expanding the brackets)
pip install "media-engine[mlx]"

# NVIDIA GPU
pip install "media-engine[cuda]"

# CPU only
pip install "media-engine[cpu]"

# Start the server
meng-server

Requirements: Python 3.12+, ffmpeg

Features

  • Metadata extraction - Duration, resolution, codec, GPS, device info (modular per-manufacturer)
  • Transcription - Whisper with speaker diarization
  • Face detection - DeepFace with embedding clustering
  • Scene detection - PySceneDetect content-aware boundaries
  • Object detection - YOLO or Qwen VLM
  • CLIP embeddings - Per-scene similarity search
  • OCR - PaddleOCR text extraction
  • Motion analysis - Camera pan/tilt/zoom detection
  • Shot type - Aerial, interview, b-roll classification

Quick Start

Docker (Recommended)

# Clone and run
git clone https://github.com/thetrainroom/media-engine.git
cd media-engine

# Start the server
docker compose up -d

# Test
curl http://localhost:8001/health

Mount your media folder:

MEDIA_PATH=/path/to/videos docker compose up -d

For NVIDIA GPU support (uses Dockerfile.cuda):

docker compose --profile gpu up -d

Or build manually:

docker build -f Dockerfile.cuda -t media-engine-gpu .
docker run -p 8001:8001 --gpus all -v /path/to/media:/media media-engine-gpu

Apple Silicon (Recommended: Native)

Docker on macOS runs in a Linux VM without Metal/MPS access. For GPU acceleration on Apple Silicon, run natively:

pip install "media-engine[mlx]"
meng-server

A Dockerfile.mlx is provided for consistency, but inside Docker it runs on the CPU only:

docker compose --profile mlx up -d

Development Installation

# Mac Apple Silicon
pip install -e ".[mlx]"

# NVIDIA GPU
pip install -e ".[cuda]"

# CPU only
pip install -e ".[cpu]"

# Run server with hot reload
uvicorn media_engine.main:app --reload --port 8001

API Usage

Extract metadata from a video

curl -X POST http://localhost:8001/extract \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/media/video.mp4",
    "enable_metadata": true,
    "enable_transcript": true,
    "enable_faces": false,
    "enable_scenes": true,
    "enable_objects": false,
    "enable_clip": false,
    "enable_ocr": false,
    "enable_motion": false
  }'
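
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server runs at the address used in the curl example. The helper names `build_extract_request` and `extract` are illustrative and not part of Media Engine, and the shape of the JSON response is not documented here, so the result is returned as a plain dict.

```python
import json
import urllib.request

# Base URL taken from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8001"

# Extractor switches that appear in the curl example (as enable_<name>).
FLAGS = ["metadata", "transcript", "faces", "scenes",
         "objects", "clip", "ocr", "motion"]


def build_extract_request(file_path: str, **enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a POST /extract request.

    Keyword arguments name the extractors to switch on, so transcript=True
    becomes "enable_transcript": true in the JSON body. Every other flag is
    sent explicitly as false, mirroring the curl example.
    """
    unknown = set(enabled) - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown extractor flags: {sorted(unknown)}")
    body = {"file": file_path}
    for name in FLAGS:
        body[f"enable_{name}"] = bool(enabled.get(name, False))
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(file_path: str, timeout: float = 600.0, **enabled: bool) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_extract_request(file_path, **enabled)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `extract("/media/video.mp4", metadata=True, transcript=True, scenes=True)` sends the same body as the curl command. The file path is resolved by the server, so it must be a path the server process (or container) can see.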

Endpoints

| Endpoint    | Method | Description                 |
|-------------|--------|-----------------------------|
| /health     | GET    | Health check                |
| /extract    | POST   | Extract features from video |
| /extractors | GET    | List available extractors   |
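
Scripts that start the server and then call /extract right away may want to wait for /health first. The sketch below polls the health endpoint. It assumes that a healthy server answers with HTTP 200 and does not inspect the response body. `is_healthy` and `wait_until_ready` are hypothetical helper names, not part of Media Engine.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(base_url: str = "http://localhost:8001", timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = is_healthy,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop testable without a running server.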

Supported Devices

The metadata extractor automatically detects camera/device type:

| Manufacturer | Models                                 | Features                       |
|--------------|----------------------------------------|--------------------------------|
| DJI          | Mavic, Air, Mini, Pocket, Osmo, Action | GPS from SRT, color profiles   |
| Sony         | PXW, FX, Alpha, ZV series              | XML sidecar, S-Log, GPS        |
| Canon        | Cinema EOS, EOS R                      | XML sidecar                    |
| Apple        | iPhone, iPad                           | QuickTime metadata, GPS        |
| Blackmagic   | Pocket, URSA, BRAW                     | ProApps metadata, BRAW detection |
| RED          | DSMC2, V-RAPTOR, KOMODO                | R3D native support             |
| ARRI         | ALEXA, ALEXA Mini, AMIRA               | ARRIRAW detection              |
| Insta360     | X3, X4, ONE RS, GO 3                   | 360 video detection            |
| FFmpeg       | OBS, Handbrake, etc.                   | Encoder detection              |

To add a new manufacturer, create a module in media_engine/extractors/metadata/ (see Contributing).

Architecture

media_engine/
├── main.py              # FastAPI app
├── config.py            # Settings and platform detection
├── schemas.py           # Pydantic models
└── extractors/
    ├── metadata/        # Modular per-manufacturer
    │   ├── dji.py
    │   ├── sony.py
    │   ├── apple.py
    │   └── ...
    ├── transcribe.py    # Whisper (MLX/CUDA/CPU)
    ├── faces.py         # DeepFace + embeddings
    ├── scenes.py        # PySceneDetect
    ├── objects.py       # YOLO
    ├── objects_qwen.py  # Qwen VLM
    ├── clip.py          # CLIP embeddings
    ├── ocr.py           # PaddleOCR
    └── motion.py        # Optical flow analysis

Configuration

Settings are stored in ~/.config/polybos/config.json:

{
  "whisper_model": "large-v3",
  "fallback_language": "en",
  "hf_token": null,
  "face_sample_fps": 1.0,
  "object_sample_fps": 2.0,
  "ocr_languages": ["en", "no", "de", "fr", "es"]
}

Set hf_token to a Hugging Face access token to enable speaker diarization (this requires accepting the pyannote model license).
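
To check what a given machine is configured with, the file can be read from Python. This is a reader-side sketch, not Media Engine's own loader: `load_settings` and `diarization_enabled` are hypothetical helpers, and using the example values above as fallbacks is an assumption, since the built-in defaults are not listed here.

```python
import json
from pathlib import Path

# Location stated in the Configuration section.
CONFIG_PATH = Path.home() / ".config" / "polybos" / "config.json"

# Values copied from the example file; assumed fallbacks for this sketch only.
EXAMPLE_SETTINGS = {
    "whisper_model": "large-v3",
    "fallback_language": "en",
    "hf_token": None,
    "face_sample_fps": 1.0,
    "object_sample_fps": 2.0,
    "ocr_languages": ["en", "no", "de", "fr", "es"],
}


def load_settings(path: Path = CONFIG_PATH) -> dict:
    """Return the example values overlaid with whatever the file contains."""
    settings = dict(EXAMPLE_SETTINGS)
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    return settings


def diarization_enabled(settings: dict) -> bool:
    """Speaker diarization needs a non-empty hf_token."""
    return bool(settings.get("hf_token"))
```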

Development

# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check media_engine/

# Type check
pyright media_engine/

# Test
export TEST_VIDEO_PATH=/path/to/test.mp4
pytest tests/

Contributing

Contributions welcome! To add support for a new camera manufacturer:

  1. Create media_engine/extractors/metadata/yourmanufacturer.py
  2. Implement detect() and extract() methods
  3. Register with register_extractor("name", YourExtractor())
  4. Import in metadata/__init__.py

See existing modules for examples.

License

MIT
