
Media Engine

AI-powered video metadata extraction API for small TV stations and content creators. Provides a "file in → JSON out" API that extracts metadata, transcripts, faces, scenes, objects, CLIP embeddings, OCR text, and camera motion from video files.

Installation

# Apple Silicon Mac
pip install media-engine[mlx]
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip

# NVIDIA GPU
pip install media-engine[cuda]

# CPU only
pip install media-engine[cpu]

# Speaker diarization (optional, run after platform install)
pip install pyannote-audio
pip install --upgrade torch torchaudio torchvision

# Start the server
meng-server

Requirements: Python 3.12+, ffmpeg

Note: Speaker diarization requires a HuggingFace token with access to the pyannote models. Set hf_token in ~/.config/polybos/config.json. pyannote-audio pins an older torch version, so the two-step install above prevents a torch downgrade.

Features

  • Metadata extraction - Duration, resolution, codec, GPS, device info (modular per-manufacturer)
  • Transcription - Whisper with speaker diarization
  • Face detection - DeepFace with embedding clustering
  • Scene detection - PySceneDetect content-aware boundaries
  • Object detection - YOLO or Qwen VLM
  • CLIP embeddings - Per-scene similarity search
  • OCR - PaddleOCR text extraction
  • Motion analysis - Camera pan/tilt/zoom detection
  • Shot type - Aerial, interview, b-roll classification

Quick Start

Docker (Recommended)

# Clone and run
git clone https://github.com/thetrainroom/media-engine.git
cd media-engine

# Start the server
docker compose up -d

# Test
curl http://localhost:8001/health

Mount your media folder:

MEDIA_PATH=/path/to/videos docker compose up -d

For NVIDIA GPU support (uses Dockerfile.cuda):

docker compose --profile gpu up -d

Or build manually:

docker build -f Dockerfile.cuda -t media-engine-gpu .
docker run -p 8001:8001 --gpus all -v /path/to/media:/media media-engine-gpu

Apple Silicon (Recommended: Native)

Docker on macOS runs in a Linux VM without Metal/MPS access. For GPU acceleration on Apple Silicon, run natively:

pip install media-engine[mlx]
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip
meng-server

A Dockerfile.mlx is provided for consistency, but it will fall back to CPU when run inside Docker:

docker compose --profile mlx up -d

Development Installation

# Mac Apple Silicon
make install-mlx

# NVIDIA GPU
make install-cuda

# CPU only
make install-cpu

# Speaker diarization (optional)
make install-diarization

# Run server with hot reload
uvicorn media_engine.main:app --reload --port 8001

Or without Make:

pip install -e ".[mlx]"          # or cuda, cpu
pip install https://github.com/harperreed/mlx_clip/archive/refs/heads/main.zip  # mlx only
pip install pyannote-audio       # optional: speaker diarization
pip install --upgrade torch torchaudio torchvision  # restore torch version

API Usage

Extract metadata from a video

curl -X POST http://localhost:8001/extract \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/media/video.mp4",
    "enable_metadata": true,
    "enable_transcript": true,
    "enable_faces": false,
    "enable_scenes": true,
    "enable_objects": false,
    "enable_clip": false,
    "enable_ocr": false,
    "enable_motion": false
  }'
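The same request can be made from Python. This is a minimal client sketch using only the standard library; the endpoint, port, and payload fields come from the curl example above, while the function names (`build_payload`, `extract`) are illustrative.

```python
import json
import urllib.request


def build_payload(path: str, **features: bool) -> dict:
    """Combine the video file path with the enable_* feature flags."""
    return {"file": path, **features}


def extract(path: str, base_url: str = "http://localhost:8001", **features: bool) -> dict:
    """POST the payload to /extract and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(build_payload(path, **features)).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Extraction can take a while on long videos; allow a generous timeout.
    with urllib.request.urlopen(req, timeout=3600) as resp:
        return json.load(resp)


# Example: metadata and scene detection only.
# result = extract("/media/video.mp4", enable_metadata=True, enable_scenes=True)
```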

Endpoints

Endpoint     Method  Description
/health      GET     Health check
/extract     POST    Extract features from a video
/extractors  GET     List available extractors

Supported Devices

The metadata extractor automatically detects camera/device type:

Manufacturer  Models                                  Features
DJI           Mavic, Air, Mini, Pocket, Osmo, Action  GPS from SRT, color profiles
Sony          PXW, FX, Alpha, ZV series               XML sidecar, S-Log, GPS
Canon         Cinema EOS, EOS R                       XML sidecar
Apple         iPhone, iPad                            QuickTime metadata, GPS
Blackmagic    Pocket, URSA, BRAW                      ProApps metadata, BRAW detection
RED           DSMC2, V-RAPTOR, KOMODO                 R3D native support
ARRI          ALEXA, ALEXA Mini, AMIRA                ARRIRAW detection
Insta360      X3, X4, ONE RS, GO 3                    360 video detection
FFmpeg        OBS, Handbrake, etc.                    Encoder detection

To add support for a new manufacturer, create a module in media_engine/extractors/metadata/.

Architecture

media_engine/
├── main.py              # FastAPI app
├── config.py            # Settings and platform detection
├── schemas.py           # Pydantic models
└── extractors/
    ├── metadata/        # Modular per-manufacturer
    │   ├── dji.py
    │   ├── sony.py
    │   ├── apple.py
    │   └── ...
    ├── transcribe.py    # Whisper (MLX/CUDA/CPU)
    ├── faces.py         # DeepFace + embeddings
    ├── scenes.py        # PySceneDetect
    ├── objects.py       # YOLO
    ├── objects_qwen.py  # Qwen VLM
    ├── clip.py          # CLIP embeddings
    ├── ocr.py           # PaddleOCR
    └── motion.py        # Optical flow analysis

Configuration

Settings are stored in ~/.config/polybos/config.json:

{
  "whisper_model": "large-v3",
  "fallback_language": "en",
  "hf_token": null,
  "face_sample_fps": 1.0,
  "object_sample_fps": 2.0,
  "ocr_languages": ["en", "no", "de", "fr", "es"]
}

Set hf_token to enable speaker diarization (requires accepting the pyannote model license on Hugging Face).
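Setting the token can be scripted. This sketch merges hf_token into the config file described above without clobbering existing settings; the path and key names come from this README, the helper name is illustrative.

```python
import json
from pathlib import Path


def set_hf_token(token: str, config_path: Path) -> dict:
    """Merge hf_token into the JSON config, creating the file if needed."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    config["hf_token"] = token
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Usage:
# set_hf_token("hf_...", Path.home() / ".config/polybos/config.json")
```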

Development

# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check media_engine/

# Type check
pyright media_engine/

# Test
export TEST_VIDEO_PATH=/path/to/test.mp4
pytest tests/

Contributing

Contributions welcome! To add support for a new camera manufacturer:

  1. Create media_engine/extractors/metadata/yourmanufacturer.py
  2. Implement detect() and extract() methods
  3. Register with register_extractor("name", YourExtractor())
  4. Import in metadata/__init__.py

See existing modules for examples.
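The four steps above might look roughly like this. The detect()/extract() method names and register_extractor() come from the steps; the "Acme" manufacturer and the probe-dict shape are purely illustrative, so check an existing module such as dji.py for the real interfaces.

```python
# Hypothetical media_engine/extractors/metadata/yourmanufacturer.py


class AcmeExtractor:
    """Metadata extractor for a fictional Acme camera line (sketch)."""

    def detect(self, probe: dict) -> bool:
        # Claim the file when the container tags name the manufacturer.
        make = probe.get("format", {}).get("tags", {}).get("make", "")
        return make.lower().startswith("acme")

    def extract(self, probe: dict) -> dict:
        # Map container tags onto the fields this extractor reports.
        tags = probe.get("format", {}).get("tags", {})
        return {"manufacturer": "Acme", "model": tags.get("model")}


# Step 3, in metadata/__init__.py:
# register_extractor("acme", AcmeExtractor())
```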

License

MIT
