An ML package for GStreamer
GStreamer Python ML
This project provides a pure Python ML framework for upstream GStreamer, supporting a broad range of ML vision and language features.
Supported functionality includes:
- object detection
- tracking
- pose estimation (COCO 17-keypoint skeleton)
- monocular depth estimation
- zero-shot classification (CLIP / SigLIP)
- video captioning
- translation
- transcription
- voice activity detection
- speech to text
- text to speech
- text to image
- LLMs
- serializing model metadata to a Kafka server
Different ML toolkits are supported via the MLEngine abstraction: PyTorch, ONNX Runtime, OpenVINO,
LiteRT (TFLite), TensorFlow, Apache TVM, tinygrad, Apple MLX, Meta ExecuTorch, llama.cpp, HuggingFace Candle, and JAX/Flax.
Testing thus far has been done primarily with PyTorch; the other engines have seen less coverage.
These elements will work with your distribution's GStreamer packages as long as the GStreamer version is >= 1.24.
Table of Contents
- Install
- Post Install
- Custom Plugins
- Pipelines
- Classification
- Object Detection
- Pose Estimation
- Depth Estimation
- Zero-Shot Classification (CLIP / SigLIP)
- Voice Activity Detection
- Transcription
- LLM
- Stable Diffusion
- Kafka Sink
- Segment Anything (SAM)
- OCR
- Face Detection & Recognition
- Optical Flow
- Super-Resolution
- Action Recognition
- Anomaly Detection
- Audio Classification (CLAP)
- Vision-Language Model (VLM)
- Embedding Extractor
- Multi-Object Tracker
- ML Alert
Install
Two installation options are described below: on the host machine or in a Docker container.
Host Install
Install distribution packages
Ubuntu
sudo apt update && sudo apt -y upgrade
sudo apt install -y python3-pip python3-venv \
gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps \
gstreamer1.0-plugins-good gstreamer1.0-plugins-bad \
gir1.2-gst-plugins-bad-1.0 python3-gst-1.0 gstreamer1.0-python3-plugin-loader \
libcairo2 libcairo2-dev git
Fedora
(adjust Fedora version from 42 to match your version number)
sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-42.noarch.rpm https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-42.noarch.rpm
sudo dnf update -y
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda -y
sudo dnf upgrade -y
sudo dnf install -y python3-pip \
python3-devel cairo cairo-devel cairo-gobject-devel pkgconfig git \
gstreamer1-plugins-base gstreamer1-plugins-base-tools \
gstreamer1-plugins-good gstreamer1-plugins-bad-free \
gstreamer1-plugins-bad-free-devel python3-gstreamer1
Windows
- Install GStreamer from the official site. Download and install both the runtime and development MSVC x86_64 installers. The default install path is C:\gstreamer\1.0\msvc_x86_64.
- Set environment variables (adjust paths if your install location differs):
# Add GStreamer to PATH
[Environment]::SetEnvironmentVariable("PATH", "C:\gstreamer\1.0\msvc_x86_64\bin;" + $env:PATH, "User")
# Point GStreamer at your plugin directory
[Environment]::SetEnvironmentVariable("GST_PLUGIN_PATH", "D:\Workspace\gst-python-ml\plugins", "User")
- Install Python 3.12+ from python.org or via conda.
- Install PyGObject. On Windows the easiest route is via conda or the gstreamer-python wheel:
pip install gstreamer-python
- CUDA (optional) — install the CUDA Toolkit matching your GPU driver version, then install the CUDA-enabled PyTorch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Known issue: The gst-python plugin loader on Windows may discover the plugin directory but register 0 features, preventing gst-launch-1.0 from finding pyml_* elements. This is a known Windows-specific issue in gst-python; see #18 for details and workarounds. As a workaround, you can register plugins explicitly from a Python script using Gst.Element.register().
Manage Python packages
Important: Python version must match GStreamer
GStreamer's Python plugin loader (libgstpython.so) embeds the system Python interpreter.
The virtual environment must be created with the same Python version that GStreamer uses,
otherwise import errors will occur at runtime (e.g. No module named 'torch').
On Fedora 42+ this is Python 3.14. On Ubuntu 26.04 this is Python 3.14. On Ubuntu 24.04 this is Python 3.12.
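A quick way to catch a mismatch is to compare the major.minor version of the active interpreter with the system one. The sketch below is illustrative only: the helper names are not part of this project, and /usr/bin/python3 as the system interpreter location is an assumption to adjust for your distribution.

```python
import subprocess
import sys


def same_minor(a, b):
    """Return True when two version tuples share major and minor numbers."""
    return tuple(a[:2]) == tuple(b[:2])


def interpreter_version(python_path):
    """Ask another interpreter for its (major, minor) version."""
    out = subprocess.run(
        [python_path, "-c", "import sys; print(*sys.version_info[:2])"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


def check_against_system(system_python="/usr/bin/python3"):
    """Compare the running interpreter with the (assumed) system interpreter."""
    system = interpreter_version(system_python)
    current = sys.version_info[:2]
    if same_minor(current, system):
        return f"OK: both are Python {system[0]}.{system[1]}"
    return (
        f"Mismatch: running {current[0]}.{current[1]}, "
        f"system is {system[0]}.{system[1]}"
    )
```

Run `print(check_against_system())` from inside the activated venv; a mismatch message suggests the venv was created with a different interpreter than the one GStreamer embeds.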
set up venv with system Python
python3 -m venv --system-site-packages .venv
source .venv/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e .
Alternative: manage with uv
If using uv, ensure uv uses the system Python (not a downloaded one):
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python /usr/bin/python3 --system-site-packages
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv sync
ONNX Runtime
For CPU inference:
uv sync --extra onnx
For GPU inference (requires CUDA):
uv sync --extra onnx-gpu
tinygrad
pip install tinygrad
or
uv sync --extra tinygrad
Apple MLX (macOS Apple Silicon only)
pip install mlx mlx-lm
or
uv sync --extra mlx
ExecuTorch
Requires Python 3.10–3.13 (no 3.14 wheel yet).
pip install executorch
or
uv sync --extra executorch
llama.cpp
pip install llama-cpp-python
or
uv sync --extra llamacpp
For GPU support, set the build flag:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
Candle
Candle (HuggingFace Rust inference) requires building from source with maturin:
pip install maturin
git clone https://github.com/huggingface/candle.git
cd candle/candle-pyo3
maturin develop -r
Apache TVM
TVM is a deep learning compiler for model optimization and deployment. The PyPI
apache-tvm package is stale — install from source:
sudo apt install zlib1g-dev libxml2-dev # Ubuntu/Debian
git clone --recursive https://github.com/apache/tvm.git
cd tvm
mkdir build && cd build
cp ../cmake/config.cmake .
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_LLVM \"llvm-config --ignore-libllvm --link-static\")" >> config.cmake
echo "set(USE_CUDA ON)" >> config.cmake # set OFF if no GPU
cmake .. && cmake --build . --parallel $(nproc)
cd ../3rdparty/tvm-ffi && pip install . && cd ../..
pip install -e .
Requires: CMake >= 3.24, LLVM >= 15, Python >= 3.10. See TVM install docs for full details.
JAX
For CPU:
pip install jax[cpu]
or
uv sync --extra jax-cpu
For GPU (CUDA 12):
pip install jax[cuda12]
or
uv sync --extra jax-gpu
Now manually install the flash-attn wheel, which must match your versions of Python, torch, and CUDA. For example, for torch 2.11 + CUDA 12.8 + Python 3.14:
pip install ./flash_attn-2.8.3+cu128torch2.11-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Pre-built wheels can be found here: https://github.com/mjun0812/flash-attention-prebuild-wheels/releases
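To pick the right wheel from that list, it can help to build the expected tags from your versions. This sketch infers the naming pattern from the example filename above only; the helper names are hypothetical, and wheels from other sources may be named differently.

```python
def wheel_tags(python_version, torch_version, cuda_version):
    """Build the tags expected in a matching flash-attn wheel filename."""
    py_tag = f"cp{python_version[0]}{python_version[1]}"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    # Only major.minor of torch appears in the example filename.
    torch_tag = "torch" + ".".join(torch_version.split(".")[:2])
    return py_tag, cuda_tag, torch_tag


def wheel_matches(filename, python_version, torch_version, cuda_version):
    """Return True when the filename carries all three expected tags."""
    py_tag, cuda_tag, torch_tag = wheel_tags(
        python_version, torch_version, cuda_version
    )
    return (
        f"+{cuda_tag}{torch_tag}-" in filename
        and f"-{py_tag}-{py_tag}-" in filename
    )
```

For example, `wheel_tags((3, 14), "2.11.0", "12.8")` yields the `cp314`, `cu128`, and `torch2.11` tags seen in the filename above.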
Clone repo
cd $HOME/src
git clone https://github.com/collabora/gst-python-ml.git
Update .bashrc
echo 'export GST_PLUGIN_PATH=$HOME/src/gst-python-ml/plugins:$GST_PLUGIN_PATH' >> ~/.bashrc
source ~/.bashrc
Docker Install
Build Docker Container
Important Note:
This Dockerfile maps a local gst-python-ml repository to the container,
and expects this repository to be located in $HOME/src i.e. $HOME/src/gst-python-ml.
Enable Docker GPU Support on Host
To use the host GPU in a Docker container, you will need to install the NVIDIA Container Toolkit. If running on CPU, these steps can be skipped.
Ubuntu
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker
Fedora
sudo dnf install docker
sudo usermod -aG docker $USER
# Then either log out/in completely, or:
newgrp docker
# 1. Add NVIDIA Container Toolkit repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# 2. Remove Fedora's conflicting partial package (if present)
sudo dnf remove -y golang-github-nvidia-container-toolkit 2>/dev/null || true
# 3. Install the full NVIDIA Container Toolkit
sudo dnf install -y nvidia-container-toolkit
# 4. Configure Docker to use the NVIDIA runtime as default
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json > /dev/null <<EOF
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
EOF
# 5. Fix Fedora's broken dockerd ExecStart (required!)
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf >/dev/null <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
EOF
# 6. Reload and restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
# 7. Verify it works
docker info --format '{{.DefaultRuntime}}' # → should print: nvidia
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Build Container
docker build -f ./Dockerfile_ubuntu24 -t ubuntu24:latest .
docker build -f ./Dockerfile_ubuntu26 -t ubuntu26:latest .
docker build -f ./Dockerfile_fedora42 -t fedora42:latest .
Run Docker Container
Note: If running on CPU, remove --gpus all from the commands below:
docker run -v ~/src/gst-python-ml/:/root/gst-python-ml -it --rm --gpus all --name ubuntu24 ubuntu24:latest /bin/bash
or
docker run -v ~/src/gst-python-ml/:/root/gst-python-ml -it --rm --gpus all --name ubuntu26 ubuntu26:latest /bin/bash
or
docker run -v ~/src/gst-python-ml/:/root/gst-python-ml -it --rm --gpus all --name fedora42 fedora42:latest /bin/bash
Now, in the container shell, set up the venv as detailed above.
Post Install
Run gst-inspect-1.0 python to list pyml elements.
Custom Plugins
You can create your own GStreamer elements that inherit from the gst-python-ml base classes
(BaseObjectDetector, BaseTransform, BaseClassifier, etc.) in a separate directory.
Directory Structure
my_plugins/
  python/
    my_detector.py
    my_classifier.py
Example: Custom Object Detector
CAN_REGISTER_ELEMENT = True
try:
    import gi

    gi.require_version("Gst", "1.0")
    gi.require_version("GstBase", "1.0")
    gi.require_version("GObject", "2.0")
    from gi.repository import GObject, Gst, GstBase
    from base_objectdetector import BaseObjectDetector
except ImportError as e:
    # Skip registration when GStreamer bindings or the base class are missing.
    CAN_REGISTER_ELEMENT = False
    print(f"my_detector not available: {e}")

if CAN_REGISTER_ELEMENT:

    class MyDetector(BaseObjectDetector):
        __gstmetadata__ = (
            "My Custom Detector",
            "Video/Filter",
            "A custom object detector",
            "Your Name",
        )

    GObject.type_register(MyDetector)
    __gstelementfactory__ = ("my_detector", Gst.Rank.NONE, MyDetector)
Note: When a pipeline begins, GStreamer scans all scripts for GStreamer elements, including elements that are not actually in the pipeline. To keep startup fast, avoid placing heavy imports such as NumPy at the module level, because GStreamer will import them during the scan. Instead, import at the method level: Python caches imports, so only the first call pays the import cost.
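The method-level import pattern can be sketched as follows. The class is a stand-in rather than a real gst-python-ml element, and the standard-library `statistics` module plays the role of a heavy dependency such as NumPy so that the example runs anywhere.

```python
import sys


class LazyImportExample:
    """Stand-in for a plugin element; not a real gst-python-ml class."""

    def process(self, values):
        # Imported here rather than at module level, so merely scanning
        # this file does not pay the import cost. `statistics` stands in
        # for a heavy dependency such as NumPy.
        import statistics

        return statistics.mean(values)


element = LazyImportExample()
first = element.process([1, 2, 3])
# The second call finds the module in sys.modules and skips the import work.
second = element.process([4, 5, 6])
```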
Environment Setup
Set both GST_PLUGIN_PATH (so GStreamer discovers your .py files) and PYTHONPATH
(so Python can import your modules):
export GST_PLUGIN_PATH=$HOME/src/gst-python-ml/plugins:$HOME/my_plugins:$GST_PLUGIN_PATH
export PYTHONPATH=$HOME/my_plugins/python:$PYTHONPATH
The gst-python loader adds the first python/ directory it finds to sys.path.
By listing the framework directory first, all gst-python-ml base classes (base_objectdetector,
base_transform, base_classifier, base_caption, base_llm, etc.) are importable
by custom plugins. The PYTHONPATH entry ensures gst-python can also resolve your
custom modules from the second directory.
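The lookup described above can be pictured with a small sketch. It mirrors only the behaviour stated in this section (the first python/ directory wins) and is not the gst-python loader's actual code; `first_python_dir` is a hypothetical helper.

```python
import os


def first_python_dir(gst_plugin_path):
    """Return the first <entry>/python directory that exists, or None."""
    for entry in gst_plugin_path.split(os.pathsep):
        candidate = os.path.join(entry, "python")
        if entry and os.path.isdir(candidate):
            return candidate
    return None
```

Calling `first_python_dir(os.environ.get("GST_PLUGIN_PATH", ""))` shows which directory would be picked up under this assumption; anything in later python/ directories needs the PYTHONPATH entry to be importable.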
Available Base Classes
| Base Class | Module | Description |
|---|---|---|
| BaseTransform | base_transform | Base for all video transform elements |
| BaseObjectDetector | base_objectdetector | Object detection with bounding boxes |
| BaseClassifier | base_classifier | Image classification |
| BaseCaption | base_caption | Video/image captioning |
| BaseLLM | base_llm | Large language models |
| BaseTranscribe | base_transcribe | Speech-to-text transcription |
| BaseTranslate | base_translate | Text translation |
| BaseTTS | base_tts | Text-to-speech synthesis |
| BaseSeparate | base_separate | Audio source separation |
Verify
gst-inspect-1.0 my_detector
Using GStreamer Python ML Elements
Pipelines
Below are some sample pipelines for the various elements in this project.
Classification
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin ! videoconvert ! videoscale ! video/x-raw,width=640,height=480 ! pyml_classifier model-name=resnet18 device=cuda ! videoconvert ! autovideosink
Object Detection
TorchVision
pyml_objectdetector supports all TorchVision object detection models.
Simply choose a suitable model name and set it on the model-name property.
A few possible model names:
fasterrcnn_resnet50_fpn
ssdlite320_mobilenet_v3_large
fasterrcnn
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin ! videoconvert ! videoscale ! video/x-raw,width=640,height=480 ! pyml_objectdetector model-name=fasterrcnn_resnet50_fpn device=cuda batch-size=4 ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink
fasterrcnn/kafka
a) run pipeline from host
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin ! videoconvert ! videoscale ! video/x-raw,width=640,height=480 ! pyml_objectdetector model-name=fasterrcnn_resnet50_fpn device=cuda batch-size=4 ! pyml_kafkasink schema-file=data/pyml_object_detector.json broker=localhost:29092 topic=test-kafkasink-topic
b) run pipeline from docker
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin ! videoconvert ! videoscale ! video/x-raw,width=640,height=480 ! pyml_objectdetector model-name=fasterrcnn_resnet50_fpn device=cuda batch-size=4 ! pyml_kafkasink schema-file=data/pyml_object_detector.json broker=kafka:9092 topic=test-kafkasink-topic
maskrcnn
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin ! videoconvert ! videoscale ! pyml_maskrcnn device=cuda batch-size=4 model-name=maskrcnn_resnet50_fpn ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink
yolo with tracking
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480 ! pyml_yolo model-name=yolo11m device=cuda:0 track=True ! pyml_overlay ! videoconvert ! autovideosink
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480,format=RGB ! pyml_streammux name=mux filesrc location=data/soccer_tracking.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480,format=RGB ! mux. mux. ! pyml_yolo model-name=yolo11m device=cuda:0 track=True ! pyml_streamdemux name=demux demux. ! queue ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false demux. ! queue ! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480 ! demo_soccer model-name=yolo11m device=cuda:0 ! pyml_overlay ! videoconvert ! autovideosink
ONNX Engine
pyml_objectdetector supports any ONNX model via the engine-name=onnx property.
YOLO11 ONNX output ([B, 4+nc, anchors]) is automatically decoded with NMS — no manual post-processing required.
Export a YOLO11 model to ONNX with ultralytics:
yolo export model=yolo11m.pt format=onnx
YOLO11m ONNX object detection with overlay
Use input-format=nchw because YOLO expects channels-first input, and
post-process=anchor_free to decode the raw [B, 4+nc, anchors] output into
bounding boxes before handing off to pyml_overlay.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_objectdetector engine-name=onnx model-name=yolo11m.onnx device=cpu \
input-format=nchw post-process=anchor_free \
! videoconvert ! "video/x-raw,format=RGBA" \
! pyml_overlay ! videoconvert ! autovideosink
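For readers curious what anchor-free decoding involves, here is a minimal pure-Python sketch for a single image. It is not the project's implementation: the function names and the 0.25 / 0.45 thresholds are illustrative, and it assumes the first four rows hold (cx, cy, w, h) in input-pixel coordinates followed by one score row per class.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode_anchor_free(output, score_threshold=0.25, iou_threshold=0.45):
    """Decode one image's [4 + nc, anchors] output into detections.

    `output` is a list of 4 + nc rows, each holding one value per anchor.
    Rows 0-3 are assumed to be (cx, cy, w, h); the rest are class scores.
    Returns a list of (box_xyxy, class_id, score), best score first.
    """
    num_anchors = len(output[0])
    candidates = []
    for i in range(num_anchors):
        scores = [row[i] for row in output[4:]]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        score = scores[class_id]
        if score < score_threshold:
            continue
        cx, cy, w, h = (output[r][i] for r in range(4))
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        candidates.append((box, class_id, score))

    # Greedy per-class NMS: keep the best box, drop same-class overlaps.
    candidates.sort(key=lambda det: det[2], reverse=True)
    kept = []
    for det in candidates:
        if all(
            det[1] != k[1] or iou(det[0], k[0]) <= iou_threshold
            for k in kept
        ):
            kept.append(det)
    return kept
```

With post-process=anchor_free set, this work happens inside the element, so pyml_overlay receives ready-made bounding boxes.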
Generic ONNX passthrough (logs raw inference output)
Use pyml_inference to test any ONNX model and inspect raw output:
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_inference engine-name=onnx model-name=yolo11m.onnx device=cpu \
! fakesink
pyml_inference also accepts engine-name=pytorch, engine-name=openvino, etc.
OpenVINO Engine
Export a YOLO11 model to OpenVINO IR format with ultralytics:
yolo export model=yolo11m.pt format=openvino
This produces yolo11m_openvino_model/yolo11m.xml and yolo11m.bin.
YOLO11m OpenVINO object detection with overlay
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_objectdetector engine-name=openvino \
model-name=yolo11m_openvino_model/yolo11m.xml device=cpu \
input-format=nchw post-process=anchor_free \
! videoconvert ! "video/x-raw,format=RGBA" \
! pyml_overlay ! videoconvert ! autovideosink
Use device=GPU for Intel GPU acceleration (OpenVINO uses uppercase device names).
LiteRT (TFLite) Engine
Export a YOLO11 model to TFLite with ultralytics:
yolo export model=yolo11m.pt format=tflite
This produces yolo11m_saved_model/yolo11m_float32.tflite.
YOLO11m TFLite object detection with overlay
TFLite models expect NHWC input (default), so input-format does not need to be set.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_objectdetector engine-name=tflite \
model-name=yolo11m_saved_model/yolo11m_float32.tflite device=cpu \
post-process=anchor_free \
! videoconvert ! "video/x-raw,format=RGBA" \
! pyml_overlay ! videoconvert ! autovideosink
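The difference between the NHWC layout used here and the NCHW layout used in the ONNX and OpenVINO examples is only the order of axes. The following sketch shows the reordering for one frame using nested lists; `hwc_to_chw` is a hypothetical helper for illustration, not a function from this project.

```python
def hwc_to_chw(frame):
    """Reorder a height x width x channels nested list to channels-first."""
    height = len(frame)
    width = len(frame[0])
    channels = len(frame[0][0])
    return [
        [[frame[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1x2 RGB frame: two pixels, three channels each.
frame = [[[1, 2, 3], [4, 5, 6]]]
planes = hwc_to_chw(frame)  # one 1x2 plane per channel
```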
TensorFlow Engine
Export a YOLO11 model to TensorFlow SavedModel with ultralytics:
yolo export model=yolo11m.pt format=saved_model
YOLO11m TensorFlow object detection with overlay
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_objectdetector engine-name=tensorflow \
model-name=yolo11m_saved_model device=cuda \
post-process=anchor_free \
! videoconvert ! "video/x-raw,format=RGBA" \
! pyml_overlay ! videoconvert ! autovideosink
tinygrad Engine
tinygrad supports TorchVision models, SafeTensors files, and Transformers models.
Set engine-name=tinygrad for lightweight GPU/CPU inference with automatic kernel optimization.
ResNet18 classification with tinygrad on GPU
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_classifier model-name=resnet18 device=cuda engine-name=tinygrad \
! fakesink
tinygrad on CPU
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_classifier model-name=resnet18 device=cpu engine-name=tinygrad \
! fakesink
TVM Engine
Apache TVM compiles models for optimized inference. Supports compiled .so/.tar
models and TorchVision models (auto-compiled via Relay). Set engine-name=tvm.
TorchVision model compiled with TVM
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_classifier model-name=resnet18 device=cuda engine-name=tvm \
! fakesink
Pre-compiled TVM model (.so)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=640,height=640" \
! pyml_inference engine-name=tvm model-name=compiled_model.so device=cuda \
! fakesink
Apple MLX Engine
MLX is designed for Apple Silicon (M1/M2/M3/M4). Supports SafeTensors, .npz weights,
and mlx-lm text generation. Set engine-name=mlx.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_classifier model-name=resnet18 device=gpu engine-name=mlx \
! fakesink
ExecuTorch Engine
Meta ExecuTorch runs .pte models for on-device inference. Export a model with
torch.export + ExecuTorch, then set engine-name=executorch.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_inference engine-name=executorch model-name=model.pte device=cpu \
! fakesink
llama.cpp Engine
GGUF quantized LLM inference via llama-cpp-python. Set engine-name=llamacpp
and point to a .gguf model file.
gst-launch-1.0 filesrc location=data/prompt_for_llm.txt \
! pyml_llm engine-name=llamacpp model-name=model.gguf device=cpu \
! fakesink
Candle Engine
HuggingFace Candle (Rust) inference via Python bindings. Supports SafeTensors models.
Set engine-name=candle.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_inference engine-name=candle model-name=model.safetensors device=cpu \
! fakesink
JAX/Flax Engine
Google JAX with XLA compilation. Supports Flax checkpoints and HuggingFace models.
Set engine-name=jax for JIT-compiled inference on GPU, TPU, or CPU.
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale \
! "video/x-raw,format=RGB,width=224,height=224" \
! pyml_classifier model-name=resnet18 device=cpu engine-name=jax \
! fakesink
Pose Estimation
pyml_yolo_pose supports all YOLO pose models. Recommended model names:
yolo11n-pose (fastest)
yolo11s-pose
yolo11m-pose (best accuracy)
YOLO pose with skeleton visualization (rendered on frame)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_yolo_pose model-name=yolo11n-pose device=cuda \
! videoconvert ! autovideosink sync=false
YOLO pose with bounding box overlay (metadata only, no in-element rendering)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_yolo_pose model-name=yolo11n-pose device=cuda visualize=false \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
Depth Estimation
pyml_depth supports DepthAnything V2 models from HuggingFace. Available model sizes:
depth-anything/Depth-Anything-V2-Small-hf (fastest, ~100 MB)
depth-anything/Depth-Anything-V2-Base-hf
depth-anything/Depth-Anything-V2-Large-hf (most accurate)
Available colormaps: inferno (default), jet, viridis, plasma, magma
DepthAnything V2 with inferno colormap
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda \
! videoconvert ! autovideosink sync=false
DepthAnything V2 with jet colormap
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda colormap=jet \
! videoconvert ! autovideosink sync=false
Depth with reduced compute via frame-stride
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda frame-stride=2 \
! videoconvert ! autovideosink sync=false
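The idea behind frame-stride can be sketched as running the model on every Nth frame and reusing the most recent result in between. Whether pyml_depth reuses the previous depth map or handles skipped frames differently is not stated here, so treat the reuse behaviour and the `run_with_stride` helper as assumptions.

```python
def run_with_stride(frames, infer, frame_stride=1):
    """Call `infer` on every `frame_stride`-th frame and reuse the result."""
    results = []
    last = None
    for index, frame in enumerate(frames):
        if index % frame_stride == 0:
            last = infer(frame)  # the expensive model call
        results.append(last)
    return results
```

With frame_stride=2 the model runs on half the frames, which roughly halves inference cost at the price of depth maps that lag by up to one frame.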
Depth with original video side-by-side (tee)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! tee name=t \
t. ! queue ! pyml_depth model-name=depth-anything/Depth-Anything-V2-Small-hf device=cuda ! videoconvert ! autovideosink sync=false \
t. ! queue ! videoconvert ! autovideosink sync=false
Zero-Shot Classification (CLIP / SigLIP)
pyml_clip classifies each frame against a user-defined set of text labels
with no fixed label set — labels are set at pipeline launch time.
Supported models:
openai/clip-vit-base-patch32 (default, ~600 MB)
openai/clip-vit-large-patch14 (more accurate, ~1.7 GB)
google/siglip-base-patch16-224 (SigLIP, better zero-shot accuracy)
google/siglip-large-patch16-384 (SigLIP large)
CLIP with custom labels
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
labels="person, bicycle, car, dog, cat" top-k=3 \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
SigLIP (better zero-shot accuracy than CLIP)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_clip model-name=google/siglip-base-patch16-224 device=cuda \
labels="people walking, empty street, crowd, indoor scene" top-k=1 \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
CLIP with threshold (only report labels above 20% confidence)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue \
! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_clip model-name=openai/clip-vit-base-patch32 device=cuda \
labels="person, bicycle, car, dog, cat" threshold=0.2 \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
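The top-k and threshold properties can be understood as filters over per-label probabilities. The sketch below uses a softmax over label logits, as CLIP-style models conventionally do (SigLIP scores each label independently with a sigmoid instead). How pyml_clip computes its scores internally is not documented here, so the helper names and logic are assumptions.

```python
import math


def parse_labels(labels):
    """Split a comma-separated labels string into clean label names."""
    return [label.strip() for label in labels.split(",") if label.strip()]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_labels(labels, logits, top_k=None, threshold=0.0):
    """Rank labels by probability, then apply threshold and top-k filters."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    ranked = [(label, p) for label, p in ranked if p >= threshold]
    return ranked[:top_k] if top_k else ranked
```

For instance, `select_labels(parse_labels("person, bicycle, car"), logits, threshold=0.2)` keeps only labels whose probability is at least 20%.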
Voice Activity Detection
Standalone VAD with metadata (pass-through, speech probability attached to buffers)
GST_DEBUG=4 gst-launch-1.0 pulsesrc ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.7 ! fakesink
VAD gating before transcription (mute silent audio, reduce Whisper latency)
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! audioresample ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! pyml_vad threshold=0.6 gate=true ! pyml_whispertranscribe device=cuda language=ko ! fakesink
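Gating can be pictured as replacing low-probability audio chunks with silence before they reach the transcriber. This is a sketch of the concept only; `gate_chunks` is a hypothetical helper, and the chunking and probability source are assumptions rather than pyml_vad internals.

```python
def gate_chunks(chunks, speech_probs, threshold=0.6):
    """Replace chunks whose speech probability is below threshold with silence."""
    gated = []
    for chunk, prob in zip(chunks, speech_probs):
        if prob >= threshold:
            gated.append(chunk)
        else:
            # Zero-filled bytes decode as silent S16LE samples.
            gated.append(bytes(len(chunk)))
    return gated
```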
Transcription
transcription with initial prompt set
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whispertranscribe device=cuda language=ko initial_prompt = "Air Traffic Control은, radar systems를, weather conditions에, flight paths를, communication은, unexpected weather conditions가, continuous training을, dedication과, professionalism" ! fakesink
translation to English
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whispertranscribe device=cuda language=ko translate=yes ! fakesink
demucs audio separation
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! audioresample ! pyml_demucs device=cuda ! wavenc ! filesink location=separated_vocals.wav
coquitts
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whispertranscribe device=cuda language=ko translate=yes ! pyml_coquitts device=cuda ! audioconvert ! wavenc ! filesink location=output_audio.wav
whisperspeechtts
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whispertranscribe device=cuda language=ko translate=yes ! pyml_whisperspeechtts device=cuda ! audioconvert ! wavenc ! filesink location=output_audio.wav
mariantranslate
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whispertranscribe device=cuda language=ko translate=yes ! pyml_mariantranslate device=cuda src=en target=fr ! fakesink
Supported src/target languages:
https://huggingface.co/models?sort=trending&search=Helsinki
whisperlive
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/air_traffic_korean_with_english.wav ! decodebin ! audioconvert ! pyml_whisperlive device=cuda language=ko translate=yes llm-model-name="microsoft/phi-2" ! audioconvert ! wavenc ! filesink location=output_audio.wav
LLM
- generate HuggingFace token
- huggingface-cli login and pass in token
- LLM pipeline (in this case, we use phi-2)
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/prompt_for_llm.txt ! pyml_llm device=cuda model-name="microsoft/phi-2" ! fakesink
Stable Diffusion
GST_DEBUG=4 gst-launch-1.0 filesrc location=data/prompt_for_stable_diffusion.txt ! pyml_stablediffusion device=cuda ! pngenc ! filesink location=output_image.png
Caption
caption qwen with history
(should also work with "microsoft/Phi-3.5-vision-instruct" model)
GST_DEBUG=3 gst-launch-1.0 filesrc location=data/soccer_single_camera.mp4 ! decodebin ! videoconvertscale ! video/x-raw,width=640,height=480 ! tee name=t t. ! queue ! textoverlay name=overlay wait-text=false ! videoconvert ! autovideosink t. ! queue leaky=2 max-size-buffers=1 ! videoconvertscale ! video/x-raw,width=240,height=180 ! pyml_caption_qwen device=cuda:0 prompt="In one sentence, describe what you see?" model-name="Qwen/Qwen2.5-VL-3B-Instruct-AWQ" name=cap cap.src ! fakesink async=0 sync=0 cap.text_src ! queue ! coalescehistory history-length=10 ! pyml_llm model-name="Qwen/Qwen3-0.6B" device=cuda system-prompt="You receive the history of what happened in recent times, summarize it nicely with excitement but NEVER mention the specific times. Focus on the most recent events." ! queue ! overlay.text_sink
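The role of coalescehistory with history-length=10 in this pipeline can be approximated by a rolling window over recent captions. The class below is a hypothetical illustration: the real element's output format is not described here, and joining entries with newlines is an assumption.

```python
from collections import deque


class HistoryCoalescer:
    """Keep the most recent captions and emit them as one text block."""

    def __init__(self, history_length=10):
        # deque drops the oldest entry automatically once maxlen is reached.
        self._entries = deque(maxlen=history_length)

    def push(self, text):
        """Add a caption and return the coalesced history, oldest first."""
        self._entries.append(text)
        return "\n".join(self._entries)
```

Each coalesced block would then serve as the prompt that the downstream pyml_llm element summarizes.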
Kafka Sink
Setting up kafka network
docker network create kafka-network
and list networks
docker network ls
docker launch
To launch a docker instance with the kafka network, add --network kafka-network
to the docker launch command above.
Set up kafka and zookeeper
Note: setup below assumes you are running your pipeline in a docker container.
If running pipeline from host, then the port changes from 9092 to 29092,
and the broker changes from kafka to localhost.
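When launching pipelines from a script, the host/container difference can be captured in one place. The helper below is a hypothetical convenience, not part of this project; the broker addresses, topic, and schema file are the ones used in the examples in this document.

```python
def kafkasink_fragment(
    in_docker,
    topic="test-kafkasink-topic",
    schema_file="data/pyml_object_detector.json",
):
    """Build the pyml_kafkasink part of a gst-launch pipeline string."""
    # Inside the kafka-network the broker is reachable by container name;
    # from the host it is reachable through the published 29092 port.
    broker = "kafka:9092" if in_docker else "localhost:29092"
    return f"pyml_kafkasink schema-file={schema_file} broker={broker} topic={topic}"
```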
docker stop kafka zookeeper
docker rm kafka zookeeper
docker run -d --name zookeeper --network kafka-network -e ZOOKEEPER_CLIENT_PORT=2181 confluentinc/cp-zookeeper:latest
docker run -d --name kafka --network kafka-network \
-e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
-e KAFKA_ADVERTISED_LISTENERS=INSIDE://kafka:9092,OUTSIDE://localhost:29092 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT \
-e KAFKA_LISTENERS=INSIDE://0.0.0.0:9092,OUTSIDE://0.0.0.0:29092 \
-e KAFKA_INTER_BROKER_LISTENER_NAME=INSIDE \
-e KAFKA_BROKER_ID=1 \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-p 9092:9092 \
-p 29092:29092 \
confluentinc/cp-kafka:latest
Create test topic
docker exec kafka kafka-topics --create --topic test-kafkasink-topic --bootstrap-server kafka:9092 --partitions 1 --replication-factor 1
list topics
docker exec -it kafka kafka-topics --list --bootstrap-server kafka:9092
delete topic
docker exec -it kafka kafka-topics --delete --topic test-kafkasink-topic --bootstrap-server kafka:9092
consume topic
docker exec -it kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic test-kafkasink-topic --from-beginning
non ML
GST_DEBUG=4 gst-launch-1.0 videotestsrc ! video/x-raw,width=1280,height=720 ! pyml_overlay meta-path=data/sample_metadata.json tracking=true ! videoconvert ! autovideosink
streammux/streamdemux pipeline
GST_DEBUG=4 gst-launch-1.0 videotestsrc pattern=ball ! video/x-raw, width=320, height=240 ! queue ! pyml_streammux name=mux videotestsrc pattern=smpte ! video/x-raw, width=320, height=240 ! queue ! mux.sink_1 videotestsrc pattern=smpte ! video/x-raw, width=320, height=240 ! queue ! mux.sink_2 mux.src ! queue ! pyml_streamdemux name=demux demux.src_0 ! queue ! glimagesink demux.src_1 ! queue ! glimagesink demux.src_2 ! queue ! glimagesink
Segment Anything (SAM)
pyml_sam runs Meta SAM2 for zero-shot segmentation with point, box, or automatic prompts.
Auto-mask segmentation (segment everything)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_sam model-name=facebook/sam2-hiera-small device=cuda prompt-mode=auto \
! videoconvert ! autovideosink sync=false
Point-prompt segmentation (segment object at center)
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_sam model-name=facebook/sam2-hiera-small device=cuda \
prompt-mode=point points="320,240" \
! videoconvert ! autovideosink sync=false
OCR
pyml_ocr performs text detection and recognition using EasyOCR or TrOCR.
EasyOCR text detection (default)
gst-launch-1.0 filesrc location=data/document.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_ocr backend=easyocr languages="en" device=cuda \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
TrOCR recognition
gst-launch-1.0 filesrc location=data/document.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_ocr backend=trocr model-name=microsoft/trocr-base-printed device=cuda \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
Face Detection & Recognition
pyml_face detects faces with RetinaFace and optionally identifies them using ArcFace embeddings.
Face detection only
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_face device=cuda \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
Face detection + recognition with gallery
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_face device=cuda gallery-path=data/face_gallery/ recognition-threshold=0.6 \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
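Gallery-based recognition can be pictured as a nearest-neighbour search over stored face embeddings. The sketch below is an illustration only, not this element's implementation. It assumes `recognition-threshold` is a minimum cosine similarity; the function names and the dictionary-shaped gallery are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    embedding: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.6,
) -> Optional[str]:
    """Return the best-matching gallery name, or None if nothing reaches threshold."""
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Under this assumption, raising the threshold makes the element report fewer identities and lowers the chance of a false match.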
Optical Flow
pyml_optical_flow estimates dense optical flow between consecutive frames using RAFT.
RAFT optical flow with color visualization
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_optical_flow model-name=raft-small device=cuda visualize=true \
! videoconvert ! autovideosink sync=false
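A widely used way to colour dense flow is to map direction to hue and magnitude to brightness. The sketch below shows that convention for a single flow vector. Whether `visualize=true` uses this exact mapping is an assumption, and `flow_to_rgb` and `max_magnitude` are hypothetical names.

```python
import colorsys
import math
from typing import Tuple


def flow_to_rgb(dx: float, dy: float, max_magnitude: float = 20.0) -> Tuple[int, int, int]:
    """Map one flow vector (in pixels) to an (r, g, b) tuple of 0-255 ints."""
    angle = math.atan2(dy, dx)                    # direction, -pi .. pi
    hue = (angle + math.pi) / (2 * math.pi)       # rescaled to 0 .. 1
    # Brightness grows with speed and saturates at max_magnitude.
    value = min(math.hypot(dx, dy) / max_magnitude, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return round(r * 255), round(g * 255), round(b * 255)
```

Static regions come out black, and pixels moving in different directions get different hues.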
Super-Resolution
pyml_superres upscales video frames using Real-ESRGAN.
2x upscale
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=320,height=240" \
! pyml_superres device=cuda scale=2 \
! videoconvert ! autovideosink sync=false
4x upscale with tile processing
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=320,height=240" \
! pyml_superres device=cuda scale=4 tile-size=256 tile-overlap=32 \
! videoconvert ! autovideosink sync=false
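Tile processing splits a frame into overlapping crops so that each crop fits in GPU memory, then blends the upscaled crops back together. The sketch below computes only the crop rectangles for a given `tile-size` and `tile-overlap`. It is one plausible layout, not necessarily the one pyml_superres uses, and the function names are hypothetical.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis, covering [0, length)."""
    if overlap >= tile:
        raise ValueError("tile-overlap must be smaller than tile-size")
    if length <= tile:
        return [0]  # a single tile covers the whole axis
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile sits flush with the far edge
    return origins


def tile_boxes(
    width: int, height: int, tile: int = 256, overlap: int = 32
) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) crop rectangles in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]
```

For the 320x240 input used in the pipeline above, this layout yields two tiles side by side, because the frame is wider than one tile but not taller.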
Action Recognition
pyml_action classifies activities over sliding temporal windows using SlowFast or X3D.
SlowFast action recognition
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_action model-name=slowfast_r50 device=cuda clip-length=32 \
! videoconvert ! pyml_overlay ! videoconvert ! autovideosink sync=false
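A sliding temporal window can be sketched with a fixed-length buffer: frames are appended one at a time, and once the buffer holds `clip-length` frames, a clip is handed to the model every few frames. The `stride` parameter and the name `sliding_clips` are assumptions made for illustration; the source only documents `clip-length`.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sliding_clips(
    frames: Iterable[Frame], clip_length: int = 32, stride: int = 8
) -> Iterator[List[Frame]]:
    """Yield overlapping clips of clip_length frames, one every stride frames."""
    window = deque(maxlen=clip_length)  # oldest frame drops out automatically
    since_last_clip = 0
    for frame in frames:
        window.append(frame)
        since_last_clip += 1
        if len(window) == clip_length and since_last_clip >= stride:
            since_last_clip = 0
            yield list(window)
```

No clip is produced until the first `clip_length` frames have arrived, so with `clip-length=32` the first classification only becomes available after 32 frames.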
Anomaly Detection
pyml_anomaly detects visual anomalies using PatchCore for manufacturing QA.
PatchCore anomaly detection
gst-launch-1.0 filesrc location=data/factory.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_anomaly device=cuda coreset-path=data/coreset.pt threshold=0.5 \
! videoconvert ! autovideosink sync=false
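PatchCore scores a frame by comparing its patch features with a stored coreset of features from normal images: a patch that lies far from every coreset entry is suspicious. The sketch below is a simplified version of that idea using plain Euclidean distance. It assumes `threshold` is compared against the raw maximum patch distance, which may differ from how pyml_anomaly scales its scores; all names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def patch_score(patch: Vector, coreset: Sequence[Vector]) -> float:
    """Distance from one patch feature to its nearest coreset entry."""
    return min(math.dist(patch, reference) for reference in coreset)


def frame_is_anomalous(
    patches: Sequence[Vector], coreset: Sequence[Vector], threshold: float = 0.5
) -> bool:
    """Flag the frame if its most unusual patch exceeds the threshold."""
    frame_score = max(patch_score(patch, coreset) for patch in patches)
    return frame_score > threshold
```

Taking the maximum over patches means a single defective region is enough to flag the whole frame.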
Audio Classification (CLAP)
pyml_clap performs zero-shot audio classification using LAION CLAP.
CLAP audio event detection
gst-launch-1.0 filesrc location=data/audio_sample.wav ! decodebin \
! audioconvert ! audioresample ! audio/x-raw,format=F32LE,rate=48000,channels=1 \
! pyml_clap device=cuda labels="gunshot,siren,baby crying,music,speech" threshold=0.3 \
! fakesink
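Zero-shot classification compares one audio embedding with a text embedding for each label and keeps the labels whose score clears the threshold. The sketch below starts from precomputed similarity scores and assumes they are turned into probabilities with a softmax before `threshold` is applied. That normalisation step is an assumption, and the function names are hypothetical; only the comma-separated `labels` format comes from the pipeline above.

```python
import math
from typing import Dict, List, Sequence


def parse_labels(labels: str) -> List[str]:
    """Split a comma-separated labels property into clean label names."""
    return [label.strip() for label in labels.split(",") if label.strip()]


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect_events(
    similarities: Sequence[float], labels: str, threshold: float = 0.3
) -> Dict[str, float]:
    """Return {label: probability} for labels at or above the threshold."""
    names = parse_labels(labels)
    if len(names) != len(similarities):
        raise ValueError("need exactly one similarity score per label")
    probabilities = softmax(similarities)
    return {n: p for n, p in zip(names, probabilities) if p >= threshold}
```

Because a softmax spreads a total probability of 1 across all labels, adding more labels tends to lower each individual score, so the threshold may need adjusting when the label list changes.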
Vision-Language Model (VLM)
pyml_vlm runs generic VLMs (LLaVA, InternVL, etc.) for visual question answering.
LLaVA visual question answering
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_vlm model-name=llava-hf/llava-1.5-7b-hf device=cuda \
prompt="What is happening in this scene?" \
! fakesink
Embedding Extractor
pyml_embedding extracts dense vector embeddings from video frames.
CLIP embedding extraction
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_embedding model-name=openai/clip-vit-base-patch32 device=cuda \
output-mode=metadata \
! fakesink
DINOv2 embeddings saved to file
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_embedding model-name=facebook/dinov2-base device=cuda \
output-mode=file output-path=embeddings.npy \
! fakesink
Multi-Object Tracker
pyml_tracker is a standalone tracker that works with any upstream detector.
Faster R-CNN + standalone SORT tracker
gst-launch-1.0 filesrc location=data/soccer_tracking.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_objectdetector model-name=fasterrcnn_resnet50_fpn device=cuda \
! pyml_tracker tracker-type=sort max-age=30 min-hits=3 iou-threshold=0.3 \
! pyml_overlay ! videoconvert ! autovideosink sync=false
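SORT associates each new detection with an existing track by box overlap, measured as intersection over union (IoU). In the original SORT algorithm, `max-age` is the number of frames a track survives without a match and `min-hits` is the number of matches needed before a track is reported; it is assumed here that pyml_tracker follows the same meaning. The sketch below shows only the IoU test, with boxes given as hypothetical (x1, y1, x2, y2) tuples.

```python
from typing import Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(track_box: Box, detection_box: Box, iou_threshold: float = 0.3) -> bool:
    """A detection can extend a track only if the boxes overlap enough."""
    return iou(track_box, detection_box) >= iou_threshold
```

A lower `iou-threshold` tolerates faster motion between frames but makes identity switches between nearby objects more likely.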
ML Alert
pyml_alert triggers alerts based on upstream detection metadata.
Webhook alert on person detection
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_objectdetector model-name=fasterrcnn_resnet50_fpn device=cuda \
! pyml_alert rules='{"class":"person","min_score":0.8}' \
webhook-url=http://localhost:8080/alert cooldown=10 \
! pyml_overlay ! videoconvert ! autovideosink sync=false
MQTT alert with zone filtering
gst-launch-1.0 filesrc location=data/people.mp4 ! decodebin name=d \
d. ! queue ! videoconvert ! videoscale ! "video/x-raw,width=640,height=480" \
! pyml_yolo model-name=yolo11m device=cuda \
! pyml_alert rules='{"class":"person","min_score":0.7,"zone":[0,0,320,240]}' \
mqtt-broker=localhost:1883 mqtt-topic=alerts/zone1 cooldown=5 \
! pyml_overlay ! videoconvert ! autovideosink sync=false
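The rule and cooldown logic can be sketched as a small filter over detections. This is an illustration under stated assumptions, not the element's code: `zone` is treated as pixel coordinates [x1, y1, x2, y2], a detection counts as inside the zone when its box centre is, `cooldown` is in seconds, and the `AlertRule` class and the detection dictionary shape are hypothetical.

```python
import json
import time
from typing import Callable, Dict, Optional


class AlertRule:
    def __init__(self, rule_json: str, cooldown: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rule = json.loads(rule_json)
        self.cooldown = cooldown
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._last_fired: Optional[float] = None

    def _in_zone(self, box) -> bool:
        zone = self.rule.get("zone")
        if zone is None:
            return True  # no zone means the whole frame
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]

    def should_fire(self, detection: Dict) -> bool:
        """Return True if this detection should trigger an alert now."""
        if detection["class"] != self.rule["class"]:
            return False
        if detection["score"] < self.rule.get("min_score", 0.0):
            return False
        if not self._in_zone(detection["box"]):
            return False
        now = self.clock()
        if self._last_fired is not None and now - self._last_fired < self.cooldown:
            return False  # still cooling down from the previous alert
        self._last_fired = now
        return True
```

The cooldown keeps a person who stays in view from producing one webhook call or MQTT message per frame.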