3D human pose estimation from video using DiffPose + MixSTE

DiffPose-Video

3D human pose estimation from arbitrary video using MixSTE (2D→3D lifting) and DiffPose (diffusion-based refinement).

This package wraps the original DiffPose research code with a clean inference pipeline, an interactive visualisation dashboard, and a video renderer, all available as CLI commands once the package is installed.

Paper: DiffPose: Toward More Reliable 3D Pose Estimation, CVPR 2023.

Install

# 1. Install PyTorch for your CUDA version first (example: CUDA 12.8)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# 2. Install this package
pip install diffpose-video

Note: onnxruntime-gpu==1.20.1 is pinned because later versions have a broken CUDA provider on some systems.

Download pretrained checkpoints

diffpose-download

Downloads all pretrained weights to ~/.cache/diffpose_video/checkpoints/. Safe to re-run — skips files that already exist.

Usage

The three commands below (diffpose-infer, diffpose-explore, diffpose-visualise) accept either CLI flags or a --config <file>.toml. Example TOML files are in configs/; copy one and edit it with your own paths before use. These files are separate from the bundled model config (diffpose_video/configs/*.yml), which is handled automatically.

1. Run inference

Single video:

diffpose-infer --input video.mp4 --output_dir results/

Batch (whole directory):

diffpose-infer --input /datasets/videos/ --recursive --skip_existing --output_dir results/

Via config file:

diffpose-infer --config infer.toml

Output: results/<video_name>.npz containing:

poses_3d — (T, 17, 3) root-relative 3D joint positions
keypoints_2d — (T, 17, 3) pixel-space 2D detections + confidence
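
The arrays can be inspected with NumPy. The snippet below is a minimal sketch, not part of the package: it first writes a synthetic stand-in file with the two keys listed above so that it runs on its own, and it assumes the last axis of keypoints_2d is ordered (x, y, confidence), which follows the description but is not confirmed here.

```python
import os
import tempfile

import numpy as np

# Stand-in for results/<video_name>.npz so the snippet runs on its own;
# a real file written by diffpose-infer would be loaded the same way.
T = 5  # number of frames (arbitrary for this demo)
demo_path = os.path.join(tempfile.mkdtemp(), "video.npz")
np.savez(
    demo_path,
    poses_3d=np.zeros((T, 17, 3), dtype=np.float32),
    keypoints_2d=np.ones((T, 17, 3), dtype=np.float32),
)

with np.load(demo_path) as data:
    poses_3d = data["poses_3d"]          # (T, 17, 3) root-relative joints
    keypoints_2d = data["keypoints_2d"]  # (T, 17, 3) 2D detections

# Assumption: the last axis of keypoints_2d is (x, y, confidence).
xy = keypoints_2d[..., :2]
conf = keypoints_2d[..., 2]

print(poses_3d.shape, xy.shape, conf.shape)
```

For a real result, replace demo_path with the path to the .npz file in your output directory.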

Model and checkpoint paths default to the bundled config and ~/.cache/diffpose_video/checkpoints/. Override with --model_config, --model_pose, --model_diff if needed.

Key options:

Flag	Default	Description
`--input`	—	Video file(s) and/or directory paths
`--output_dir`	`results/`	Root output directory
`--recursive`	off	Recurse into subdirectories
`--skip_existing`	off	Skip if `.npz` already exists
`--exclude PATTERN`	—	Exclude files matching pattern(s)
`--device`	`cuda`	`cuda` or `cpu`
`--det_freq N`	`1`	Run person detector every N frames
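
To preview which videos a batch run with --recursive and --skip_existing would still have to process, something like the following can help. It is only a sketch under stated assumptions: the helper pending_videos is hypothetical, the set of video extensions is a guess, and outputs are assumed to sit flat in the output directory as <video_name>.npz. The tool's own matching rules may differ.

```python
import tempfile
from pathlib import Path

# Assumption: which extensions count as video is not documented here.
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}


def pending_videos(input_dir, output_dir, recursive=False):
    """Return videos under input_dir that have no <output_dir>/<stem>.npz yet."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    walker = input_dir.rglob("*") if recursive else input_dir.glob("*")
    videos = sorted(
        p for p in walker if p.is_file() and p.suffix.lower() in VIDEO_EXTS
    )
    return [v for v in videos if not (output_dir / f"{v.stem}.npz").exists()]


# Demo on a throwaway directory tree.
root = Path(tempfile.mkdtemp())
(root / "videos" / "cam1").mkdir(parents=True)
(root / "results").mkdir()
(root / "videos" / "a.mp4").touch()
(root / "videos" / "cam1" / "b.mp4").touch()
(root / "results" / "a.npz").touch()  # pretend a.mp4 was already processed

todo = pending_videos(root / "videos", root / "results", recursive=True)
print([p.name for p in todo])  # ['b.mp4']
```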

2. Interactive dashboard

Single video:

diffpose-explore --npz results/video.npz --video video.mp4

Multi-video (browse & compare):

diffpose-explore --results_dir results/ --videos_dir /path/to/videos/

Multi-camera with directory mapping:

diffpose-explore \
  --results_dir results/ \
  --videos_map Cam1:/data/Cam1/InputMedia Cam2:/data/Cam2/InputMedia \
  --videos_dir /data/default_videos/

Via config file (recommended for multi-camera setups):

diffpose-explore --config explore.toml

# explore.toml
results_dir = "results"
output_dir  = "visualisations"
fps         = 30.0
port        = 8050

[videos]
default = "/data/default_videos"
Cam1    = "/data/Cam1/InputMedia"
Cam2    = "/data/Cam2/InputMedia"

Opens a Plotly Dash app at http://localhost:8050 with:

Synchronized video playback with 2D skeleton overlay
Side-by-side A/B comparison of any two results
Animated 3D skeleton
X / Y / Z trajectory graphs per joint, with joint selector
Play/pause + frame scrubber, all linked
"Render" buttons to generate side-by-side MP4s on demand

3. Render side-by-side MP4

Single video:

diffpose-visualise \
  --npz results/video.npz --video video.mp4 --output results/video_vis.mp4

Batch (whole results directory):

diffpose-visualise \
  --results_dir results/ --videos_dir /path/to/videos/ --output_dir visualisations/

Via config file:

diffpose-visualise --config visualise.toml

Produces a video with the original footage (+ 2D overlay) on the left and the animated 3D skeleton on the right.

Key options:

Flag	Default	Description
`--skip_existing`	off	Skip if output MP4 already exists
`--fps`	source FPS	Output frame rate
`--start` / `--end`	full range	Frame range to render
`--azim`	`70`	Initial 3D camera azimuth (degrees)

Docker

The image is self-contained — checkpoints and config are baked in at build time.

# Build once
docker compose build

Set the host paths via environment variables, for example by exporting them in your shell:

export VIDEOS_DIR=/path/to/your/videos
export RESULTS_DIR=/path/to/your/results
export VIS_DIR=/path/to/your/visualisations
export CONFIGS_DIR=/path/to/your/toml/configs   # optional

Inference:

# Single video
docker compose run infer --input /videos/clip.mp4

# Batch — whole directory, recursive
docker compose run infer --input /videos/ --recursive --skip_existing

# Via config file
docker compose run infer --config /configs/infer.toml

Render side-by-side MP4:

# Single video
docker compose run visualise \
  --npz /results/clip.npz --video /videos/clip.mp4 --output /vis/clip_vis.mp4

# Batch
docker compose run visualise --results_dir /results --videos_dir /videos

# Via config file
docker compose run visualise --config /configs/visualise.toml

Interactive dashboard — open http://localhost:8050:

docker compose run --service-ports explore \
  --results_dir /results --videos_dir /videos

# Multi-camera via config file
docker compose run --service-ports explore --config /configs/explore.toml

Citation

@InProceedings{gong2023diffpose,
    author    = {Gong, Jia and Foo, Lin Geng and Fan, Zhipeng and Ke, Qiuhong and Rahmani, Hossein and Liu, Jun},
    title     = {DiffPose: Toward More Reliable 3D Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
}

Release history

Version	Released
0.3.0 (this version)	Apr 19, 2026
0.2.1	Apr 19, 2026
0.2.0	Apr 19, 2026
0.1.0	Apr 19, 2026

Download files

Source Distribution

diffpose_video-0.3.0.tar.gz (76.3 kB), uploaded Apr 19, 2026

Built Distribution

diffpose_video-0.3.0-py3-none-any.whl (85.2 kB), Python 3, uploaded Apr 19, 2026

File details

Details for the file diffpose_video-0.3.0.tar.gz.

File metadata

Download URL: diffpose_video-0.3.0.tar.gz
Upload date: Apr 19, 2026
Size: 76.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for diffpose_video-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`2be52dd0c6d2f4eeb8cc6f8e6c464e28c3cd2ecdfee355b61c7fce240d40364e`
MD5	`d8ea3f046da44ee36fb7e833ad30ae53`
BLAKE2b-256	`045ab1194a894d4bb264309babb2a170cec29857c21ecefb903ce17813ccdf43`
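
A downloaded archive can be checked against the SHA256 value in the table with Python's hashlib. The following is a minimal sketch; the helper names sha256_of and matches are illustrative, and the demo runs on a temporary file rather than the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-contained demo on a temporary file. For the real archive, pass the
# path to diffpose_video-0.3.0.tar.gz and the SHA256 value from the table.
fd, demo = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"hello")

ok = matches(demo, hashlib.sha256(b"hello").hexdigest())
print(ok)  # True
```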

File details

Details for the file diffpose_video-0.3.0-py3-none-any.whl.

File metadata

Download URL: diffpose_video-0.3.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 85.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for diffpose_video-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2a3c5aa238cc1c49845cd6a4993ba0348bd1fe6c3083afdee50a3c7340b0cb58`
MD5	`7443c6d5fa85704db85c89c5cf2b8da3`
BLAKE2b-256	`07dfdc657543eaab146d6750852938d5a76f623c609913e1077cce7854bafba3`
