Whole-body pose estimation with CIGPose and ONNX Runtime
CIGPose ONNX Runtime
Whole-body pose estimation with ONNX Runtime. Single pip install, no PyTorch or MMPose required.
Pre-exported ONNX models and a lightweight inference CLI for CIGPose (67.5 Whole AP on COCO-WholeBody).
CIGPose by 53mins. Model weights come from the original training pipeline built on MMPose. This repo is just the ONNX conversion and inference wrapper.
Why CIGPose?
Pose estimators tend to get confused by visual context (a hand near a coffee cup, a shoulder behind another person). CIGPose frames this as a causal inference problem:
- Structural Causal Model - visual context is a confounder that creates a backdoor path between image features and pose predictions. CIGPose targets P(Y|do(F)) instead of P(Y|F).
- Causal Intervention Module (CIM) - figures out which keypoint embeddings are confused by measuring predictive uncertainty, then swaps them for learned context-invariant canonical embeddings.
- Hierarchical Graph Neural Network - enforces anatomical plausibility through local (intra-part) and global (inter-part) message passing over the skeleton graph.
This gives you fewer anatomically impossible predictions, especially under occlusion and clutter.
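To make the difference between P(Y|F) and P(Y|do(F)) concrete, here is a toy calculation using the standard backdoor adjustment formula with a single binary context variable C. All probabilities are made-up illustration values, and this is not how CIGPose computes anything internally; it only shows why conditioning and intervening can disagree when context is a confounder.

```python
# Toy backdoor adjustment with a binary context variable C.
# All numbers below are assumed values chosen for illustration only.
p_c = {0: 0.5, 1: 0.5}               # P(C=c)
p_f1_given_c = {0: 0.2, 1: 0.8}      # P(F=1 | C=c)
p_y1_given_f1_c = {0: 0.9, 1: 0.5}   # P(Y=1 | F=1, C=c)

# Observational: each context is weighted by P(C=c | F=1), via Bayes' rule.
p_f1 = sum(p_f1_given_c[c] * p_c[c] for c in p_c)
p_c_given_f1 = {c: p_f1_given_c[c] * p_c[c] / p_f1 for c in p_c}
observational = sum(p_y1_given_f1_c[c] * p_c_given_f1[c] for c in p_c)

# Interventional: each context is weighted by its marginal P(C=c),
# which cuts the backdoor path C -> F.
interventional = sum(p_y1_given_f1_c[c] * p_c[c] for c in p_c)

print(f"P(Y=1 | F=1)     = {observational:.2f}")
print(f"P(Y=1 | do(F=1)) = {interventional:.2f}")
```

In this toy setup the feature co-occurs mostly with the "hard" context (C=1), so the observational estimate is pulled down by context rather than by the feature itself.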
Results
(a) CIGPose on COCO-WholeBody val. (b) Side-by-side with RTMPose-x: the baseline hallucinates limbs into background clutter, CIGPose doesn't.
Left to right: input, RTMPose-x, CIGPose-x.
Installation
pip install cigpose-onnx
For GPU inference:
pip install cigpose-onnx[gpu]
Or install from source:
git clone https://github.com/namas191297/cigpose-onnx.git
cd cigpose-onnx
pip install .
This gives you both the cigpose CLI command and the ability to run python run_onnx.py directly.
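If you installed the GPU extra and use the Python API, you will probably want to pick ONNX Runtime execution providers yourself. The helper below is a small sketch, not part of the package: `pick_providers` is a hypothetical name, and it assumes the usual ONNX Runtime provider strings `CUDAExecutionProvider` and `CPUExecutionProvider`.

```python
def pick_providers(device, available):
    """Return an ONNX Runtime provider list for 'cpu' or 'cuda'.

    `available` is the list reported by onnxruntime.get_available_providers().
    Falls back to CPU when CUDA was requested but is not available.
    """
    if device not in ("cpu", "cuda"):
        raise ValueError(f"unknown device: {device!r}")
    providers = []
    if device == "cuda" and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    # Always keep CPU as the last-resort provider.
    providers.append("CPUExecutionProvider")
    return providers


# With onnxruntime installed you would call it like this:
#   import onnxruntime as ort
#   providers = pick_providers("cuda", ort.get_available_providers())
print(pick_providers("cuda", ["CPUExecutionProvider"]))
```

The resulting list can then be passed as the providers argument of load_pose_model (see the exports table under Python API).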
Quick Start
Download models from the Releases page:
wget https://github.com/namas191297/cigpose-onnx/releases/latest/download/cigpose_models.zip
unzip cigpose_models.zip -d models/
Run inference:
# image
cigpose --model models/cigpose-m_coco-wholebody_256x192.onnx \
--detector models/yolox_nano.onnx --image photo.jpg
# video
cigpose --model models/cigpose-x_coco-ubody_384x288.onnx \
--detector models/yolox_nano.onnx --video clip.mp4 -o result.mp4
# webcam (q = quit)
cigpose --model models/cigpose-m_coco-wholebody_256x192.onnx \
--detector models/yolox_nano.onnx --webcam
# webcam + record to file
cigpose --model models/cigpose-m_coco-wholebody_256x192.onnx \
--detector models/yolox_nano.onnx --webcam -o recording.mp4
# headless server (no display window)
cigpose --model models/cigpose-x_coco-ubody_384x288.onnx \
--detector models/yolox_nano.onnx --video clip.mp4 -o out.mp4 --no-display
Omitting --detector treats the full frame as one person (useful for pre-cropped inputs).
If you installed from source with pip install ., you can also use python run_onnx.py with the same arguments instead of the cigpose command.
CLI Options
cigpose --help
| Flag | Default | Description |
|---|---|---|
| --model | required | CIGPose ONNX model path |
| --detector | none | YOLOX ONNX detector path |
| --image | - | Input image path |
| --video | - | Input video path |
| --webcam | - | Use webcam as input |
| --cam-id | 0 | Webcam device index |
| --output, -o | auto | Output path (auto-generated if omitted) |
| --no-display | off | Skip display windows (headless mode, requires --output) |
| --threshold | 0.6 | Min keypoint confidence to draw |
| --det-threshold | 0.5 | Person detection confidence |
| --det-nms | 0.45 | Detection NMS IoU |
| --device | cpu | cpu or cuda |
| --version | - | Print version and exit |
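For batch jobs it can be handy to assemble the command line from a script. The sketch below only uses flags from the table above; `build_cigpose_command` is a hypothetical helper, not something the package ships, and it enforces the documented rule that --no-display needs an output path.

```python
import shlex


def build_cigpose_command(model, image=None, video=None, detector=None,
                          output=None, no_display=False, device="cpu"):
    """Build an argument list for the cigpose CLI from documented flags."""
    if (image is None) == (video is None):
        raise ValueError("pass exactly one of image or video")
    if no_display and output is None:
        raise ValueError("--no-display requires --output")
    cmd = ["cigpose", "--model", model]
    if detector is not None:
        cmd += ["--detector", detector]
    cmd += ["--image", image] if image is not None else ["--video", video]
    if output is not None:
        cmd += ["--output", output]
    if no_display:
        cmd.append("--no-display")
    cmd += ["--device", device]
    return cmd


cmd = build_cigpose_command(
    "models/cigpose-m_coco-wholebody_256x192.onnx",
    video="clip.mp4",
    detector="models/yolox_nano.onnx",
    output="out.mp4",
    no_display=True,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```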
Python API
The package exposes three levels of API depending on how much control you need.
High-level - detect + pose estimate + draw in one call:
from cigpose import load_pose_model, YOLOXDetector, infer_persons, draw_bboxes
import cv2
sess, input_w, input_h, split_ratio = load_pose_model('models/cigpose-x_coco-ubody_384x288.onnx')
detector = YOLOXDetector('models/yolox_nano.onnx')
frame = cv2.imread('photo.jpg')
bboxes = detector.detect(frame)
vis = infer_persons(sess, frame, bboxes, input_w, input_h, split_ratio, threshold=0.6)
draw_bboxes(vis, bboxes)
cv2.imwrite('result.jpg', vis)
Mid-level - use the run functions directly (handles I/O for you):
from cigpose import load_pose_model, YOLOXDetector, run_on_image, run_on_video
sess, input_w, input_h, split_ratio = load_pose_model('models/cigpose-x_coco-ubody_384x288.onnx')
detector = YOLOXDetector('models/yolox_nano.onnx')
# image
run_on_image(sess, 'photo.jpg', input_w, input_h, split_ratio,
'result.jpg', threshold=0.6, detector=detector, show=False)
# video
run_on_video(sess, 'clip.mp4', input_w, input_h, split_ratio,
'result.mp4', threshold=0.6, detector=detector, show=False)
Low-level - full control over each step:
from cigpose import (
load_pose_model, YOLOXDetector,
preprocess_person, decode_simcc, remap_to_frame, draw_pose,
)
import cv2
sess, input_w, input_h, split_ratio = load_pose_model('models/cigpose-x_coco-ubody_384x288.onnx')
detector = YOLOXDetector('models/yolox_nano.onnx')
frame = cv2.imread('photo.jpg')
bboxes = detector.detect(frame)
for bbox in bboxes:
    # crop, resize, normalize
    tensor, crop_region = preprocess_person(frame, bbox, input_w, input_h)
    # run pose model
    simcc_x, simcc_y = sess.run(None, {'input': tensor})
    # decode SimCC outputs to keypoints + confidence scores
    # kpts: (K, 2) pixel coords in model-input space
    # scores: (K,) raw logit confidence per keypoint
    kpts, scores = decode_simcc(simcc_x, simcc_y, input_w, input_h, split_ratio)
    # map back to original frame coordinates
    kpts = remap_to_frame(kpts, crop_region, input_w, input_h)
    # draw (or do whatever you want with kpts/scores)
    draw_pose(frame, kpts, scores, threshold=0.6)
cv2.imwrite('result.jpg', frame)
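If you are curious what a SimCC decode step roughly does, here is a dependency-free sketch. It is not the package's decode_simcc: `decode_simcc_sketch` is a hypothetical name, it works on plain lists with no batch dimension, and taking the smaller of the two per-axis peak logits as the score is an assumption (one common convention), so the real function may differ.

```python
def decode_simcc_sketch(simcc_x, simcc_y, split_ratio):
    """Decode per-keypoint SimCC logits into coordinates and scores.

    simcc_x: K rows, each with input_w * split_ratio logits (x-axis bins)
    simcc_y: K rows, each with input_h * split_ratio logits (y-axis bins)
    Returns (keypoints, scores) in model-input pixel space.
    """
    keypoints, scores = [], []
    for row_x, row_y in zip(simcc_x, simcc_y):
        # argmax over the bins of each axis
        ix = max(range(len(row_x)), key=row_x.__getitem__)
        iy = max(range(len(row_y)), key=row_y.__getitem__)
        # bins are split_ratio times finer than pixels
        keypoints.append((ix / split_ratio, iy / split_ratio))
        # assumed convention: score is the weaker of the two peak logits
        scores.append(min(row_x[ix], row_y[iy]))
    return keypoints, scores


# One keypoint on a tiny 4x3 input with split_ratio 2 (8 x-bins, 6 y-bins).
simcc_x = [[0.0, 0.1, 0.2, 0.1, 0.5, 3.0, 0.4, 0.0]]
simcc_y = [[0.0, 0.3, 2.0, 0.2, 0.0, 0.0]]
kpts, scores = decode_simcc_sketch(simcc_x, simcc_y, split_ratio=2.0)
print(kpts, scores)
```

Because the scores are raw logits rather than probabilities, a threshold such as 0.6 is a cut on logit magnitude, not on a 0 to 1 confidence.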
Available exports:
| Function | Description |
|---|---|
| load_pose_model(path, providers) | Load ONNX model, returns (session, input_w, input_h, split_ratio) |
| YOLOXDetector(path, ...) | Person detector, .detect(frame) returns [[x1,y1,x2,y2], ...] |
| preprocess_person(frame, bbox, w, h) | Crop + normalize, returns (tensor, crop_region) |
| decode_simcc(simcc_x, simcc_y, w, h, ratio) | Decode SimCC logits, returns (keypoints, scores) |
| remap_to_frame(kpts, crop_region, w, h) | Map keypoints back to original frame coords |
| draw_pose(frame, kpts, scores, threshold) | Draw skeleton + keypoints on frame |
| draw_bboxes(frame, bboxes) | Draw detection bounding boxes |
| infer_persons(sess, frame, bboxes, ...) | Full per-person inference loop, returns annotated frame |
| run_on_image(sess, path, ...) | End-to-end image inference with I/O |
| run_on_video(sess, path, ...) | End-to-end video inference with I/O |
| run_on_webcam(sess, ...) | End-to-end webcam inference |
| COCO133_SKELETON | Skeleton connectivity for 133 whole-body keypoints |
| COCO17_SKELETON | Skeleton connectivity for 17 body keypoints |
| CROWDPOSE14_SKELETON | Skeleton connectivity for 14 CrowdPose keypoints |
Model Zoo
Each ONNX file has input size, normalization constants, and split ratio embedded as metadata. No sidecar configs.
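If you want to look at that metadata yourself, ONNX Runtime exposes it as a string-to-string dict via `InferenceSession.get_modelmeta().custom_metadata_map`. The sketch below decodes such a dict generically. The key names in the example are hypothetical placeholders, since the actual keys used by these models are not listed here, and `decode_metadata` is not part of the package.

```python
import json


def decode_metadata(custom_metadata_map):
    """Decode ONNX custom metadata values, which are always stored as strings.

    Values that parse as JSON (numbers, lists) are converted;
    anything else is kept as the raw string.
    """
    decoded = {}
    for key, value in custom_metadata_map.items():
        try:
            decoded[key] = json.loads(value)
        except json.JSONDecodeError:
            decoded[key] = value
    return decoded


# Hypothetical metadata; real key names may differ. With a real model:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
#   raw = sess.get_modelmeta().custom_metadata_map
raw = {"input_size": "[192, 256]", "split_ratio": "2.0", "note": "hypothetical"}
meta = decode_metadata(raw)
print(meta)
```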
COCO-WholeBody v1.0 val (133 keypoints)
| Model | Input Size | GFLOPs | Body AP | Foot AP | Face AP | Hand AP | Whole AP | Size |
|---|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 2.3 | 69.0 | 64.3 | 82.1 | 49.7 | 59.9 | 71 MB |
| CIGPose-l | 256x192 | 4.6 | 71.2 | 69.0 | 83.3 | 54.0 | 62.6 | 131 MB |
| CIGPose-l | 384x288 | 10.7 | 73.0 | 72.0 | 88.3 | 59.8 | 66.3 | 142 MB |
| CIGPose-x | 384x288 | 18.7 | 73.5 | 72.3 | 88.1 | 60.2 | 67.0 | 230 MB |
| CIGPose-l +UBody | 256x192 | 4.6 | 71.3 | 66.2 | 83.4 | 55.5 | 63.1 | 131 MB |
| CIGPose-l +UBody | 384x288 | 10.7 | 73.1 | 72.3 | 88.0 | 61.2 | 66.9 | 142 MB |
| CIGPose-x +UBody | 384x288 | 18.7 | 73.5 | 70.3 | 88.4 | 62.6 | 67.5 | 230 MB |
COCO val2017 (17 body keypoints)
| Model | Input Size | GFLOPs | Params | AP | AR | Size |
|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 1.9 | 14M | 76.6 | 79.3 | 54 MB |
| CIGPose-l | 256x192 | 4.2 | 28M | 77.6 | 80.3 | 108 MB |
| CIGPose-l | 384x288 | 9.4 | 29M | 78.5 | 81.1 | 109 MB |
CrowdPose test (14 keypoints)
| Model | Input Size | Params | AP | AP easy | AP med | AP hard | Size |
|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 14.4M | 71.4 | 81.0 | 72.7 | 58.9 | 54 MB |
| CIGPose-l | 256x192 | 28.4M | 73.7 | 82.8 | 75.1 | 61.2 | 108 MB |
| CIGPose-l | 384x288 | 28.8M | 74.2 | 82.9 | 75.6 | 62.5 | 109 MB |
| CIGPose-x | 384x288 | 50.4M | 75.8 | 84.2 | 77.3 | 63.6 | 191 MB |
Person Detector
| Model | License | Input | Size |
|---|---|---|---|
| YOLOX-Nano | Apache 2.0 | 416x416 | 3.5 MB |
Picking a model
| Use case | Recommended | Why |
|---|---|---|
| Best accuracy | cigpose-x_coco-ubody_384x288 | 67.5 Whole AP, trained on extra UBody data |
| Balanced | cigpose-l_coco-wholebody_384x288 | 66.3 AP, ~40% smaller than x |
| Lightweight / real-time | cigpose-m_coco-wholebody_256x192 | 71 MB, fastest |
| Body keypoints only | cigpose-l_coco_384x288 | 78.5 AP, 17 standard COCO keypoints |
| Crowded scenes | cigpose-x_crowdpose_384x288 | Trained on CrowdPose, handles overlap |
Swapping the Detector
YOLOX-Nano is included as the default detector, but you can use anything that gives you person bounding boxes.
Drop-in YOLOX upgrade
Any YOLOX variant (Tiny/S/M/L/X) from the YOLOX repo works with the existing YOLOXDetector class. Just point --detector at the larger ONNX file.
Custom detector
Implement a class with a detect(frame) -> list[[x1,y1,x2,y2]] method:
import onnxruntime as ort

class MyDetector:
    def __init__(self, model_path, providers=None):
        self.session = ort.InferenceSession(
            model_path, providers=providers or ['CPUExecutionProvider'])

    def detect(self, frame):
        # frame: BGR numpy array of shape (H, W, 3)
        # return: list of [x1, y1, x2, y2] in pixel coords
        ...
Then use it via the Python API or wire it up in cli.py.
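Most detectors emit overlapping candidate boxes, so a custom detect() usually ends with a score filter and non-maximum suppression before returning. Here is a plain-Python sketch of greedy NMS. The function names are hypothetical and this is not the package's own post-processing. The default thresholds simply mirror the CLI defaults for --det-threshold and --det-nms.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(boxes, scores, score_thr=0.5, iou_thr=0.45):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # keep a box only if it does not overlap a higher-scoring kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


boxes = [[0, 0, 100, 100], [5, 5, 105, 105], [200, 200, 300, 300], [0, 0, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_boxes(boxes, scores)
print(kept)
```

For real workloads you would vectorize this with NumPy, but the logic stays the same.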
Pre-computed boxes
If you already have bounding boxes from a tracker or annotation file, skip the detector entirely. Omit --detector and feed pre-cropped single-person images.
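From the Python API you can also pass your own boxes straight to infer_persons, as long as they are in the same [x1, y1, x2, y2] pixel format that detect() returns. Annotation files often store boxes as [x, y, width, height] instead (COCO does), so a small conversion like the one below may be needed. `xywh_to_xyxy` is a hypothetical helper written for this example.

```python
def xywh_to_xyxy(box, frame_w, frame_h):
    """Convert an [x, y, width, height] box to [x1, y1, x2, y2], clipped to the frame."""
    x, y, w, h = box
    x1 = min(max(x, 0), frame_w)
    y1 = min(max(y, 0), frame_h)
    x2 = min(max(x + w, 0), frame_w)
    y2 = min(max(y + h, 0), frame_h)
    return [x1, y1, x2, y2]


# Boxes from an annotation file for a 640x480 frame (made-up values).
annotated = [[10, 20, 30, 40], [600, 400, 100, 100]]
bboxes = [xywh_to_xyxy(b, 640, 480) for b in annotated]
print(bboxes)
# These can then be used in place of detector.detect(frame), e.g.
#   vis = infer_persons(sess, frame, bboxes, input_w, input_h, split_ratio, threshold=0.6)
```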
License note
If you want to keep your project permissively licensed, stick to detectors under Apache 2.0 (YOLOX, RT-DETR) or MIT (NanoDet). Ultralytics YOLO is AGPL-3.0.
How It Works
Standard top-down pipeline:
- Detect - YOLOX finds person bounding boxes
- Crop - each person is cropped with 1.25x padding, aspect-ratio-corrected, resized to model input
- Infer - CIGPose predicts SimCC coordinate classifications (one distribution per keypoint per axis)
- Decode - argmax gives the coordinate, raw logit peak gives confidence
- Remap - coordinates mapped back to the original frame
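The Crop and Remap steps above can be sketched in a few lines of arithmetic. This is a simplified illustration and not the package's preprocess_person or remap_to_frame: the function names are hypothetical, it assumes a "256x192" model means input_h=256 and input_w=192, and it ignores what happens when the padded region extends past the frame border.

```python
def padded_crop_region(bbox, input_w, input_h, padding=1.25):
    """Expand a [x1, y1, x2, y2] box by `padding` and fix its aspect ratio.

    Returns (x0, y0, width, height) of the region that gets resized
    to the model input.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * padding, (y2 - y1) * padding
    aspect = input_w / input_h
    # grow the shorter side so the region matches the model's aspect ratio
    if w > h * aspect:
        h = w / aspect
    else:
        w = h * aspect
    return cx - w / 2.0, cy - h / 2.0, w, h


def remap_point(kpt, region, input_w, input_h):
    """Map an (x, y) point from model-input space back to frame coordinates."""
    x0, y0, w, h = region
    return x0 + kpt[0] * w / input_w, y0 + kpt[1] * h / input_h


region = padded_crop_region([100, 50, 200, 250], input_w=192, input_h=256)
center = remap_point((96, 128), region, input_w=192, input_h=256)
print(region)
print(center)
```

The center of the model input maps back to the center of the original box, which is a quick sanity check when debugging your own preprocessing.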
Model metadata (input dimensions, normalization constants, split ratio) is embedded in each ONNX file.
Acknowledgements
- CIGPose by 53mins - model architectures, training pipeline, and all checkpoint weights.
- MMPose (OpenMMLab) - the pose estimation framework CIGPose is built on.
- YOLOX (Megvii) - Apache 2.0 object detector used here for person detection.
Author
Namas Bhandari - namas.brd@gmail.com
ONNX conversion, runtime wrapper, and this repository.
License