Strands @tool wrappers for Meta Sapiens2 - high-res human-centric vision (seg, normal, pose, pointmap, pretrain backbone)
Wraps Meta's Sapiens2 - a family of high-resolution vision transformers pretrained on 1 billion human images - as first-class Strands Agents tools.
Example output (0.4b model, NVIDIA Thor): input → 29-class segmentation → surface normals.
Every tool returns the standard Strands ToolResult format (status + content list with text, json, and inline image blocks), so the agent can read structured data and see visual output in a single call.
Tools
| Tool | What it does | Model sizes |
|---|---|---|
| sapiens_seg | 29-class body-part segmentation | 0.4b · 0.8b · 1b · 5b |
| sapiens_normal | Per-pixel surface-normal estimation | 0.4b · 0.8b · 1b · 5b |
| sapiens_albedo | Intrinsic color (illumination-invariant) estimation | 0.4b · 0.8b · 1b · 5b |
| sapiens_pointmap | 3D pointmap - lifts each pixel to camera-space XYZ | 0.4b · 0.8b · 1b · 5b |
| sapiens_pose | 308-keypoint 2D pose (face + body + hands + feet) | 0.4b · 0.8b · 1b · 5b |
| sapiens_backbone | Raw pretrained backbone features | 0.1b · 0.4b · 0.8b · 1b · 1b_4k · 5b |
| sapiens_info | Inspect local checkpoints, CUDA status, environment | - |
| sapiens_video | Frame-by-frame video processing (any dense task) | 0.4b · 0.8b · 1b · 5b |
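Because the available sizes differ between tools, a script that calls the tools directly may want to reject an unsupported `model_size` before any weights are loaded. The following is a minimal sketch with the sizes copied from the table above. `SUPPORTED_SIZES` and `check_model_size` are hypothetical helpers, not part of strands-sapiens, which may already perform its own validation.

```python
# Hypothetical helper (not part of strands-sapiens): model sizes per tool,
# copied by hand from the Tools table.
DENSE_SIZES = ("0.4b", "0.8b", "1b", "5b")

SUPPORTED_SIZES: dict[str, tuple[str, ...]] = {
    "sapiens_seg": DENSE_SIZES,
    "sapiens_normal": DENSE_SIZES,
    "sapiens_albedo": DENSE_SIZES,
    "sapiens_pointmap": DENSE_SIZES,
    "sapiens_pose": DENSE_SIZES,
    "sapiens_video": DENSE_SIZES,
    "sapiens_backbone": ("0.1b", "0.4b", "0.8b", "1b", "1b_4k", "5b"),
}


def check_model_size(tool: str, model_size: str) -> None:
    """Raise ValueError if `model_size` is not listed for `tool`."""
    sizes = SUPPORTED_SIZES.get(tool)
    if sizes is None:
        # e.g. sapiens_info has no model size at all
        raise ValueError(f"{tool!r} does not take a model_size")
    if model_size not in sizes:
        raise ValueError(f"{tool} supports {', '.join(sizes)}; got {model_size!r}")


check_model_size("sapiens_backbone", "1b_4k")  # accepted: only the backbone has 1b_4k
```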
Install
```bash
pip install strands-sapiens
```
Prerequisites
```bash
# 1. CUDA-enabled PyTorch (platform-specific)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# 2. Sapiens2 from source
pip install git+https://github.com/facebookresearch/sapiens2.git

# 3. Download checkpoints (see upstream MODEL_ZOO)
# Default location: ~/sapiens2_host (override with $SAPIENS_CHECKPOINT_ROOT)
```
Expected checkpoint layout
```text
~/sapiens2_host/
├── pretrain/   sapiens2_{0.1b,0.4b,0.8b,1b,1b_4k,5b}_pretrain.safetensors
├── seg/        sapiens2_{0.4b,0.8b,1b,5b}_seg.safetensors
├── normal/     sapiens2_{0.4b,0.8b,1b,5b}_normal.safetensors
├── albedo/     sapiens2_{0.4b,0.8b,1b,5b}_albedo.safetensors
├── pointmap/   sapiens2_{0.4b,0.8b,1b,5b}_pointmap.safetensors
├── pose/       sapiens2_{0.4b,0.8b,1b,5b}_pose.safetensors
└── detector/   detr-resnet-101-dc5/   (DETR from Hugging Face)
```
Override with:
```bash
export SAPIENS_CHECKPOINT_ROOT=/data/sapiens2_host
```
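A script that calls the tools directly can check for checkpoints before starting a long run. This sketch assumes the file names follow the layout above exactly and that the root resolves as described ($SAPIENS_CHECKPOINT_ROOT first, then ~/sapiens2_host). `checkpoint_root`, `checkpoint_path`, and `missing_checkpoints` are hypothetical helpers that ignore the detector directory; `sapiens_info()` remains the supported way to inspect what is installed.

```python
from __future__ import annotations

import os
from collections.abc import Iterable
from pathlib import Path

# Assumption: <root>/<task>/sapiens2_<size>_<task>.safetensors, as in the layout above.
TASKS = ("pretrain", "seg", "normal", "albedo", "pointmap", "pose")


def checkpoint_root() -> Path:
    """$SAPIENS_CHECKPOINT_ROOT if set, otherwise ~/sapiens2_host."""
    env = os.environ.get("SAPIENS_CHECKPOINT_ROOT")
    return Path(env).expanduser() if env else Path.home() / "sapiens2_host"


def checkpoint_path(task: str, size: str, root: Path | None = None) -> Path:
    """Expected location of one checkpoint file (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}")
    base = root if root is not None else checkpoint_root()
    return base / task / f"sapiens2_{size}_{task}.safetensors"


def missing_checkpoints(
    pairs: Iterable[tuple[str, str]], root: Path | None = None
) -> list[Path]:
    """Return the expected (task, size) checkpoint files that are not on disk."""
    paths = [checkpoint_path(task, size, root) for task, size in pairs]
    return [p for p in paths if not p.is_file()]
```

For example, `missing_checkpoints([("seg", "0.4b"), ("pose", "0.4b")])` would list whichever of those two files is absent under the resolved root.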
Quick start
With a Strands agent
```python
from strands import Agent
from strands_sapiens import TOOLS

agent = Agent(tools=TOOLS)

# Natural language → the agent picks the right tool
agent("Segment every person in /data/photos and save to /data/out")
agent("Estimate surface normals for photo.jpg using the 1b model")
agent("What checkpoints do I have installed?")
```
Cherry-pick individual tools
```python
from strands import Agent
from strands_sapiens import sapiens_seg, sapiens_pose

agent = Agent(tools=[sapiens_seg, sapiens_pose])
agent("Run pose estimation on /tmp/input/dancer.jpg, save to /tmp/out")
```
Direct Python call (no agent)
Every tool is a regular Python function:
```python
from strands_sapiens import sapiens_seg

result = sapiens_seg(
    input_path="human.jpg",
    output_dir="./out",
    model_size="0.4b",
    save_pred=True,
)
print(result["status"])  # "success"
```
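When one result per file is preferable, for example to retry only the images that failed, a thin loop over the keyword arguments used in the call above is enough. This is a sketch, not a strands-sapiens API: `run_on_folder` is a hypothetical helper, `tool` can be any callable returning a ToolResult dict, and it assumes the chosen tool accepts the same arguments as sapiens_seg (check each tool's signature).

```python
from pathlib import Path
from typing import Any, Callable

ToolResult = dict[str, Any]


def run_on_folder(
    tool: Callable[..., ToolResult],
    input_dir: str,
    output_dir: str,
    model_size: str = "0.4b",
    pattern: str = "*.jpg",
) -> dict[str, list[str]]:
    """Call `tool` once per matching image and group the inputs by result status."""
    report: dict[str, list[str]] = {"success": [], "error": []}
    for path in sorted(Path(input_dir).glob(pattern)):
        # Same keyword arguments as the direct sapiens_seg call above.
        result = tool(
            input_path=str(path),
            output_dir=output_dir,
            model_size=model_size,
            save_pred=True,
        )
        status = result.get("status", "error")
        report.setdefault(status, []).append(str(path))
    return report
```

Usage might look like `run_on_folder(sapiens_seg, "/data/photos", "./out")`, after which `report["error"]` lists the inputs worth retrying.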
Response format
All tools return the standard Strands ToolResult format:
```python
{
    "status": "success",  # or "error"
    "content": [
        {"text": "seg complete on 3 image(s)"},  # summary
        {"image": {"format": "jpeg", "source": {"bytes": b"..."}}},  # inline vis (up to 5)
        {"json": {  # structured data
            "task": "seg",
            "model_size": "0.4b",
            "outputs": [
                {"input": "/data/human.jpg", "vis": "/out/human.jpg", "pred": "/out/human_seg.npy"}
            ]
        }}
    ]
}
```
This means the agent can:
- Read the text summary
- See the visualization images inline (same format as strands_tools.image_reader)
- Parse the structured JSON for downstream tool chaining

On error, content contains a text message and, optionally, a json block with the traceback.
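For chaining outside an agent, the content list can be split by block type. The following is a minimal sketch that assumes the block shapes shown in the example above. `split_tool_result` and `prediction_paths` are illustrative helpers, and the `outputs`/`pred` keys come from the seg example payload, so other tools may use different keys.

```python
from typing import Any


def split_tool_result(
    result: dict[str, Any],
) -> tuple[list[str], list[bytes], list[dict[str, Any]]]:
    """Separate a ToolResult's content into text, image bytes and JSON payloads."""
    texts: list[str] = []
    images: list[bytes] = []
    payloads: list[dict[str, Any]] = []
    for block in result.get("content", []):
        if "text" in block:
            texts.append(block["text"])
        elif "image" in block:
            images.append(block["image"]["source"]["bytes"])
        elif "json" in block:
            payloads.append(block["json"])
    return texts, images, payloads


def prediction_paths(result: dict[str, Any]) -> list[str]:
    """Return saved prediction files, or raise with the tool's error text."""
    texts, _, payloads = split_tool_result(result)
    if result.get("status") != "success":
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return [
        out["pred"]
        for payload in payloads
        for out in payload.get("outputs", [])
        if out.get("pred")  # only present when predictions were saved
    ]
```

Applied to the direct sapiens_seg call with save_pred=True, `prediction_paths(result)` should return the saved `.npy` paths.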
Verified environments
| Platform | PyTorch | Checkpoints tested |
|---|---|---|
| NVIDIA Thor (JetPack 6, aarch64) | 2.7+ | 0.1b pretrain, 0.4b seg |
| Ubuntu 22.04 x86_64 | 2.4+ | 0.4b seg/normal/pose |
Python ≥ 3.10 required. JetPack 6 ships 3.10 by default.
Development
```bash
git clone https://github.com/cagataycali/strands-sapiens.git
cd strands-sapiens
pip install -e '.[dev]'
pytest -q
```
Smoke tests do not require CUDA, GPU, or checkpoints.
Troubleshooting
| Error | Fix |
|---|---|
| Missing checkpoint: ... | The expected file is not under $SAPIENS_CHECKPOINT_ROOT. Run sapiens_info() to see which checkpoints are present. |
| No config found for task=... | The installed sapiens version does not match the expected config paths. The wrapper falls back to an rglob search; if that also fails, open an issue and include the output of pip show sapiens. |
| sapiens.pose high-level API not available | Your sapiens2 build lacks sapiens.pose.inference.Inferencer. The error message shows how to run the upstream CLI script directly. |
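Before opening an issue for the config error, it can help to see which config files the installed sapiens package actually ships. This sketch approximates the rglob fallback described in the table. `find_config`, `package_dir`, and the config file names are hypothetical, and the wrapper's real search paths and patterns may differ.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path | None:
    """Directory of an installed top-level package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def find_config(search_root: Path, expected: Path, pattern: str) -> Path:
    """Use `expected` if it exists, else the first recursive match for `pattern`."""
    if expected.is_file():
        return expected
    matches = sorted(search_root.rglob(pattern))
    if not matches:
        raise FileNotFoundError(f"No config matching {pattern!r} under {search_root}")
    return matches[0]
```

For instance, `sorted(package_dir("sapiens").rglob("*seg*.py"))` lists candidate segmentation configs that could be pasted into the issue.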
License
- This wrapper: MIT
- Sapiens2 models & code: Sapiens2 License (Meta)
File details
Both files were uploaded via twine/6.2.0 CPython/3.12.13, without Trusted Publishing.

| File | Size | Tags |
|---|---|---|
| strands_sapiens-0.1.2.tar.gz | 20.6 kB | Source |
| strands_sapiens-0.1.2-py3-none-any.whl | 17.2 kB | Python 3 |

File hashes
| File | Algorithm | Hash digest |
|---|---|---|
| strands_sapiens-0.1.2.tar.gz | SHA256 | 10931468b5ba4175b8deb68105112e3582c7000c950dacacf8a6ff9a68c731bc |
| strands_sapiens-0.1.2.tar.gz | MD5 | e17c399e5c7a5bd2edb631b12c56b675 |
| strands_sapiens-0.1.2.tar.gz | BLAKE2b-256 | eb0c3063ddee654cdf332f2f6da8e42b8e568d5ab39b6a116f7f247c064a31b7 |
| strands_sapiens-0.1.2-py3-none-any.whl | SHA256 | cd009084837cecd69958f43c2ee47ecf36d5f8e92a1eb17cc1caa2a11d2bb070 |
| strands_sapiens-0.1.2-py3-none-any.whl | MD5 | 893fe55cd54329139a88ef75ab909772 |
| strands_sapiens-0.1.2-py3-none-any.whl | BLAKE2b-256 | e91e134f4d727b1075ba2d4fa839acab28bef9c49b7601ae004f40e15fb26e6a |