Strands @tool wrappers for Meta Sapiens2 - high-res human-centric vision (seg, normal, pose, pointmap, pretrain backbone)

Strands Sapiens

strands-sapiens

Give your agent a body. Pixel-perfect human understanding, as Strands tools.


Wraps Meta's Sapiens2 - a family of high-resolution vision transformers pretrained on 1 billion human images - as first-class Strands Agents tools.

[Figure: real output, Input → 29-class segmentation → surface normals (0.4b model, NVIDIA Thor)]

Every tool returns the standard Strands ToolResult format (status + content list with text, json, and inline image blocks), so the agent can read structured data and see visual output in a single call.

Tools

| Tool | What it does | Model sizes |
|------|--------------|-------------|
| sapiens_seg | 29-class body-part segmentation | 0.4b · 0.8b · 1b · 5b |
| sapiens_normal | Per-pixel surface-normal estimation | 0.4b · 0.8b · 1b · 5b |
| sapiens_albedo | Intrinsic color (illumination-invariant) estimation | 0.4b · 0.8b · 1b · 5b |
| sapiens_pointmap | 3D pointmap - lifts each pixel to camera-space XYZ | 0.4b · 0.8b · 1b · 5b |
| sapiens_pose | 308-keypoint 2D pose (face + body + hands + feet) | 0.4b · 0.8b · 1b · 5b |
| sapiens_backbone | Raw pretrained backbone features | 0.1b · 0.4b · 0.8b · 1b · 1b_4k · 5b |
| sapiens_info | Inspect local checkpoints, CUDA status, env | - |
| sapiens_video | Frame-by-frame video processing (any dense task) | 0.4b · 0.8b · 1b · 5b |
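
The size column above can be checked before a model is ever loaded. The following is a minimal sketch, not part of strands-sapiens: the `SUPPORTED_SIZES` mapping is transcribed from the table, and `check_model_size` is a hypothetical helper name chosen for illustration.

```python
# Supported model sizes per tool, transcribed from the table above.
# sapiens_info takes no model size, so it is intentionally absent.
DENSE_SIZES = ("0.4b", "0.8b", "1b", "5b")

SUPPORTED_SIZES = {
    "sapiens_seg": DENSE_SIZES,
    "sapiens_normal": DENSE_SIZES,
    "sapiens_albedo": DENSE_SIZES,
    "sapiens_pointmap": DENSE_SIZES,
    "sapiens_pose": DENSE_SIZES,
    "sapiens_video": DENSE_SIZES,
    "sapiens_backbone": ("0.1b", "0.4b", "0.8b", "1b", "1b_4k", "5b"),
}


def check_model_size(tool_name: str, model_size: str) -> None:
    """Hypothetical helper: raise ValueError if the table does not list this size."""
    sizes = SUPPORTED_SIZES.get(tool_name)
    if sizes is None:
        raise ValueError(f"{tool_name!r} is not a tool with model sizes")
    if model_size not in sizes:
        raise ValueError(
            f"{tool_name} does not list size {model_size!r}; choose one of {sizes}"
        )
```

For example, `check_model_size("sapiens_backbone", "0.1b")` passes, while `check_model_size("sapiens_seg", "0.1b")` raises, because only the pretrain backbone is listed at 0.1b.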

Install

pip install strands-sapiens

Prerequisites

# 1. CUDA-enabled PyTorch (platform-specific)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# 2. Sapiens2 from source
pip install git+https://github.com/facebookresearch/sapiens2.git

# 3. Download checkpoints (see upstream MODEL_ZOO)
#    Default location: ~/sapiens2_host (override with $SAPIENS_CHECKPOINT_ROOT)

Expected checkpoint layout

~/sapiens2_host/
├── pretrain/  sapiens2_{0.1b,0.4b,0.8b,1b,1b_4k,5b}_pretrain.safetensors
├── seg/       sapiens2_{0.4b,0.8b,1b,5b}_seg.safetensors
├── normal/    sapiens2_{0.4b,0.8b,1b,5b}_normal.safetensors
├── albedo/    sapiens2_{0.4b,0.8b,1b,5b}_albedo.safetensors
├── pointmap/  sapiens2_{0.4b,0.8b,1b,5b}_pointmap.safetensors
├── pose/      sapiens2_{0.4b,0.8b,1b,5b}_pose.safetensors
└── detector/  detr-resnet-101-dc5/              (DETR from HuggingFace)

Override with:

export SAPIENS_CHECKPOINT_ROOT=/data/sapiens2_host
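
To make the layout concrete, here is a minimal sketch of how a checkpoint path could be derived from the root, task, and model size. It is an illustration under stated assumptions, not the wrapper's actual lookup code: the function names are hypothetical, and the file-name pattern is inferred from the tree above.

```python
import os
from pathlib import Path

# Default root taken from the README; the env var name is also from the README.
DEFAULT_ROOT = Path.home() / "sapiens2_host"


def checkpoint_root() -> Path:
    """Return $SAPIENS_CHECKPOINT_ROOT if set, else the default location."""
    return Path(os.environ.get("SAPIENS_CHECKPOINT_ROOT", DEFAULT_ROOT)).expanduser()


def checkpoint_path(task: str, model_size: str) -> Path:
    """Hypothetical helper: build <root>/<task>/sapiens2_<size>_<task>.safetensors."""
    return checkpoint_root() / task / f"sapiens2_{model_size}_{task}.safetensors"


def missing_checkpoints(wanted: list[tuple[str, str]]) -> list[Path]:
    """Return the expected paths that do not exist on disk."""
    paths = [checkpoint_path(task, size) for task, size in wanted]
    return [p for p in paths if not p.is_file()]
```

Calling `missing_checkpoints([("seg", "0.4b"), ("pretrain", "0.1b")])` before running a tool gives a quick list of files still to download.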

Quick start

With a Strands agent

from strands import Agent
from strands_sapiens import TOOLS

agent = Agent(tools=TOOLS)

# Natural language → the agent picks the right tool
agent("Segment every person in /data/photos and save to /data/out")
agent("Estimate surface normals for photo.jpg using the 1b model")
agent("What checkpoints do I have installed?")

Cherry-pick individual tools

from strands import Agent
from strands_sapiens import sapiens_seg, sapiens_pose

agent = Agent(tools=[sapiens_seg, sapiens_pose])
agent("Run pose estimation on /tmp/input/dancer.jpg, save to /tmp/out")

Direct Python call (no agent)

Every tool is a regular Python function:

from strands_sapiens import sapiens_seg

result = sapiens_seg(
    input_path="human.jpg",
    output_dir="./out",
    model_size="0.4b",
    save_pred=True,
)
print(result["status"])  # "success"

Response format

All tools return the standard Strands ToolResult format:

{
    "status": "success",          # or "error"
    "content": [
        {"text": "seg complete on 3 image(s)"},          # summary
        {"image": {"format": "jpeg", "source": {"bytes": b"..."}}},  # inline vis (up to 5)
        {"json": {                                        # structured data
            "task": "seg",
            "model_size": "0.4b",
            "outputs": [
                {"input": "/data/human.jpg", "vis": "/out/human.jpg", "pred": "/out/human_seg.npy"}
            ]
        }}
    ]
}

This means the agent can:

  • Read the text summary
  • See the visualization images inline (same format as strands_tools.image_reader)
  • Parse the structured JSON for downstream tool chaining

On error, content contains a text message and, optionally, a json block with the traceback.
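
When calling a tool directly, the content list can be split by block type. The sketch below assumes only the dictionary shape shown in the example above; `unpack_tool_result` and `save_inline_images` are hypothetical helper names, not functions shipped by strands-sapiens.

```python
from pathlib import Path
from typing import Any


def unpack_tool_result(result: dict[str, Any]) -> dict[str, Any]:
    """Group the content blocks of a ToolResult-shaped dict by kind."""
    texts, images, data = [], [], []
    for block in result.get("content", []):
        if "text" in block:
            texts.append(block["text"])
        elif "image" in block:
            images.append(block["image"])
        elif "json" in block:
            data.append(block["json"])
    return {
        "ok": result.get("status") == "success",
        "texts": texts,
        "images": images,
        "json": data,
    }


def save_inline_images(images: list[dict[str, Any]], out_dir: str) -> list[Path]:
    """Write each inline image block to out_dir, named by index and format."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, image in enumerate(images):
        path = out / f"vis_{i}.{image['format']}"
        path.write_bytes(image["source"]["bytes"])
        paths.append(path)
    return paths
```

A caller would typically check `unpack_tool_result(result)["ok"]` first and, on failure, print the collected text messages and any json block carrying the traceback.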

Verified environments

| Platform | PyTorch | Checkpoints tested |
|----------|---------|--------------------|
| NVIDIA Thor (JetPack 6, aarch64) | 2.7+ | 0.1b pretrain, 0.4b seg |
| Ubuntu 22.04 x86_64 | 2.4+ | 0.4b seg/normal/pose |

Python ≥ 3.10 required. JetPack 6 ships 3.10 by default.

Development

git clone https://github.com/cagataycali/strands-sapiens.git
cd strands-sapiens
pip install -e '.[dev]'
pytest -q

Smoke tests do not require CUDA, GPU, or checkpoints.

Troubleshooting

| Error | Fix |
|-------|-----|
| Missing checkpoint: ... | Your $SAPIENS_CHECKPOINT_ROOT is missing the file. Run sapiens_info() to see what's present. |
| No config found for task=... | Installed sapiens version doesn't match expected config paths. The wrapper tries rglob as fallback - if that fails too, open an issue with pip show sapiens output. |
| sapiens.pose high-level API not available | Your sapiens2 build lacks sapiens.pose.inference.Inferencer. The error message shows how to run the upstream CLI script directly. |

License
