
The universal entrypoint to HuggingFace diffusers for Strands agents — 100% pipeline & modality coverage, zero hardcoding. Special focus on Physical-AI world-foundation models (Cosmos) with robot action outputs.


strands-diffusers

strands-diffusers — one tool, 300+ diffusion pipelines, every modality

The universal entrypoint to HuggingFace diffusers for Strands agents. One tool — use_diffusers — wraps the whole library with zero hardcoding: discover and run any of its 300+ pipelines across every modality. It's a visual library, so here's what it actually produces — every asset below is real model output, not a placeholder:

text → image: any of 108 image pipelines
text → video: LTX · Wan · CogVideoX · Hunyuan
robot actions 🤖: Cosmos WFM, world video + actions
text → audio: StableAudio · AudioLDM2

IN: text / image / video / robot-state
OUT: image / video / audio / actions / 3d

The registry is built at runtime from diffusers._import_structure, so new pipelines are supported automatically with no code change. Same philosophy as use_aws, use_lerobot, and use_transformers: discover, don't hardcode.

3D mesh: ShapE, verts/faces to .ply
audio: StableAudio, waveform to .wav

100% coverage, zero hardcoding

Every pipeline, model, and scheduler diffusers ships is reachable through one tool. When diffusers adds a new pipeline, use_diffusers exposes it immediately.
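The registry idea can be sketched in a few lines of Python. This is a minimal illustration, not the package's core/registry.py: it assumes a mapping shaped like diffusers._import_structure (submodule name to a list of exported names), keeps every name ending in Pipeline, and guesses a modality from hypothetical name keywords. MODALITY_HINTS, BASE_CLASSES, derive_modality, and discover_pipelines are illustrative names.

```python
"""Sketch: derive a pipeline registry from an import-structure mapping."""
from typing import Dict, Iterable, Mapping

# Hypothetical heuristics: the first substring found in the class name wins.
MODALITY_HINTS = (
    ("Cosmos", "video+actions"),
    ("ShapE", "3d"),
    ("Video", "video"),
    ("Audio", "audio"),
)
BASE_CLASSES = {"DiffusionPipeline"}  # abstract base class, not a runnable pipeline


def derive_modality(class_name: str) -> str:
    """Guess the output modality from the class name (hypothetical rules)."""
    for hint, modality in MODALITY_HINTS:
        if hint in class_name:
            return modality
    return "image"  # hypothetical default for everything else


def discover_pipelines(import_structure: Mapping[str, Iterable[str]]) -> Dict[str, str]:
    """Return {pipeline_class_name: modality} for every exported *Pipeline name."""
    found = {}
    for names in import_structure.values():
        for name in names:
            if name.endswith("Pipeline") and name not in BASE_CLASSES:
                found[name] = derive_modality(name)
    return dict(sorted(found.items()))
```

With diffusers installed you would pass diffusers._import_structure itself; because it is a private attribute, its layout can change between diffusers releases.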

Physical-AI: world-foundation models with action outputs

Cosmos world rollout, one task prompt per rollout:

"Put the pot to the left of the purple item."
"Pick up the cloth and place it in the bowl."
"Open the drawer and place the spoon inside."

Same robot, same first observation: a different task prompt yields a different imagined world and different predicted actions. Five real rollouts and all three Cosmos action modes are in the WFM gallery.

This is the headline. A Cosmos action-policy rollout predicts both a future world video and the robot action chunk that produces it. One use_diffusers(action="run", ...) returns a .mp4 world video, a .json action chunk (normalized [-1, 1], shape [num_chunks, T, action_dim]), and optional .wav sound — and you can see the motion:

Predicted action chunk, plotted as a time series of every dimension (gripper highlighted) and as the end-effector path (dims 0–2).

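Verified end-to-end on NVIDIA Thor (nvidia/Cosmos3-Nano, bf16/cuda): one call produced a world video of shape (17, 480, 640, 3), i.e. 17 RGB frames at 480×640, and an action chunk of shape (1, 16, 10), i.e. one chunk of 16 timesteps with 10 action dimensions. See examples/cosmos_action_policy.py.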
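To work with a saved action chunk outside the tool, a sketch like the following checks the documented [num_chunks, T, action_dim] layout and maps the normalized [-1, 1] values back to physical units. It assumes the .json file is a bare nested list and that you supply per-dimension bounds (low, high) for your robot; both the file layout and the helper names are assumptions, not the package's API.

```python
import json
from typing import List, Sequence, Tuple

Chunk = List[List[List[float]]]  # [num_chunks][T][action_dim]


def load_action_chunk(path: str) -> Chunk:
    """Load a chunk, assuming the .json file holds a bare nested list."""
    with open(path) as f:
        return json.load(f)


def chunk_shape(chunk: Chunk) -> Tuple[int, int, int]:
    """Return (num_chunks, T, action_dim), rejecting ragged or empty nesting."""
    steps = {len(c) for c in chunk}
    dims = {len(step) for c in chunk for step in c}
    if len(steps) != 1 or len(dims) != 1:
        raise ValueError("action chunk is empty or ragged")
    return (len(chunk), steps.pop(), dims.pop())


def denormalize(chunk: Chunk, low: Sequence[float], high: Sequence[float]) -> Chunk:
    """Map each value from [-1, 1] back to [low[d], high[d]] for action dim d."""
    _, _, action_dim = chunk_shape(chunk)
    if len(low) != action_dim or len(high) != action_dim:
        raise ValueError("need one (low, high) bound per action dimension")
    return [
        [[lo + (a + 1.0) / 2.0 * (hi - lo) for a, lo, hi in zip(step, low, high)]
         for step in c]
        for c in chunk
    ]
```

For the Thor run above, chunk_shape would be expected to report (1, 16, 10); the bounds you pass to denormalize must come from your own robot's calibration.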

Install

pip install -e .                   # editable install from a source checkout
pip install -e ".[video,audio]"    # adds mp4 export and wav I/O

Quick start

from strands import Agent
from strands_diffusers import use_diffusers

agent = Agent(tools=[use_diffusers])
agent("Generate an image of a robot arm in a kitchen")
agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")

Or call the tool directly:

use_diffusers(action="run", pipeline="StableDiffusionPipeline",
              model="stabilityai/stable-diffusion-2-1",
              parameters={"prompt": "a robot arm in a kitchen"})
# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}

Two layers

run loads a pipeline via from_pretrained and calls it. Inputs are coerced (path, URL, or base64 to PIL images or video), and outputs are auto-saved and returned as file paths.
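The coercion step for string inputs might look roughly like this. It is a hypothetical sketch (classify_input and read_input_bytes are illustrative names) that only recovers raw bytes from a URL, data URI, base64 string, or file path; decoding those bytes into a PIL image or video frames would be the next step and is left out to keep the example dependency-free.

```python
import base64
import binascii
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def classify_input(value: str) -> str:
    """Guess how a string input should be read: 'url', 'base64', or 'path'."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if value.startswith("data:") and ";base64," in value:
        return "base64"  # data URI such as data:image/png;base64,...
    if os.path.exists(value):
        return "path"
    try:
        base64.b64decode(value, validate=True)  # strict: rejects '.', '/'-free junk
        return "base64"
    except (binascii.Error, ValueError):
        return "path"  # let open() below raise a clear FileNotFoundError


def read_input_bytes(value: str) -> bytes:
    """Fetch the raw bytes behind a path, URL, or base64 string."""
    kind = classify_input(value)
    if kind == "url":
        with urlopen(value) as resp:
            return resp.read()
    if kind == "base64":
        payload = value.split(",", 1)[1] if value.startswith("data:") else value
        return base64.b64decode(payload)
    with open(value, "rb") as f:
        return f.read()
```

Checking for an existing file before trying base64 matters: a short name without dots can also be valid base64, so that ambiguity is resolved in favor of the file system.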

call resolves and calls any diffusers class, function, or method (schedulers, VAEs, CosmosActionCondition, utils). String references of the form cached:key resolve to live objects stored earlier under that cache_key, and "**" unpacks a cached mapping into keyword arguments.

use_diffusers(action="call", target="CosmosActionCondition",
              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
              parameters={"prompt": "...", "action": "cached:cond"},
              dtype="bfloat16", device="cuda")
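The cached: mechanism can be approximated with a small recursive resolver. This sketch assumes references are plain strings prefixed with cached: and that a "**" key holds a reference to a cached mapping whose items are merged into the keyword arguments; resolve, resolve_params, and the rule that explicit keys win over unpacked ones are illustrative choices, not documented behavior.

```python
from typing import Any, Dict

CACHE_PREFIX = "cached:"


def resolve(value: Any, cache: Dict[str, Any]) -> Any:
    """Recursively replace 'cached:<key>' strings with the live cached object."""
    if isinstance(value, str) and value.startswith(CACHE_PREFIX):
        key = value[len(CACHE_PREFIX):]
        if key not in cache:
            raise KeyError(f"no cached object under {key!r}")
        return cache[key]
    if isinstance(value, dict):
        return resolve_params(value, cache)
    if isinstance(value, list):
        return [resolve(v, cache) for v in value]
    return value


def resolve_params(params: Dict[str, Any], cache: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve references; a '**' entry is unpacked into the surrounding kwargs."""
    out: Dict[str, Any] = {}
    spread = params.get("**")
    if spread is not None:
        mapping = resolve(spread, cache)
        if not isinstance(mapping, dict):
            raise TypeError("'**' must reference a mapping")
        out.update(mapping)  # copies items; the cached mapping is not mutated
    for key, value in params.items():
        if key != "**":
            out[key] = resolve(value, cache)  # explicit keys override unpacked ones
    return out
```

In the example above, "action": "cached:cond" would resolve to the CosmosActionCondition result stored under cache_key="cond".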

Discovery

action                           returns
pipelines / models / schedulers  classes + derived modality
tasks / modalities / wfm         task maps / modality groups / world-foundation models
pipeline_info / inspect          signature + docs
visualize                        action chunk to plots + animation
cache / clear_cache              manage loaded pipelines
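pipeline_info and inspect return a signature plus docs. A comparable summary can be built from Python's standard inspect module; the describe helper below is hypothetical and only shows the kind of information involved. For a diffusers pipeline the generation parameters live on __call__, so the sketch lets you name the method to inspect.

```python
import inspect
from typing import Any, Dict, Optional


def describe(obj: Any, method: Optional[str] = None) -> Dict[str, Any]:
    """Summarize a callable: qualified name, signature, parameter names, first doc line."""
    target = getattr(obj, method) if method else obj
    sig = inspect.signature(target)
    doc = inspect.getdoc(target) or ""
    return {
        "name": getattr(target, "__qualname__", repr(target)),
        "signature": str(sig),
        "parameters": list(sig.parameters),
        "summary": doc.splitlines()[0] if doc else "",
    }
```

Called as describe(SomePipeline, "__call__"), it reports the prompt and sampling arguments rather than the constructor's component arguments.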

Architecture

core/registry.py  zero-hardcode taxonomy from diffusers._import_structure
core/engine.py    load/cache pipelines, auto device+dtype
core/io.py        coerce inputs; serialize video/image/audio/action/mesh
core/viz.py       render robot action chunks to plots + animation
tools/use_diffusers.py  the single @tool: run + call + discovery
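core/engine.py is described as loading and caching pipelines with automatic device and dtype selection. The sketch below shows one plausible shape for that logic under stated assumptions: the preference order (CUDA with bf16, then CUDA with fp16, then Apple MPS with fp16, then CPU with fp32) is a hypothetical policy, and the loader callback stands in for a real from_pretrained call so the example runs without torch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[str, str, str, str]  # (pipeline, model, device, dtype)


def pick_device_dtype(cuda: bool, bf16: bool, mps: bool) -> Tuple[str, str]:
    """Hypothetical policy: CUDA+bf16 > CUDA+fp16 > MPS+fp16 > CPU+fp32."""
    if cuda:
        return ("cuda", "bfloat16" if bf16 else "float16")
    if mps:
        return ("mps", "float16")
    return ("cpu", "float32")


@dataclass
class PipelineCache:
    """Load each (pipeline, model, device, dtype) combination at most once."""

    loader: Callable[[str, str, str, str], Any]  # stand-in for from_pretrained
    _store: Dict[Key, Any] = field(default_factory=dict)

    def get(self, pipeline: str, model: str, device: str, dtype: str) -> Any:
        key = (pipeline, model, device, dtype)
        if key not in self._store:
            self._store[key] = self.loader(*key)
        return self._store[key]

    def keys(self) -> List[Key]:
        return list(self._store)

    def clear(self) -> None:
        self._store.clear()
```

With torch installed, the flags could come from torch.cuda.is_available(), torch.cuda.is_bf16_supported(), and torch.backends.mps.is_available(). Keying the cache on all four values means a repeat run with the same settings reuses the loaded weights, and clear() plays the role of the clear_cache action.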

Testing

pip install -e ".[video,audio,dev]"
pytest tests/ -q          # unit tests, no GPU, no downloads
python examples/smoke.py  # E2E gate on tiny fixtures

Every visual in this README and the docs is produced by real use_diffusers calls — regenerate them with:

python examples/generate_docs_assets.py

Docs

📖 cagataycali.github.io/strands-diffusers — quickstart, full gallery (images / video / audio / actions / 3D), the world-foundation-model story, discovery, and the two-layer design.

License: MIT

Download files

Source Distribution

strands_diffusers-0.3.0.tar.gz (13.4 MB)

Built Distribution

strands_diffusers-0.3.0-py3-none-any.whl (31.6 kB)

File details

Details for the file strands_diffusers-0.3.0.tar.gz.

File metadata

  • Download URL: strands_diffusers-0.3.0.tar.gz
  • Size: 13.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for strands_diffusers-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       09eb24b106c57f0295e82ffa5792423edc407ae13ae90fdc27e9f44a5766dc95
MD5          d5a3911d8e89bc4a4b98597de9dae68c
BLAKE2b-256  07e6c96281b1b2dec1b60b7377fd4cfb4db79b7243f73679b876d1a5aa6635b4
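To check a downloaded file against the digests listed here, hash it locally and compare. A minimal sketch using only hashlib; the EXPECTED_SHA256 table copies the SHA256 values published for the 0.3.0 files, and the helper names are illustrative.

```python
import hashlib
import os
from typing import Iterable

EXPECTED_SHA256 = {
    "strands_diffusers-0.3.0.tar.gz":
        "09eb24b106c57f0295e82ffa5792423edc407ae13ae90fdc27e9f44a5766dc95",
    "strands_diffusers-0.3.0-py3-none-any.whl":
        "019c13d251817aa3d1ced9f975169478e6e21e6e9fc4e46391da9a42112a8a41",
}


def sha256_hexdigest(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks without holding the whole file in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA256 of a file, read in 1 MiB blocks by default."""
    with open(path, "rb") as f:
        return sha256_hexdigest(iter(lambda: f.read(chunk_size), b""))


def verify_download(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256[os.path.basename(path)]
    return file_sha256(path) == expected.lower()
```

pip can enforce the same check at install time through hash-checking mode: list the package with --hash=sha256:... in a requirements file and install with --require-hashes.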


File details

Details for the file strands_diffusers-0.3.0-py3-none-any.whl.


File hashes

Hashes for strands_diffusers-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       019c13d251817aa3d1ced9f975169478e6e21e6e9fc4e46391da9a42112a8a41
MD5          3a50d74f5d443ac2a0685ec8bef7c3ae
BLAKE2b-256  7a946eb7862b0edac3ee8d8f06502b009fff96bd0d91f65979f48a90e5a9ce06

