The universal entrypoint to HuggingFace diffusers for Strands agents — 100% pipeline & modality coverage, zero hardcoding. Special focus on Physical-AI world-foundation models (Cosmos) with robot action outputs.
🎨 strands-diffusers
The universal entrypoint to HuggingFace diffusers for Strands agents — 100%
pipeline & modality coverage, zero hardcoding.
Just like use_aws wraps boto3,
use_lerobot wraps lerobot, and
use_transformers wraps the
transformers task taxonomy, use_diffusers wraps the entire diffusers
library behind a single tool. Discover, don't hardcode: the registry is built at
runtime from diffusers._import_structure, so when diffusers ships a new pipeline
(say, a fresh Cosmos world-foundation model), strands-diffusers supports it
automatically — no code change required.
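As a rough illustration of the discover-don't-hardcode idea, the sketch below groups exported names from an import-structure mapping by class-name suffix. It assumes the mapping has the shape {submodule: [exported names]}. The suffix rule, the build_registry helper, and the SAMPLE_STRUCTURE stand-in are hypothetical and are not taken from the package's registry.py.

```python
from typing import Dict, Iterable, List


def build_registry(import_structure: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Group exported names by kind using only their class-name suffix.

    Hypothetical helper: the real registry may classify names differently.
    """
    registry: Dict[str, List[str]] = {"pipelines": [], "schedulers": [], "other": []}
    for names in import_structure.values():
        for name in names:
            if name.endswith("Pipeline"):
                registry["pipelines"].append(name)
            elif name.endswith("Scheduler"):
                registry["schedulers"].append(name)
            else:
                registry["other"].append(name)
    # Deduplicate and sort so the result is stable across runs.
    return {kind: sorted(set(found)) for kind, found in registry.items()}


# Stand-in mapping (assumption) so the sketch runs without diffusers installed.
SAMPLE_STRUCTURE = {
    "pipelines": ["StableDiffusionPipeline", "WanPipeline"],
    "schedulers": ["DDIMScheduler"],
    "models": ["AutoencoderKL"],
}

registry = build_registry(SAMPLE_STRUCTURE)
print(registry["pipelines"])
```

Because nothing in the helper names a specific pipeline, a new name appearing in the mapping is picked up without a code change, which is the property the paragraph above describes.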
text / image / video / robot-state IN
image / video / audio / ACTIONS OUT — natively.
🌍 Physical-AI focus: world-foundation models with action outputs
The headline use-case is NVIDIA Cosmos and other world-foundation models
(WFMs). A Cosmos 3 action-policy rollout doesn't just generate a plausible
future video — it predicts the robot action chunk that produces it. A single
use_diffusers(action="run", ...) call returns BOTH:
- a playable world video (.mp4)
- the predicted action chunk in model-normalized action space (.json, shape [num_chunks, T, action_dim])
- (optionally) synchronized sound (.wav)
— all surfaced as artifact paths, ready to hand to a robot controller or the user.
Verified end-to-end on NVIDIA Thor (diffusers 0.39.0.dev0, nvidia/Cosmos3-Nano, bf16/cuda): one use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", ...) call produced a world video (17, 480, 640, 3) and a robot action chunk (1, 16, 10) = (num_chunks, T, action_dim), normalized to [-1, 1]. See examples/cosmos_action_policy.py and examples/SETUP_COSMOS.md.
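Before handing an action chunk to a controller, it may be worth checking its shape and range. The sketch below assumes the .json artifact is a bare nested list of floats with layout [num_chunks, T, action_dim]; that layout, and the nested_shape and check_action_chunk helpers, are assumptions for illustration only.

```python
import json
from typing import Tuple


def nested_shape(data) -> Tuple[int, ...]:
    """Return the shape of a regular nested list by following first elements."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def check_action_chunk(chunk, low: float = -1.0, high: float = 1.0) -> Tuple[int, ...]:
    """Validate a [num_chunks, T, action_dim] chunk and return its shape."""
    shape = nested_shape(chunk)
    if len(shape) != 3:
        raise ValueError(f"expected 3 dimensions, got shape {shape}")
    values = [v for sub_chunk in chunk for step in sub_chunk for v in step]
    if any(v < low or v > high for v in values):
        raise ValueError(f"action values fall outside [{low}, {high}]")
    return shape


# Stand-in payload (assumption): 1 chunk, 16 timesteps, 10 action dimensions.
payload = json.dumps([[[0.0] * 10 for _ in range(16)]])
shape = check_action_chunk(json.loads(payload))
print(shape)
```

Values in [-1, 1] are in model-normalized action space, so a real controller would still need the dataset's own un-normalization step, which is not shown here.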
Install
pip install -e .
# optional extras:
pip install -e ".[video,audio]" # mp4 export, wav I/O
Quick start
from strands import Agent
from strands_diffusers import use_diffusers
agent = Agent(tools=[use_diffusers])
agent("Generate an image of a robot arm in a kitchen")
agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
Or drive it directly:
from strands_diffusers import use_diffusers
# text → image
use_diffusers(
    action="run",
    pipeline="StableDiffusionPipeline",
    model="stabilityai/stable-diffusion-2-1",
    parameters={"prompt": "a robot arm in a kitchen", "num_inference_steps": 25},
)
Two layers
1. run — high-level pipeline runner
Loads a pipeline class via from_pretrained and calls it. Inputs are coerced
(paths / URLs / base64 → PIL / video); outputs (image / video / audio / action)
are auto-saved and returned by path.
use_diffusers(action="run", pipeline="WanPipeline", model="...",
              parameters={"prompt": "...", "num_frames": 81}, fps=16)
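The input coercion described for the run layer can be pictured as a classification step followed by a loader per kind. The sketch below covers only the classification, under stated assumptions: base64 inputs arrive as data: URIs, and local media is recognized by file suffix. The classify_input helper and MEDIA_SUFFIXES set are hypothetical and not the package's io.py.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of media suffixes; the real coercion layer may accept more.
MEDIA_SUFFIXES = {".png", ".jpg", ".jpeg", ".mp4", ".wav"}


def classify_input(value: str) -> str:
    """Guess whether a string parameter is base64 data, a URL, a path, or text."""
    if value.startswith("data:") and ";base64," in value:
        return "base64"
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if Path(value).suffix.lower() in MEDIA_SUFFIXES:
        return "path"
    # Anything else (for example a prompt) is passed through unchanged.
    return "text"


for sample in ("robot.mp4", "https://example.com/frame.png", "a robot arm in a kitchen"):
    print(sample, "->", classify_input(sample))
```

Checking the data: prefix first keeps a base64 payload from being misread as a URL or path.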
2. call — low-level dynamic dispatch
Resolve & call any diffusers class / function / method — schedulers, VAEs,
CosmosActionCondition, utils.export_to_video, or a cached pipeline's method.
cached:key references resolve to live objects; the "**" key unpacks a cached
mapping into kwargs (the pipe(**inputs) pattern).
# Build a Cosmos action condition, cache it, then run an action-policy rollout.
use_diffusers(action="call", target="CosmosActionCondition",
              parameters={"mode": "policy", "chunk_size": 16,
                          "domain_name": "bridge_orig_lerobot",
                          "resolution_tier": 480, "video": "robot.mp4",
                          "view_point": "ego_view"},
              cache_key="act_cond")
use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
              parameters={"prompt": "Put the pot to the left of the purple item.",
                          "action": "cached:act_cond", "fps": 5,
                          "num_inference_steps": 30, "guidance_scale": 1.0,
                          "use_system_prompt": False},
              dtype="bfloat16", device="cuda")
# → artifacts: cosmos_world.mp4 + action chunk .json ([1, 16, action_dim])
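The cached:key and "**" conventions used by the call layer can be sketched as a small resolver that turns a parameters dict into keyword arguments. This is a minimal sketch under the semantics described above; the resolve_parameters helper and the plain-dict cache are hypothetical, and the real tool may handle nesting or errors differently.

```python
from collections.abc import Mapping
from typing import Any, Dict

CACHE_PREFIX = "cached:"


def resolve_parameters(parameters: Dict[str, Any], cache: Dict[str, Any]) -> Dict[str, Any]:
    """Replace "cached:<key>" strings with live objects and unpack the "**" entry."""
    kwargs: Dict[str, Any] = {}
    for name, value in parameters.items():
        if isinstance(value, str) and value.startswith(CACHE_PREFIX):
            value = cache[value[len(CACHE_PREFIX):]]  # KeyError if nothing is cached
        if name == "**":
            if not isinstance(value, Mapping):
                raise TypeError('the "**" entry must resolve to a mapping')
            kwargs.update(value)  # mirrors the pipe(**inputs) pattern
        else:
            kwargs[name] = value
    return kwargs


# Stand-in cache contents (assumption): any Python objects can be stored.
condition = object()
cache = {"act_cond": condition, "inputs": {"prompt": "hi", "num_frames": 8}}
kwargs = resolve_parameters(
    {"action": "cached:act_cond", "**": "cached:inputs", "fps": 5}, cache
)
print(sorted(kwargs))
```

Resolving references before unpacking lets one cached mapping supply several keyword arguments at once.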
Discovery (the agent never guesses)
| action | what it returns |
|---|---|
| pipelines | all 300+ pipeline classes + derived modality |
| models | every model class (VAEs, transformers, controlnets) |
| schedulers | every scheduler class |
| tasks | diffusers' AutoPipeline task → {family: class} maps |
| modalities | pipelines grouped by modality (image / video / world / audio / 3d mesh) |
| wfm | world-foundation / action-capable pipelines (Cosmos, Wan, Hunyuan) |
| pipeline_info | modality + __call__ signature for one pipeline class |
| inspect | signature + docstring of any target |
| visualize | render a robot ACTION chunk → time-series + 3D trajectory + animation (mp4/gif) |
| cache / clear_cache | manage loaded pipelines (free GPU memory) |
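A pipeline_info-style lookup can be built on the standard library's inspect module. The sketch below reads the __call__ signature of a class and reports each parameter's default. ToyPipeline is a stand-in, not a real diffusers class, and describe_call is a hypothetical helper, so the actual output format of the tool may differ.

```python
import inspect
from typing import Any, Dict


class ToyPipeline:
    """Stand-in for a pipeline class (assumption); it does no real work."""

    def __call__(self, prompt: str, num_inference_steps: int = 50,
                 guidance_scale: float = 7.5):
        return prompt


def describe_call(cls) -> Dict[str, Any]:
    """Map each __call__ parameter to its default, or "<required>" if it has none."""
    params = inspect.signature(cls.__call__).parameters
    return {
        name: ("<required>" if p.default is inspect.Parameter.empty else p.default)
        for name, p in params.items()
        if name != "self"
    }


info = describe_call(ToyPipeline)
print(info)
```

Reading the signature at runtime is what lets an agent see accepted parameters without a hand-maintained list.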
Architecture
strands_diffusers/
├── core/
│   ├── registry.py        # zero-hardcode taxonomy from diffusers._import_structure
│   ├── engine.py          # load/cache pipelines, auto device+dtype
│   └── io.py              # coerce inputs; serialize video/image/audio/ACTION outputs
└── tools/
    └── use_diffusers.py   # the single @tool: run + call + discovery
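The load/cache role noted for engine.py can be illustrated with a small keyed cache that loads once and reuses the result. This is a sketch under assumptions: the PipelineCache class, its key layout, and the loader callable are hypothetical, and the real engine presumably passes a loader that calls from_pretrained.

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # assumed key layout: (pipeline, model, dtype)


class PipelineCache:
    """Keep loaded objects by key so repeated runs skip the expensive load."""

    def __init__(self) -> None:
        self._items: Dict[CacheKey, Any] = {}

    def get_or_load(self, key: CacheKey, loader: Callable[[], Any]) -> Any:
        if key not in self._items:
            self._items[key] = loader()  # only called on a cache miss
        return self._items[key]

    def clear(self) -> int:
        """Drop every cached object and return how many were removed."""
        count = len(self._items)
        self._items.clear()
        return count


load_calls = []


def fake_loader() -> str:
    # Stand-in for an expensive model load (assumption).
    load_calls.append(1)
    return "loaded-pipeline"


cache = PipelineCache()
key = ("WanPipeline", "some/model", "bfloat16")
first = cache.get_or_load(key, fake_loader)
second = cache.get_or_load(key, fake_loader)
removed = cache.clear()
print(len(load_calls), removed)
```

Including the dtype in the key keeps a bf16 and an fp32 load of the same model from colliding.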
Testing
pip install -e ".[video,audio,dev]"
pytest tests/ -q # 26 unit tests — no GPU, no model downloads
python examples/smoke.py # E2E gate on tiny HF fixtures
tests/ covers the registry classifier (golden modalities + a guard that no
video/WFM pipeline is ever mislabeled as a still image), and the multimodal I/O
serializers — image, video (incl. list[ndarray]), stereo audio (channels-
first and channels-last), the robot action chunk, and 3D mesh output
(ShapE → .ply/.obj/.npz). CI runs both on py3.10 + py3.12.
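The golden-modality check and the still-image guard might look roughly like the pytest-style functions below. The keyword rules in classify_modality are an assumption made for illustration and are not the package's classifier. The pipeline names and their expected modalities follow the examples earlier in this README.

```python
# Assumed keyword hints; a real classifier would likely use richer signals.
WORLD_HINTS = ("Cosmos",)
VIDEO_HINTS = ("Video", "Wan")


def classify_modality(pipeline_name: str) -> str:
    """Derive a coarse modality from a pipeline class name (hypothetical rule)."""
    if any(hint in pipeline_name for hint in WORLD_HINTS):
        return "world"
    if any(hint in pipeline_name for hint in VIDEO_HINTS):
        return "video"
    if "Audio" in pipeline_name:
        return "audio"
    return "image"


GOLDEN = {
    "StableDiffusionPipeline": "image",
    "WanPipeline": "video",
    "Cosmos3OmniPipeline": "world",
}


def test_golden_modalities() -> None:
    for name, expected in GOLDEN.items():
        assert classify_modality(name) == expected, name


def test_no_video_or_wfm_labeled_image() -> None:
    moving = [name for name, kind in GOLDEN.items() if kind in ("video", "world")]
    for name in moving:
        assert classify_modality(name) != "image", name
```

The second test is the guard: it fails whenever a video or world-model pipeline falls through to the image default.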
License
MIT