The universal entrypoint to HuggingFace diffusers for Strands agents — 100% pipeline & modality coverage, zero hardcoding. Special focus on Physical-AI world-foundation models (Cosmos) with robot action outputs.
strands-diffusers
The universal entrypoint to HuggingFace diffusers for Strands agents.
One tool — use_diffusers — wraps the whole library with zero hardcoding:
discover and run any of its 300+ pipelines across every modality. It's a visual
library, so here's what it actually produces — every asset below is real
model output, not a placeholder:
| modality | examples |
|---|---|
| text → image | any of 108 image pipelines |
| text → video | LTX · Wan · CogVideoX · Hunyuan |
| robot actions 🤖 | Cosmos WFM: world video + actions |
| text → audio | StableAudio · AudioLDM2 |

IN: text / image / video / robot-state
OUT: image / video / audio / actions / 3D
The registry is built at runtime from diffusers._import_structure, so new
pipelines are supported automatically with no code change. Same philosophy as
use_aws, use_lerobot, and use_transformers: discover, don't hardcode.
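As a rough illustration of the discover-don't-hardcode idea, the sketch below groups exported names by their class-name suffix. It runs against a small hand-written stand-in dict rather than the real diffusers._import_structure, and build_registry and its suffix rules are assumptions made for this example, not the code in core/registry.py.

```python
from collections import defaultdict

# Stand-in for diffusers._import_structure: submodule name -> exported names.
# These entries are illustrative; the real mapping is far larger.
SAMPLE_IMPORT_STRUCTURE = {
    "pipelines": [
        "StableDiffusionPipeline",
        "CogVideoXPipeline",
        "StableAudioPipeline",
        "ShapEPipeline",
    ],
    "schedulers": ["DDIMScheduler", "EulerDiscreteScheduler"],
    "models": ["AutoencoderKL", "UNet2DConditionModel"],
    "utils": ["logging"],
}


def build_registry(import_structure):
    """Group exported names by kind using naming conventions only (assumed rule)."""
    registry = defaultdict(list)
    for names in import_structure.values():
        for name in names:
            if name.endswith("Pipeline"):
                registry["pipelines"].append(name)
            elif name.endswith("Scheduler"):
                registry["schedulers"].append(name)
    # Sort so the listing is stable regardless of declaration order.
    return {kind: sorted(found) for kind, found in registry.items()}


registry = build_registry(SAMPLE_IMPORT_STRUCTURE)
print(registry["pipelines"])
```

Because nothing in the function names a specific pipeline, adding a new entry to the mapping is enough for it to appear in the listing.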
| output | pipeline and export |
|---|---|
| 3D mesh | ShapE: verts/faces to .ply |
| audio (hear the .wav) | StableAudio: waveform to .wav |
100% coverage, zero hardcoding
Every pipeline, model, and scheduler diffusers ships is reachable through one
tool. When diffusers adds a new pipeline, use_diffusers exposes it immediately.
Physical-AI: world-foundation models with action outputs
- "Put the pot to the left of the purple item."
- "Pick up the cloth and place it in the bowl."
- "Open the drawer and place the spoon inside."
Same robot, same first observation — different task prompt → different imagined world and different predicted actions. Five real rollouts + all three Cosmos action modes in the WFM gallery.
This is the headline. A Cosmos action-policy rollout predicts both a future world
video and the robot action chunk that produces it. One
use_diffusers(action="run", ...) returns a .mp4 world video, a .json action
chunk (normalized [-1, 1], shape [num_chunks, T, action_dim]), and optional
.wav sound — and you can see the motion:
| time-series (every dim, gripper highlighted) | end-effector path (dims 0–2) |
Verified end-to-end on NVIDIA Thor (nvidia/Cosmos3-Nano, bf16/cuda): one call
produced a world video (17, 480, 640, 3) and an action chunk (1, 16, 10). See
examples/cosmos_action_policy.py.
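A quick sanity check on an action chunk might look like the sketch below. It assumes the .json artifact holds a bare nested list of floats; the actual file layout is not documented here and may carry extra metadata. chunk_shape and in_unit_range are hypothetical helpers, not part of strands-diffusers, and the chunk is synthetic.

```python
import json


def chunk_shape(chunk):
    """Return (num_chunks, T, action_dim) for a nested list; raise if it is ragged."""
    steps = {len(c) for c in chunk}
    dims = {len(step) for c in chunk for step in c}
    if len(steps) != 1 or len(dims) != 1:
        raise ValueError("action chunk is empty or ragged")
    return (len(chunk), steps.pop(), dims.pop())


def in_unit_range(chunk, tol=1e-6):
    """Check that every value lies in the normalized range [-1, 1]."""
    return all(-1 - tol <= v <= 1 + tol for c in chunk for step in c for v in step)


# Synthetic chunk with the shape reported above: 1 chunk, 16 steps, 10 dims.
synthetic = [[[0.0] * 10 for _ in range(16)]]

# Round-trip through JSON to mimic reading the saved artifact (assumed layout).
loaded = json.loads(json.dumps(synthetic))

shape = chunk_shape(loaded)
print(shape, in_unit_range(loaded))
```

Checking shape and range before sending actions to a controller catches a mismatched action_dim or an un-normalized chunk early.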
Install
pip install -e .
pip install -e ".[video,audio]" # mp4 export, wav I/O
Quick start
from strands import Agent
from strands_diffusers import use_diffusers
agent = Agent(tools=[use_diffusers])
agent("Generate an image of a robot arm in a kitchen")
agent("Run a Cosmos action-policy rollout on robot.mp4 and give me the actions")
Direct:
use_diffusers(action="run", pipeline="StableDiffusionPipeline",
              model="stabilityai/stable-diffusion-2-1",
              parameters={"prompt": "a robot arm in a kitchen"})
# -> {"artifacts": ["/tmp/strands_diffusers/image_*.png"]}
Two layers
- run loads a pipeline via from_pretrained and calls it. Inputs are coerced (path / URL / base64 to PIL / video); outputs are auto-saved and returned by path.
- call resolves and calls any diffusers class, function, or method (schedulers, VAEs, CosmosActionCondition, utils). cached:key references resolve to live objects; "**" unpacks a cached mapping into kwargs.
use_diffusers(action="call", target="CosmosActionCondition",
              parameters={"mode": "policy", "video": "robot.mp4"}, cache_key="cond")
use_diffusers(action="run", pipeline="Cosmos3OmniPipeline", model="nvidia/Cosmos3-Nano",
              parameters={"prompt": "...", "action": "cached:cond"},
              dtype="bfloat16", device="cuda")
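One way the cached:key and "**" conventions could be resolved is sketched below. This is a guess at the mechanism, not the tool's actual code: resolve_parameters and lookup are hypothetical names, the cache is a plain dict, num_frames is an illustrative parameter name, and the "**" key is assumed to take a cached:key string that points at a cached mapping.

```python
CACHE_PREFIX = "cached:"


def lookup(value, cache):
    """Swap a "cached:key" string for the cached object; pass anything else through."""
    if isinstance(value, str) and value.startswith(CACHE_PREFIX):
        return cache[value[len(CACHE_PREFIX):]]
    return value


def resolve_parameters(parameters, cache):
    """Build the kwargs dict, resolving references and unpacking "**" (assumed form)."""
    resolved = {}
    for name, value in parameters.items():
        if name == "**":
            # Merge the cached mapping's items into the kwargs.
            resolved.update(lookup(value, cache))
        else:
            resolved[name] = lookup(value, cache)
    return resolved


# A placeholder object stands in for a cached CosmosActionCondition.
condition = object()
cache = {"cond": condition, "defaults": {"num_frames": 17}}

resolved = resolve_parameters(
    {"prompt": "open the drawer", "action": "cached:cond", "**": "cached:defaults"},
    cache,
)
print(sorted(resolved))
```

Passing references by key keeps live objects out of the tool-call payload, so an agent can build a condition in one call and reuse it in the next.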
Discovery
| action | returns |
|---|---|
| pipelines / models / schedulers | classes + derived modality |
| tasks / modalities / wfm | task maps / modality groups / world-foundation models |
| pipeline_info / inspect | signature + docs |
| visualize | action chunk to plots + animation |
| cache / clear_cache | manage loaded pipelines |
Architecture
core/registry.py          zero-hardcode taxonomy from diffusers._import_structure
core/engine.py            load/cache pipelines, auto device+dtype
core/io.py                coerce inputs; serialize video/image/audio/action/mesh
core/viz.py               render robot action chunks to plots + animation
tools/use_diffusers.py    the single @tool: run + call + discovery
Testing
pip install -e ".[video,audio,dev]"
pytest tests/ -q # unit tests, no GPU, no downloads
python examples/smoke.py # E2E gate on tiny fixtures
Every visual in this README and the docs
is produced by real use_diffusers calls — regenerate them with:
python examples/generate_docs_assets.py
Docs
📖 cagataycali.github.io/strands-diffusers — quickstart, full gallery (images / video / audio / actions / 3D), the world-foundation-model story, discovery, and the two-layer design.
MIT