strands-cosmos
NVIDIA Cosmos Reason VLM provider for Strands Agents — physical AI reasoning, video understanding, and embodied intelligence.
Enables Strands agents to use Cosmos-Reason2 models for video captioning, driving analysis, robot planning, temporal reasoning, and physics understanding.
Demo
Dashcam safety analysis with Chain-of-Thought reasoning on Jetson AGX Thor
The .cast file is at demo/strands-cosmos-demo.cast — play it with asciinema play demo/strands-cosmos-demo.cast
Install
Requirements: Python ≥3.10, NVIDIA GPU (24GB+ for 2B, 32GB+ for 8B)
pip install strands-cosmos strands-agents
NVIDIA Jetson (Thor, Orin, AGX)
On Jetson devices, PyTorch's pip-bundled CUBLAS may be incompatible with the GPU architecture. After installing, run the included fix:
pip install strands-cosmos strands-agents
# Fix CUBLAS (auto-detects if needed, safe to run on any platform)
strands-cosmos-fix-cublas
# Or check without fixing:
strands-cosmos-fix-cublas --check
What does the CUBLAS fix do?
PyTorch wheels ship their own libcublas.so which may not support Jetson GPU architectures (e.g., SM 11.0 on Thor, SM 8.7 on Orin). This causes CUBLAS_STATUS_INVALID_VALUE on any matrix multiplication (torch.mm, attention layers, linear layers, etc.).
The fix:
- Backs up torch's bundled libcublas.so and libcublasLt.so
- Copies the system CUBLAS from JetPack (/usr/local/cuda/targets/*/lib/)
- Verifies the fix with a quick torch.mm test
To revert: strands-cosmos-fix-cublas --revert
Affected: Jetson AGX Thor (SM 11.0). Other Jetson devices with pre-release BSPs may also be affected.
Not affected: desktop GPUs (A100, H100, RTX 4090, etc.) and x86_64 systems.
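The backup, copy, and revert steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in fix_cublas.py: the function names, the `.bak` suffix, and the exact library file names are assumptions, and the real tool also locates the torch and JetPack directories and runs a torch.mm check.

```python
import shutil
from pathlib import Path

# Library names taken from the description above; real installs may carry
# versioned names, which this sketch does not handle.
CUBLAS_LIBS = ("libcublas.so", "libcublasLt.so")
BACKUP_SUFFIX = ".bak"  # hypothetical; the real tool may name backups differently


def apply_fix(torch_lib_dir: Path, system_lib_dir: Path) -> list[str]:
    """Back up each bundled library once, then overwrite it with the system copy."""
    replaced = []
    for name in CUBLAS_LIBS:
        bundled = torch_lib_dir / name
        system = system_lib_dir / name
        if not (bundled.exists() and system.exists()):
            continue  # nothing to swap for this library
        backup = bundled.with_name(bundled.name + BACKUP_SUFFIX)
        if not backup.exists():  # never overwrite an existing backup
            shutil.copy2(bundled, backup)
        shutil.copy2(system, bundled)
        replaced.append(name)
    return replaced


def revert_fix(torch_lib_dir: Path) -> list[str]:
    """Restore every library that has a backup sitting next to it."""
    restored = []
    for name in CUBLAS_LIBS:
        bundled = torch_lib_dir / name
        backup = bundled.with_name(bundled.name + BACKUP_SUFFIX)
        if backup.exists():
            backup.replace(bundled)  # atomic rename over the swapped-in copy
            restored.append(name)
    return restored
```

Keeping the backup next to the original and refusing to overwrite it means that running the fix twice still leaves the first, pristine copy available for a revert.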
Quick Start
from strands import Agent
from strands_cosmos import CosmosVisionModel
model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)
# Video understanding
agent("Caption in detail: <video>dashcam.mp4</video>")
# Image reasoning
agent("<image>robot_view.jpg</image> What can be the next immediate action?")
# Text-only physics reasoning
agent("What happens when a ball rolls off a table?")
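The prompts above embed media as `<video>path</video>` and `<image>path</image>` tags inside ordinary text. As a rough illustration of how such a prompt can be separated into text and media references, here is a small standalone sketch. The `split_prompt` helper is hypothetical and is not part of strands-cosmos, whose own parsing may differ.

```python
import re

# Matches <video>path</video> and <image>path</image>, as used in the prompts above.
MEDIA_TAG = re.compile(r"<(video|image)>(.*?)</\1>", re.DOTALL)


def split_prompt(prompt: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (text with media tags removed, [(kind, path), ...])."""
    media = [(m.group(1), m.group(2).strip()) for m in MEDIA_TAG.finditer(prompt)]
    # Replace each tag with a space, then collapse runs of whitespace.
    text = " ".join(MEDIA_TAG.sub(" ", prompt).split())
    return text, media


text, media = split_prompt("Caption in detail: <video>dashcam.mp4</video>")
print(text)   # Caption in detail:
print(media)  # [('video', 'dashcam.mp4')]
```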
Models
| Model | GPU Memory | Architecture |
|---|---|---|
| Cosmos-Reason2-2B | 24GB | Qwen3-VL |
| Cosmos-Reason2-8B | 32GB | Qwen3-VL |
Verified Platforms
| Platform | GPU | Status |
|---|---|---|
| Jetson AGX Thor | NVIDIA Thor 132GB | ✅ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | ✅ |
| Jetson Orin | Orin 32/64GB | ✅ (may need CUBLAS fix) |
Features
Video Understanding
from strands_cosmos import CosmosVisionModel
model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-2B",
    fps=4,           # Video frame rate
    reasoning=True,  # Enable chain-of-thought
    params={"max_tokens": 4096, "temperature": 0.6},
)
Chain-of-Thought Reasoning
from strands import Agent
from strands_cosmos import CosmosVisionModel
model = CosmosVisionModel(reasoning=True)
agent = Agent(model=model)
# Generates <think>reasoning</think> then answer
agent("<video>scene.mp4</video> Is this video physically plausible?")
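Since the reasoning arrives wrapped in `<think>` tags ahead of the answer, downstream code often wants the two parts separately. A minimal sketch, assuming the response text is already available as a plain string; `split_reasoning` is a hypothetical helper, not something strands-cosmos is known to export.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is present."""
    match = THINK_BLOCK.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>The ball keeps its horizontal speed and falls.</think>\nYes, it is plausible."
)
print(answer)  # Yes, it is plausible.
```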
Built-in Task Prompts
from strands_cosmos.cosmos_vision_model import TASK_PROMPTS
# Available tasks:
# caption, embodied_reasoning, driving, causal,
# temporal_localization, 2d_grounding, robot_cot,
# describe_anything, mvp_bench
As a Tool (in any agent)
from strands import Agent
from strands_cosmos import cosmos_vision_invoke
# Use Cosmos as a tool inside a Bedrock/OpenAI agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")
Architecture
strands_cosmos/
├── cosmos_model.py            # Text-only CosmosModel (Strands Model interface)
├── cosmos_vision_model.py     # Vision CosmosVisionModel (video + image + text)
├── fix_cublas.py              # Jetson CUBLAS compatibility fix
└── tools/
    ├── cosmos_invoke.py           # Text inference tool
    └── cosmos_vision_invoke.py    # Vision inference tool
graph LR
    A[Strands Agent] -->|CosmosVisionModel| B[Cosmos-Reason2]
    B -->|Video| C[Driving Analysis]
    B -->|Image| D[Robot Planning]
    B -->|Text| E[Physics Reasoning]
    B -->|Reasoning| F[Chain-of-Thought]
    style A fill:#e1f5ff
    style B fill:#76b900,color:#fff
    style F fill:#fff3cd
Configuration
model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-8B",  # or 2B
    device_map="auto",                    # GPU placement
    torch_dtype="auto",                   # float16/bfloat16
    reasoning=True,                       # CoT reasoning
    fps=4,                                # Video FPS
    min_vision_tokens=256,                # Min visual tokens
    max_vision_tokens=8192,               # Max visual tokens
    params={
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95,
    },
)
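When these options come from a config file or the command line, it can help to check them before the model is loaded. The sketch below is one possible approach: `CosmosSettings` is a hypothetical wrapper, and the validation rules (positive fps, min not exceeding max, top_p in (0, 1]) are assumptions of this sketch rather than documented constraints of CosmosVisionModel.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class CosmosSettings:
    """Hypothetical settings holder mirroring the keyword arguments shown above."""
    model_id: str = "nvidia/Cosmos-Reason2-2B"
    reasoning: bool = True
    fps: int = 4
    min_vision_tokens: int = 256
    max_vision_tokens: int = 8192
    params: dict = field(default_factory=lambda: {
        "max_tokens": 4096, "temperature": 0.6, "top_p": 0.95,
    })

    def validate(self) -> None:
        # These checks are assumptions of this sketch, not library rules.
        if self.fps <= 0:
            raise ValueError("fps must be positive")
        if not 0 < self.min_vision_tokens <= self.max_vision_tokens:
            raise ValueError("need 0 < min_vision_tokens <= max_vision_tokens")
        if not 0.0 < self.params.get("top_p", 1.0) <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.params.get("temperature", 0.0) < 0:
            raise ValueError("temperature must be non-negative")

    def to_kwargs(self) -> dict:
        """Validate, then return a plain dict suitable for CosmosVisionModel(**kwargs)."""
        self.validate()
        return asdict(self)
```

The resulting dict could then be passed as `CosmosVisionModel(**CosmosSettings(fps=2).to_kwargs())`, so a bad value fails fast instead of after the weights have loaded.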
Examples
| Example | Description |
|---|---|
| 01_basic_text.py | Text-only physics reasoning |
| 02_video_caption.py | Video captioning |
| 03_driving_analysis.py | Dashcam safety analysis with CoT |
| 04_embodied_reasoning.py | Robot next-action prediction |
| 05_tool_usage.py | Cosmos as a tool in another agent |
Capabilities
Cosmos-Reason2 excels at physical world understanding:
- 🚗 Driving Analysis — Traffic, hazards, navigation from dashcam video
- 🤖 Robot Planning — Next-action prediction, 2D trajectory planning
- 🎬 Video Captioning — Detailed temporal-spatial descriptions
- ⚛️ Physics Reasoning — Object permanence, causality, plausibility
- 🔍 2D Grounding — Bounding box localization in images
- 📍 Temporal Localization — Event timestamps in video
- 🧠 Chain-of-Thought — <think> reasoning before answers
Troubleshooting
CUBLAS_STATUS_INVALID_VALUE on Jetson
Symptom: Any torch.mm() call, attention layer, or linear layer crashes with a CUBLAS error.
Cause: PyTorch's pip-bundled libcublas.so doesn't support Jetson's GPU architecture.
Fix:
strands-cosmos-fix-cublas
This replaces torch's bundled CUBLAS with the system CUBLAS from JetPack. Safe and reversible (--revert).
StopIteration in get_rope_index during video inference
Symptom: Crash in modeling_qwen3_vl.py when processing video with transformers>=5.3.0.
Cause: Breaking change in transformers 5.3.0 for Qwen3-VL video RoPE position handling.
Fix: Already handled — strands-cosmos pins transformers<5.3.0 in dependencies.
Video decoding warnings (torchcodec / torchvision)
These are harmless warnings about deprecated video decoding. To silence them, install torchcodec:
pip install torchcodec
Resources
- Cosmos-Reason2 GitHub
- HuggingFace Models
- Strands Agents
- strands-mlx (Apple Silicon provider)
License
Apache 2.0 | Built with NVIDIA Cosmos-Reason2 and Strands Agents