Strands Cosmos

NVIDIA Cosmos Reason VLM provider for Strands Agents: physical AI reasoning, video understanding, and embodied intelligence.

Enables Strands agents to use Cosmos-Reason2 models for video captioning, driving analysis, robot planning, temporal reasoning, and physics understanding.

Demo

Dashcam safety analysis with Chain-of-Thought reasoning on Jetson AGX Thor

A terminal recording of the demo is at demo/strands-cosmos-demo.cast. Play it with asciinema play demo/strands-cosmos-demo.cast

Install

Requirements: Python ≥3.10, NVIDIA GPU (24GB+ for 2B, 32GB+ for 8B)

pip install strands-cosmos strands-agents

NVIDIA Jetson (Thor, Orin, AGX)

On Jetson devices, PyTorch's pip-bundled CUBLAS may be incompatible with the GPU architecture. After installing, run the included fix:

pip install strands-cosmos strands-agents

# Fix CUBLAS (auto-detects if needed, safe to run on any platform)
strands-cosmos-fix-cublas

# Or check without fixing:
strands-cosmos-fix-cublas --check

What does the CUBLAS fix do?

PyTorch wheels ship their own libcublas.so which may not support Jetson GPU architectures (e.g., SM 11.0 on Thor, SM 8.7 on Orin). This causes CUBLAS_STATUS_INVALID_VALUE on any matrix multiplication (torch.mm, attention layers, linear layers, etc.).

The fix:

1. Backs up torch's bundled libcublas.so and libcublasLt.so
2. Copies the system CUBLAS from JetPack (/usr/local/cuda/targets/*/lib/)
3. Verifies the fix with a quick torch.mm test

To revert: strands-cosmos-fix-cublas --revert

Affected: Jetson AGX Thor (SM 11.0). Other Jetson devices running pre-release BSPs may also be affected.

Not affected: Desktop GPUs (A100, H100, RTX 4090, etc.) and x86_64 systems.
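The backup-and-copy steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in fix_cublas.py: the function names replace_with_backup and restore_backups, the .bak suffix, and the directory arguments are all assumptions made for the example, and real installations typically use versioned library file names.

```python
import shutil
from pathlib import Path


def replace_with_backup(bundled_dir: Path, system_dir: Path, names: list[str]) -> list[Path]:
    """Back up each named library in bundled_dir, then overwrite it with the system copy."""
    replaced = []
    for name in names:
        bundled = bundled_dir / name
        system = system_dir / name
        if not (bundled.exists() and system.exists()):
            continue  # nothing to swap for this library
        backup = bundled.with_name(bundled.name + ".bak")  # hypothetical backup naming
        if not backup.exists():
            # Keep the first backup so a later revert restores the original file
            shutil.copy2(bundled, backup)
        shutil.copy2(system, bundled)
        replaced.append(bundled)
    return replaced


def restore_backups(bundled_dir: Path, names: list[str]) -> list[Path]:
    """Move each .bak file back over the library it was copied from."""
    restored = []
    for name in names:
        bundled = bundled_dir / name
        backup = bundled.with_name(bundled.name + ".bak")
        if backup.exists():
            backup.replace(bundled)  # overwrites the swapped-in file
            restored.append(bundled)
    return restored
```

Keeping only the first backup means that running the swap twice does not overwrite the original library with an already-replaced one.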

Quick Start

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Video understanding
agent("Caption in detail: <video>dashcam.mp4</video>")

# Image reasoning
agent("<image>robot_view.jpg</image> What can be the next immediate action?")

# Text-only physics reasoning
agent("What happens when a ball rolls off a table?")
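The prompts above embed media paths in <video> and <image> tags. A small helper can pick the tag from the file extension. This is only a sketch: media_prompt is a hypothetical name, not part of strands-cosmos, and the extension sets are assumptions rather than a list of formats the model provider is known to accept.

```python
from pathlib import Path

# Assumed extension sets for illustration; adjust to the formats you actually use
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def media_prompt(media_path: str, question: str) -> str:
    """Wrap media_path in the matching tag and append the question."""
    suffix = Path(media_path).suffix.lower()
    if suffix in VIDEO_EXTS:
        tag = "video"
    elif suffix in IMAGE_EXTS:
        tag = "image"
    else:
        raise ValueError(f"Unrecognized media extension: {suffix!r}")
    return f"<{tag}>{media_path}</{tag}> {question}"
```

With an agent constructed as in the Quick Start, the result could be passed directly, for example agent(media_prompt("dashcam.mp4", "Caption in detail.")).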

Models

Model	GPU Memory	Architecture
Cosmos-Reason2-2B	24GB	Qwen3-VL
Cosmos-Reason2-8B	32GB	Qwen3-VL

Verified Platforms

Platform	GPU	Status
Jetson AGX Thor	NVIDIA Thor 132GB	✅ (with CUBLAS fix)
Desktop	A100 / H100 / RTX 4090	✅
Jetson Orin	Orin 32/64GB	✅ (may need CUBLAS fix)

Features

Video Understanding

from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-2B",
    fps=4,                    # Video frame rate
    reasoning=True,           # Enable chain-of-thought
    params={"max_tokens": 4096, "temperature": 0.6},
)

Chain-of-Thought Reasoning

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(reasoning=True)
agent = Agent(model=model)

# The model emits <think>reasoning</think> first, then the answer
agent("<video>scene.mp4</video> Is this video physically plausible?")
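Because the reasoning arrives inside <think> tags ahead of the answer, downstream code may want the two parts separately. The following is a minimal sketch that assumes you already have the full response as a plain string; split_reasoning is a hypothetical helper, not something strands-cosmos is known to provide.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no <think> block is present."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

Returning an empty reasoning string instead of raising keeps the helper usable when reasoning is disabled.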

Built-in Task Prompts

from strands_cosmos.cosmos_vision_model import TASK_PROMPTS

# Available tasks:
# caption, embodied_reasoning, driving, causal,
# temporal_localization, 2d_grounding, robot_cot,
# describe_anything, mvp_bench

As a Tool (in any agent)

from strands import Agent
from strands_cosmos import cosmos_vision_invoke

# Use Cosmos as a tool inside a Bedrock/OpenAI agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")

Architecture

strands_cosmos/
├── cosmos_model.py          # Text-only CosmosModel (Strands Model interface)
├── cosmos_vision_model.py   # Vision CosmosVisionModel (video + image + text)
├── fix_cublas.py            # Jetson CUBLAS compatibility fix
└── tools/
    ├── cosmos_invoke.py         # Text inference tool
    └── cosmos_vision_invoke.py  # Vision inference tool

graph LR
    A[Strands Agent] -->|CosmosVisionModel| B[Cosmos-Reason2]
    B -->|Video| C[Driving Analysis]
    B -->|Image| D[Robot Planning]
    B -->|Text| E[Physics Reasoning]
    B -->|Reasoning| F[Chain-of-Thought]

    style A fill:#e1f5ff
    style B fill:#76b900,color:#fff
    style F fill:#fff3cd

Configuration

model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-8B",  # or 2B
    device_map="auto",                     # GPU placement
    torch_dtype="auto",                    # float16/bfloat16
    reasoning=True,                        # CoT reasoning
    fps=4,                                 # Video FPS
    min_vision_tokens=256,                 # Min visual tokens
    max_vision_tokens=8192,                # Max visual tokens
    params={
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95,
    },
)

Examples

Example	Description
01_basic_text.py	Text-only physics reasoning
02_video_caption.py	Video captioning
03_driving_analysis.py	Dashcam safety analysis with CoT
04_embodied_reasoning.py	Robot next-action prediction
05_tool_usage.py	Cosmos as a tool in another agent

Capabilities

Cosmos-Reason2 excels at physical world understanding:

🚗 Driving Analysis — Traffic, hazards, navigation from dashcam video
🤖 Robot Planning — Next-action prediction, 2D trajectory planning
🎬 Video Captioning — Detailed temporal-spatial descriptions
⚛️ Physics Reasoning — Object permanence, causality, plausibility
🔍 2D Grounding — Bounding box localization in images
📍 Temporal Localization — Event timestamps in video
🧠 Chain-of-Thought — <think> reasoning before answers

Troubleshooting

`CUBLAS_STATUS_INVALID_VALUE` on Jetson

Symptom: Any torch.mm(), attention, or linear layer call crashes with a CUBLAS_STATUS_INVALID_VALUE error.

Cause: PyTorch's pip-bundled libcublas.so doesn't support Jetson's GPU architecture.

Fix:

strands-cosmos-fix-cublas

This replaces torch's bundled CUBLAS with the system CUBLAS from JetPack. The change is reversible with strands-cosmos-fix-cublas --revert.

`StopIteration` in `get_rope_index` during video inference

Symptom: Crash in modeling_qwen3_vl.py when processing video with transformers>=5.3.0.

Cause: Breaking change in transformers 5.3.0 for Qwen3-VL video RoPE position handling.

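Fix: No action is needed on a standard install, because strands-cosmos pins transformers<5.3.0 in its dependencies. If a newer transformers was installed afterwards, reinstall a compatible version with pip install "transformers<5.3.0".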

Video decoding warnings (torchcodec / torchvision)

These are harmless warnings about deprecated video decoding. To silence them, install torchcodec:

pip install torchcodec

License

Apache 2.0 | Built with NVIDIA Cosmos-Reason2 and Strands Agents

Release history

Version	Release date
0.2.0	May 8, 2026
0.1.2	Mar 22, 2026
0.1.1 (this version)	Mar 9, 2026

Download files

Source Distribution

strands_cosmos-0.1.1.tar.gz (24.5 kB view details)

Uploaded Mar 9, 2026 Source

Built Distribution

strands_cosmos-0.1.1-py3-none-any.whl (25.5 kB view details)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file strands_cosmos-0.1.1.tar.gz.

File metadata

Download URL: strands_cosmos-0.1.1.tar.gz
Upload date: Mar 9, 2026
Size: 24.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for strands_cosmos-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`401dcaabf430010a1326f797286079ee874c231c9e1dc97db8907a2be8c92a96`
MD5	`757df354200a951cfd9d2f3af758a869`
BLAKE2b-256	`14feb0ae372387b209c8e20239f8a0143542281204254a1144a504b5a4dd5010`

File details

Details for the file strands_cosmos-0.1.1-py3-none-any.whl.

File metadata

Download URL: strands_cosmos-0.1.1-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 25.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for strands_cosmos-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ffbcc32591a14f5fd8ff9186f24c2dd72d7d46c54ef0afb7432aa93aa8ee098c`
MD5	`ec2d011f2e01a22c9958bdade23710a9`
BLAKE2b-256	`15dfdb4a48bbdf7aa433af31d4757a84149eee3fdb87ad006732cc81d191d705`
