Strands Cosmos

NVIDIA Cosmos Reason VLM provider for Strands Agents — physical AI reasoning, video understanding, and embodied intelligence.

Enables Strands agents to use Cosmos-Reason2 models for video captioning, driving analysis, robot planning, temporal reasoning, and physics understanding.


Demo

Dashcam safety analysis with Chain-of-Thought reasoning on Jetson AGX Thor

A terminal recording of the demo is available at demo/strands-cosmos-demo.cast; play it with asciinema play demo/strands-cosmos-demo.cast


Install

Requirements: Python ≥3.10, NVIDIA GPU (24GB+ for 2B, 32GB+ for 8B)

pip install strands-cosmos strands-agents

NVIDIA Jetson (Thor, Orin, AGX)

On Jetson devices, PyTorch's pip-bundled CUBLAS may be incompatible with the GPU architecture. After installing, run the included fix:

pip install strands-cosmos strands-agents

# Fix CUBLAS (auto-detects if needed, safe to run on any platform)
strands-cosmos-fix-cublas

# Or check without fixing:
strands-cosmos-fix-cublas --check

What does the CUBLAS fix do?

PyTorch wheels ship their own libcublas.so which may not support Jetson GPU architectures (e.g., SM 11.0 on Thor, SM 8.7 on Orin). This causes CUBLAS_STATUS_INVALID_VALUE on any matrix multiplication (torch.mm, attention layers, linear layers, etc.).

The fix:

  1. Backs up torch's bundled libcublas.so and libcublasLt.so
  2. Copies the system CUBLAS from JetPack (/usr/local/cuda/targets/*/lib/)
  3. Verifies the fix with a quick torch.mm test

To revert: strands-cosmos-fix-cublas --revert
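
The backup-and-replace steps above can be sketched with the standard library alone. This is not the code in fix_cublas.py: the function names, the .bak suffix, and the directory layout are assumptions made for illustration, and the torch.mm verification step is left out so the sketch runs without a GPU. It operates on throwaway files in a temporary directory, not on real shared libraries.

```python
import shutil
import tempfile
from pathlib import Path

BACKUP_SUFFIX = ".bak"  # hypothetical suffix, not taken from fix_cublas.py


def swap_libraries(bundled_dir: Path, system_dir: Path, names: list) -> list:
    """Back up each bundled library, then overwrite it with the system copy."""
    replaced = []
    for name in names:
        bundled = bundled_dir / name
        system = system_dir / name
        if not (bundled.exists() and system.exists()):
            continue  # nothing to swap for this library
        backup = bundled.with_name(bundled.name + BACKUP_SUFFIX)
        if not backup.exists():  # never overwrite an earlier backup
            shutil.copy2(bundled, backup)
        shutil.copy2(system, bundled)
        replaced.append(bundled)
    return replaced


def revert_libraries(bundled_dir: Path, names: list) -> list:
    """Restore every library that has a backup stored next to it."""
    restored = []
    for name in names:
        bundled = bundled_dir / name
        backup = bundled.with_name(bundled.name + BACKUP_SUFFIX)
        if backup.exists():
            backup.replace(bundled)  # moves the backup over the swapped file
            restored.append(bundled)
    return restored


# Demonstration on placeholder files standing in for the two libraries.
with tempfile.TemporaryDirectory() as tmp:
    torch_lib = Path(tmp, "torch_lib")
    jetpack_lib = Path(tmp, "jetpack_lib")
    torch_lib.mkdir()
    jetpack_lib.mkdir()
    names = ["libcublas.so", "libcublasLt.so"]
    for name in names:
        (torch_lib / name).write_text("bundled")
        (jetpack_lib / name).write_text("system")

    swap_libraries(torch_lib, jetpack_lib, names)
    after_swap = (torch_lib / "libcublas.so").read_text()

    revert_libraries(torch_lib, names)
    after_revert = (torch_lib / "libcublas.so").read_text()
```

Keeping the backup beside the original file is what makes a revert possible without reinstalling PyTorch; the real tool may store its backups differently.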

Affected: Jetson AGX Thor (SM 11.0); other Jetson devices with pre-release BSPs may also be affected.

Not affected: desktop GPUs (A100, H100, RTX 4090, etc.) and x86_64 systems.


Quick Start

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Video understanding
agent("Caption in detail: <video>dashcam.mp4</video>")

# Image reasoning
agent("<image>robot_view.jpg</image> What can be the next immediate action?")

# Text-only physics reasoning
agent("What happens when a ball rolls off a table?")
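
The prompts above embed media paths in <video> and <image> tags. A small helper can build such strings so the tags are always closed correctly. The helper name media_prompt is hypothetical and not part of strands-cosmos; it follows the tag-first form of the image example, and whether tag position matters to the model is not stated here.

```python
from typing import Optional


def media_prompt(question: str, *, video: Optional[str] = None,
                 image: Optional[str] = None) -> str:
    """Prefix a question with <video>/<image> tags for the given paths."""
    parts = []
    if video:
        parts.append(f"<video>{video}</video>")
    if image:
        parts.append(f"<image>{image}</image>")
    parts.append(question)
    return " ".join(parts)


prompt = media_prompt("What can be the next immediate action?",
                      image="robot_view.jpg")
```

The resulting string can be passed to the agent in the same way as the hand-written prompts.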

Models

| Model | GPU Memory | Architecture |
| --- | --- | --- |
| Cosmos-Reason2-2B | 24GB | Qwen3-VL |
| Cosmos-Reason2-8B | 32GB | Qwen3-VL |
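
As a rough guide, the memory figures above can be turned into a selection rule. The sketch below is illustrative only: pick_model and MODEL_MIN_GPU_GB are hypothetical names, and the thresholds are the stated minimums, not measured usage.

```python
from typing import Optional

# Minimum GPU memory in GB per model, taken from the table above.
MODEL_MIN_GPU_GB = {
    "nvidia/Cosmos-Reason2-8B": 32,
    "nvidia/Cosmos-Reason2-2B": 24,
}


def pick_model(gpu_memory_gb: float) -> Optional[str]:
    """Return the largest model whose minimum fits, or None if none fits."""
    by_size = sorted(MODEL_MIN_GPU_GB.items(), key=lambda kv: kv[1], reverse=True)
    for model_id, min_gb in by_size:
        if gpu_memory_gb >= min_gb:
            return model_id
    return None


choice = pick_model(24)
```

With 24 GB the rule selects the 2B model; below 24 GB it returns None so the caller can fail early with a clear message.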

Verified Platforms

| Platform | GPU | Status |
| --- | --- | --- |
| Jetson AGX Thor | NVIDIA Thor 132GB | ✅ (with CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | |
| Jetson Orin | Orin 32/64GB | ✅ (may need CUBLAS fix) |

Features

Video Understanding

from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-2B",
    fps=4,                    # Video frame rate
    reasoning=True,           # Enable chain-of-thought
    params={"max_tokens": 4096, "temperature": 0.6},
)

Chain-of-Thought Reasoning

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(reasoning=True)
agent = Agent(model=model)

# The model emits <think>reasoning</think> first, then the answer
agent("<video>scene.mp4</video> Is this video physically plausible?")
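
Because the reasoning arrives wrapped in <think> tags ahead of the answer, downstream code will often want the two parts separately. The following is a minimal sketch, assuming the reply is already available as a plain string; split_reasoning is a hypothetical helper, not something strands-cosmos provides, and the sample text is made up for the demonstration.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no tags are found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = (
    "<think>The ball keeps its horizontal speed while gravity pulls it down."
    "</think>\nIt follows a curved path to the floor."
)
reasoning, answer = split_reasoning(sample)
```

Replies without a <think> block pass through unchanged as the answer, so the helper is also safe to use when reasoning is disabled.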

Built-in Task Prompts

from strands_cosmos.cosmos_vision_model import TASK_PROMPTS

# Available tasks:
# caption, embodied_reasoning, driving, causal,
# temporal_localization, 2d_grounding, robot_cot,
# describe_anything, mvp_bench

As a Tool (in any agent)

from strands import Agent
from strands_cosmos import cosmos_vision_invoke

# Use Cosmos as a tool inside a Bedrock/OpenAI agent
agent = Agent(tools=[cosmos_vision_invoke])
agent("Analyze this dashcam video for safety: /path/to/video.mp4")

Architecture

strands_cosmos/
├── cosmos_model.py          # Text-only CosmosModel (Strands Model interface)
├── cosmos_vision_model.py   # Vision CosmosVisionModel (video + image + text)
├── fix_cublas.py            # Jetson CUBLAS compatibility fix
└── tools/
    ├── cosmos_invoke.py         # Text inference tool
    └── cosmos_vision_invoke.py  # Vision inference tool

graph LR
    A[Strands Agent] -->|CosmosVisionModel| B[Cosmos-Reason2]
    B -->|Video| C[Driving Analysis]
    B -->|Image| D[Robot Planning]
    B -->|Text| E[Physics Reasoning]
    B -->|Reasoning| F[Chain-of-Thought]

    style A fill:#e1f5ff
    style B fill:#76b900,color:#fff
    style F fill:#fff3cd

Configuration

from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-8B",  # or 2B
    device_map="auto",                     # GPU placement
    torch_dtype="auto",                    # float16/bfloat16
    reasoning=True,                        # CoT reasoning
    fps=4,                                 # Video FPS
    min_vision_tokens=256,                 # Min visual tokens
    max_vision_tokens=8192,                # Max visual tokens
    params={
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95,
    },
)

Examples

| Example | Description |
| --- | --- |
| 01_basic_text.py | Text-only physics reasoning |
| 02_video_caption.py | Video captioning |
| 03_driving_analysis.py | Dashcam safety analysis with CoT |
| 04_embodied_reasoning.py | Robot next-action prediction |
| 05_tool_usage.py | Cosmos as a tool in another agent |

Capabilities

Cosmos-Reason2 excels at physical world understanding:

  • 🚗 Driving Analysis: Traffic, hazards, navigation from dashcam video
  • 🤖 Robot Planning: Next-action prediction, 2D trajectory planning
  • 🎬 Video Captioning: Detailed temporal-spatial descriptions
  • ⚛️ Physics Reasoning: Object permanence, causality, plausibility
  • 🔍 2D Grounding: Bounding box localization in images
  • 📍 Temporal Localization: Event timestamps in video
  • 🧠 Chain-of-Thought: <think> reasoning before answers

Troubleshooting

CUBLAS_STATUS_INVALID_VALUE on Jetson

Symptom: Any torch.mm(), attention, or linear layer crashes with CUBLAS error.

Cause: PyTorch's pip-bundled libcublas.so doesn't support Jetson's GPU architecture.

Fix:

strands-cosmos-fix-cublas

This replaces torch's bundled CUBLAS with the system CUBLAS from JetPack. Safe and reversible (--revert).

StopIteration in get_rope_index during video inference

Symptom: Crash in modeling_qwen3_vl.py when processing video with transformers>=5.3.0.

Cause: Breaking change in transformers 5.3.0 for Qwen3-VL video RoPE position handling.

Fix: Already handled — strands-cosmos pins transformers<5.3.0 in dependencies.
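
If another package in the environment overrides the pin, a quick check of the installed version can confirm whether it falls in the affected range. The sketch below uses only the standard library. The helper names are hypothetical, and treating pre-releases of 5.3.0 as affected is an assumption.

```python
from importlib import metadata
from typing import Optional

BREAKING = (5, 3, 0)  # first transformers release with the RoPE change


def parse_release(version: str) -> tuple:
    """Leading numeric components, padded to three: '5.3.0rc1' -> (5, 3, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # reached a non-numeric segment such as 'dev0'
        parts.append(int(digits))
        if digits != piece:
            break  # segment carried a suffix such as 'rc1'
    parts += [0] * max(0, 3 - len(parts))
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version is at or above the breaking release."""
    return parse_release(version) >= BREAKING


def installed_transformers_affected() -> Optional[bool]:
    """Check the installed transformers; None if it is not installed."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
    return is_affected(installed)
```

Calling installed_transformers_affected() before loading a video model gives a clearer failure than a StopIteration deep inside modeling_qwen3_vl.py.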

Video decoding warnings (torchcodec / torchvision)

These are harmless warnings about deprecated video decoding. To silence them, install torchcodec:

pip install torchcodec

License

Apache 2.0 | Built with NVIDIA Cosmos-Reason2 and Strands Agents
