NVIDIA Cosmos Reason VLM provider for Strands Agents - physical AI reasoning, video understanding, and embodied intelligence

strands-cosmos

Strands Cosmos

NVIDIA Cosmos toolkit for Strands Agents — from VLM reasoning to world-model generation, edge deployment, and evaluation.

strands-cosmos provides Cosmos-Reason2 as a Strands model provider, plus 21 tools covering the entire NVIDIA Cosmos ecosystem: inference, video generation (Predict2.5), video-to-video (Transfer2.5), data curation (Xenna), post-training, distillation, quantization, edge deployment, and evaluation.


Demo

Dashcam safety analysis with Chain-of-Thought reasoning on Jetson AGX Thor

Install

pip install strands-cosmos

Developer Setup

git clone https://github.com/cagataycali/strands-cosmos && cd strands-cosmos
just setup-full    # Installs system deps, Python deps, clones all Cosmos repos
just doctor        # Verify everything

NVIDIA Jetson (Thor, Orin, AGX)

pip install strands-cosmos
strands-cosmos-fix-cublas   # Fix CUBLAS for Jetson GPU architecture

Quick Start

from strands import Agent
from strands_cosmos import CosmosVisionModel

model = CosmosVisionModel(model_id="nvidia/Cosmos-Reason2-2B")
agent = Agent(model=model)

# Video understanding
agent("Caption in detail: <video>dashcam.mp4</video>")

# Image reasoning
agent("<image>robot_view.jpg</image> What should the robot do next?")

# Text-only physics reasoning
agent("What happens when a ball rolls off a table?")
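
The prompts above embed media paths in <video> and <image> tags. When the paths come from variables, a small helper can keep the tag syntax consistent. This is a minimal sketch: media_prompt is a hypothetical helper, not part of strands-cosmos, and it assumes only the tag format shown in the Quick Start.

```python
from typing import Optional


def media_prompt(text: str, *, video: Optional[str] = None,
                 image: Optional[str] = None) -> str:
    """Build a prompt with media paths wrapped in the Quick Start tags.

    Hypothetical helper for illustration; not part of strands-cosmos.
    Media tags are placed before the text, as in the image example above.
    """
    parts = []
    if video is not None:
        parts.append(f"<video>{video}</video>")
    if image is not None:
        parts.append(f"<image>{image}</image>")
    parts.append(text)
    return " ".join(parts)


# Reproduces the image-reasoning prompt from the Quick Start.
prompt = media_prompt("What should the robot do next?", image="robot_view.jpg")
print(prompt)
```

The resulting string can be passed to the agent exactly like the literal prompts above, for example agent(prompt).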

Tools

Use any tool inside a Strands Agent for full Cosmos pipeline automation:

| Category | Tools | Description |
| --- | --- | --- |
| Reason2 VLM | cosmos_inference, cosmos_reason_hf, cosmos_serve | TRT server inference, HF direct inference, server lifecycle |
| Predict 2.5 | cosmos_predict_generate | World-model video generation (future frame prediction) |
| Transfer 2.5 | cosmos_transfer_generate | ControlNet video-to-video (depth/edge/sketch→video) |
| Model Lifecycle | cosmos_model_download, cosmos_quantize, cosmos_export_onnx, cosmos_build_engine | Download, FP8 quantize, ONNX export, TRT engine build |
| Training | cosmos_post_train, cosmos_distill | SFT/LoRA post-training, knowledge distillation |
| Data | cosmos_curate | Xenna data curation pipeline |
| Evaluation | cosmos_evaluate | FID/FVD/CSE/CLIP benchmark evaluation |
| I/O | rtp_capture_frame, nats_publish, video_probe, video_extract_frames, image_read | RTP capture, NATS messaging, video/image utilities |
| System | cosmos_sysinfo | GPU/platform diagnostics |

from strands import Agent
from strands_cosmos import cosmos_reason_hf, video_probe, cosmos_sysinfo

agent = Agent(tools=[cosmos_reason_hf, video_probe, cosmos_sysinfo])
agent("Check the system, then analyze the video at /tmp/scene.mp4")

Models

| Model | GPU Memory | Use Case |
| --- | --- | --- |
| Cosmos-Reason2-2B | 24GB | Edge deployment (Jetson Thor/Orin) |
| Cosmos-Reason2-8B | 32GB | Cloud/desktop high-accuracy |
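
The memory figures above can drive a simple model choice at startup. The sketch below is illustrative only: pick_model is a hypothetical helper, the memory numbers are copied from the table, and how you obtain the available GPU memory is left to the caller.

```python
from typing import Optional

# GPU memory requirements in GB, copied from the Models table.
MODEL_MEMORY_GB = {
    "nvidia/Cosmos-Reason2-2B": 24,
    "nvidia/Cosmos-Reason2-8B": 32,
}


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest listed model that fits, or None if none fits.

    Hypothetical helper for illustration; not part of strands-cosmos.
    """
    fitting = [
        (need, name)
        for name, need in MODEL_MEMORY_GB.items()
        if need <= available_gb
    ]
    # max() on (need, name) tuples picks the highest memory requirement.
    return max(fitting)[1] if fitting else None


print(pick_model(24))
print(pick_model(80))
print(pick_model(16))
```

The returned string can be passed as model_id when constructing CosmosVisionModel.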

Performance (Jetson AGX Thor, Reason2-2B)

| Task | Load Time | Generation |
| --- | --- | --- |
| Text inference | 7s | 1.4s (46 tokens) |
| Video caption | 7s | 2.2s (short clip @ 4fps) |
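
As a worked example, the text-inference row implies a decode rate of roughly 33 tokens per second. The sketch below does that arithmetic and separates cold-start from warm latency, assuming the load time is paid once per process and not on every request (the table does not say so explicitly).

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Decode throughput: generated tokens divided by generation time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures from the text-inference row above.
LOAD_S, GEN_S, TOKENS = 7.0, 1.4, 46

rate = tokens_per_second(TOKENS, GEN_S)   # roughly 33 tokens/s
cold_start_s = LOAD_S + GEN_S             # first request, model not yet loaded

print(f"{rate:.1f} tok/s, cold start {cold_start_s:.1f}s, warm {GEN_S:.1f}s")
```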

Architecture

strands_cosmos/
├── cosmos_model.py              # CosmosModel (text-only Strands Model)
├── cosmos_vision_model.py       # CosmosVisionModel (video+image+text)
├── fix_cublas.py                # Jetson CUBLAS compatibility fix
├── tools/                       # 21 tools (full Cosmos pipeline)
│   ├── inference.py             # TRT server inference
│   ├── reason_hf.py            # HF Transformers direct inference
│   ├── serve.py                # Server lifecycle management
│   ├── predict_generate.py     # Predict2.5 world model
│   ├── transfer_generate.py    # Transfer2.5 ControlNet
│   ├── model_download.py       # HF model download
│   ├── quantize.py             # FP8 quantization
│   ├── export_onnx.py          # ONNX export
│   ├── build_engine.py         # TRT engine build
│   ├── post_train.py           # Post-training (SFT/LoRA)
│   ├── distill.py              # Knowledge distillation
│   ├── curate.py               # Xenna data curation
│   ├── evaluate.py             # Benchmark evaluation
│   ├── rtp.py                  # GStreamer RTP capture
│   ├── nats_pub.py             # NATS publish
│   ├── video_utils.py          # ffprobe + frame extraction
│   ├── image_read.py           # Base64 image read
│   └── sysinfo.py              # System diagnostics
└── justfile                     # Developer workflow automation

Justfile (Developer Workflow)

just setup          # Clone all 6 Cosmos ecosystem repos
just setup-full     # Full setup: system deps + Python + repos + diagnostics
just doctor         # Check repos, tools, GPU, platform compatibility
just install-trt-edge-llm  # Build TensorRT-Edge-LLM from source (Jetson)

# Run pipelines
just predict-generate config.json
just transfer-generate config.json
just evaluate metrics.json
just serve-start

Configuration

model = CosmosVisionModel(
    model_id="nvidia/Cosmos-Reason2-8B",
    device_map="auto",
    torch_dtype="auto",
    reasoning=True,           # Chain-of-thought <think>...</think>
    fps=4,                    # Video sampling rate
    min_vision_tokens=256,
    max_vision_tokens=8192,
    params={"max_tokens": 4096, "temperature": 0.6},
)
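
To get a feel for the fps setting, the sketch below estimates how many frames a clip contributes at a given sampling rate. It assumes uniform sampling over the clip duration; the actual preprocessing in strands-cosmos may differ, and the effect of min_vision_tokens and max_vision_tokens is not modeled here. sampled_frames is a hypothetical helper.

```python
def sampled_frames(duration_s: float, fps: float = 4) -> int:
    """Approximate frame count for a clip sampled uniformly at `fps`.

    Hypothetical helper for illustration; assumes uniform sampling.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration_s must be >= 0 and fps must be > 0")
    return int(duration_s * fps)


# A 10-second clip at the fps=4 setting shown above.
print(sampled_frames(10))
# The same clip at half the sampling rate.
print(sampled_frames(10, fps=2))
```

Under this assumption, lowering fps reduces the number of frames proportionally, which is one lever for long clips.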

Verified Platforms

| Platform | GPU | Status |
| --- | --- | --- |
| Jetson AGX Thor | NVIDIA Thor 132GB | ✅ (with CUBLAS fix) |
| Jetson Orin | 32/64GB | ✅ (may need CUBLAS fix) |
| Desktop | A100 / H100 / RTX 4090 | |
| Cloud | Any CUDA 12+ GPU | |

Troubleshooting

CUBLAS_STATUS_INVALID_VALUE on Jetson

strands-cosmos-fix-cublas    # Replaces torch's bundled CUBLAS with JetPack system CUBLAS

StopIteration in get_rope_index during video

strands-cosmos already guards against this by pinning transformers<5.3.0. If you still see the error, reinstall transformers within the supported range:

pip install "transformers>=4.57.0,<5.3.0"
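
To confirm that the installed transformers version falls inside that range, a check along these lines may help. It is a simplified sketch: the parser only reads the leading numeric release components and ignores pre-release or dev suffixes, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(v: str) -> Tuple[int, int, int]:
    """Parse '4.57.1' into (4, 57, 1), padding missing parts with zeros.

    Simplified: stops at the first non-numeric component or suffix.
    """
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # suffix such as 'rc1' ends the numeric release part
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def in_pinned_range(v: str, lo: str = "4.57.0", hi: str = "5.3.0") -> bool:
    """True if lo <= v < hi, using the range from the pip command above."""
    return parse_release(lo) <= parse_release(v) < parse_release(hi)


def installed_transformers_ok() -> Optional[bool]:
    """Check the installed transformers; None if it is not installed."""
    try:
        return in_pinned_range(version("transformers"))
    except PackageNotFoundError:
        return None


print(installed_transformers_ok())
```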

TRT tools return exit 127

Expected on workstations — those tools run on Jetson or in TRT Docker. Run just doctor to see what works on your machine.
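
Exit status 127 is the shell's conventional code for "command not found", so a quick PATH check can show which executables are missing before a tool is called. The sketch below uses the standard library only. The command names are assumptions for illustration: ffprobe and just appear elsewhere in this README, while trtexec is a guess at what the TRT tools need, not something this README confirms.

```python
import shutil
from typing import Iterable, List


def missing_commands(names: Iterable[str]) -> List[str]:
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Hypothetical list of executables; adjust to the tools you plan to use.
candidates = ["just", "ffprobe", "trtexec"]
missing = missing_commands(candidates)

if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("All candidate commands found")
```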


License

Apache 2.0 | Built with NVIDIA Cosmos and Strands Agents
