NVIDIA NIM / build.nvidia.com media provider adapters for genblaze (video, image, audio, chat)

genblaze-nvidia

NVIDIA NIM / build.nvidia.com provider adapters for genblaze. Covers four modalities on one nvapi- key: video (Cosmos, Edify), image (SDXL, SD 3.5, FLUX), audio (Fugatto, Riva TTS), and chat (Nemotron, Llama, Mistral, Qwen, …).

Install

pip install genblaze-nvidia            # video/image/audio providers
pip install "genblaze-nvidia[chat]"    # + the OpenAI SDK for LLM calls

Auth

Sign up at build.nvidia.com and create an API key (starts with nvapi-). Export it:

export NVIDIA_API_KEY=nvapi-...

The free tier is rate-limited (~40 requests/minute per model) with no per-token billing. Some models (Cosmos video) are still enterprise-gated as of 2026-04 and will return AUTH_FAILURE for free-tier keys until you have access.
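
Because the free tier allows roughly 40 requests per minute per model, a batch job may want a client-side throttle. The sketch below is only an illustration: MinIntervalThrottle is a hypothetical helper, not part of genblaze-nvidia, and the 40/minute default is the approximate figure quoted above.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60/per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute=40.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # None until the first call has been made

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Schedule the earliest time the following call may start.
        self._next_allowed = now + self.min_interval
        return delay
```

Calling throttle.wait() before each provider.invoke(step) keeps one model under the cap. Since the limit is per model, one throttle per model slug is the natural granularity.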

Two base URLs

NVIDIA's API spans two public hosts on the same key, plus an NVCF status endpoint used for async polling:

Surface                              Base URL                                            Used by
OpenAI-compatible chat / embeddings  https://integrate.api.nvidia.com/v1                 chat, achat
Model-specific generation            https://ai.api.nvidia.com/v1/genai/{vendor}/{slug}  NvidiaVideoProvider, NvidiaImageProvider, NvidiaAudioProvider
NVCF async status                    https://api.nvcf.nvidia.com/v2/nvcf/pexec/status    Async video polling

All three are overridable per-constructor for self-hosted NIM deployments:

NvidiaImageProvider(
    api_key="...",
    gen_base_url="https://self-hosted.internal/v1",
    nvcf_status_url="https://self-hosted.internal/v2/nvcf/pexec/status",
)
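
To make the {vendor}/{slug} template concrete, here is a minimal sketch of how a model id could expand into a generation endpoint. generation_url is a hypothetical helper, and it assumes the base URL ends at /v1 (as in the self-hosted example) with /genai/{vendor}/{slug} appended; the provider's real URL assembly may differ.

```python
GEN_BASE_URL = "https://ai.api.nvidia.com/v1"  # assumed default, mirrors the public host


def generation_url(model, base_url=GEN_BASE_URL):
    """Build the per-model endpoint from a 'vendor/slug' model id (illustrative)."""
    vendor, sep, slug = model.partition("/")
    if not sep or not vendor or not slug:
        raise ValueError(f"expected 'vendor/slug', got {model!r}")
    # Tolerate a trailing slash on self-hosted base URLs.
    return f"{base_url.rstrip('/')}/genai/{vendor}/{slug}"
```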

Video — NvidiaVideoProvider

Cosmos and Edify Video respond asynchronously (202 Accepted plus an NVCF-REQID header), and the provider polls NVCF until the job completes. Some fast models return the result inline in a synchronous response. Both paths converge on the same lifecycle.

from genblaze_core.models.step import Step
from genblaze_nvidia import NvidiaVideoProvider

provider = NvidiaVideoProvider()  # reads NVIDIA_API_KEY
step = Step(
    provider="nvidia-video",
    model="nvidia/cosmos-1.0-7b-diffusion-text2world",
    prompt="a drone flight over a coastal cliff at sunset",
)
result = provider.invoke(step)
print(result.assets[0].url)  # file:// or https:// depending on response shape
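
The 202-then-poll lifecycle described above can be sketched without any network code. Everything here is an assumption made for illustration: the (status, headers, body) tuple shape, the wait_for_result name, the poll interval, and the convention that the request id is appended to the status URL. The provider's internal implementation is not shown in this README.

```python
import time

NVCF_STATUS_URL = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status"


def wait_for_result(first, fetch_status, poll_interval=2.0, max_polls=150, sleep=time.sleep):
    """Return the final body, polling the status URL while the answer is 202.

    `first` and the return value of `fetch_status(url)` are
    (status_code, headers, body) tuples, a hypothetical transport-agnostic shape.
    """
    status, headers, body = first
    if status == 202:
        # Header lookup is case-sensitive here; real HTTP clients normalise this.
        request_id = headers["NVCF-REQID"]
        for _ in range(max_polls):
            sleep(poll_interval)
            status, _headers, body = fetch_status(f"{NVCF_STATUS_URL}/{request_id}")
            if status != 202:
                break
        else:
            raise TimeoutError("NVCF request did not complete in time")
    if status != 200:
        raise RuntimeError(f"generation failed with HTTP {status}")
    return body
```

An inline 200 response skips the loop entirely, which is how the synchronous and asynchronous paths end up returning through the same code.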

Image — NvidiaImageProvider

Responses are synchronous, with the image returned inline as base64. If an endpoint occasionally returns 202, the provider short-polls NVCF inside generate() so the caller still sees one blocking call.

from genblaze_core.models.step import Step
from genblaze_nvidia import NvidiaImageProvider

provider = NvidiaImageProvider()
step = Step(
    provider="nvidia-image",
    model="stabilityai/stable-diffusion-3-5-large",
    prompt="a studio photo of a brass teapot",
    params={"cfg_scale": 4.5, "aspect_ratio": "1:1"},
)
result = provider.invoke(step)

SDXL's schema differs from SD 3.5 / FLUX — the registry handles that transparently, rewriting prompt + negative_prompt into the text_prompts array SDXL expects.
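
As a rough picture of that rewrite, the sketch below folds prompt and negative_prompt into a text_prompts array. to_sdxl_payload is a hypothetical function, and the weight convention (positive for the prompt, negative for the negative prompt) is an assumption about the SDXL schema rather than something this README states.

```python
def to_sdxl_payload(prompt, negative_prompt=None, **params):
    """Fold prompt / negative_prompt into an SDXL-style text_prompts array."""
    text_prompts = [{"text": prompt, "weight": 1.0}]
    if negative_prompt:
        # Assumption: a negative weight marks the entry as a negative prompt.
        text_prompts.append({"text": negative_prompt, "weight": -1.0})
    # Remaining params (cfg_scale, steps, ...) pass through unchanged.
    return {"text_prompts": text_prompts, **params}
```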

Audio — NvidiaAudioProvider

from genblaze_core.models.step import Step
from genblaze_nvidia import NvidiaAudioProvider

provider = NvidiaAudioProvider()

# TTS (mono). nvidia/riva-tts is retired; see Models below.
tts_step = Step(provider="nvidia-audio", model="nvidia/magpie-tts-multilingual", prompt="Hello, world.")

# Music / SFX (stereo)
music_step = Step(provider="nvidia-audio", model="nvidia/fugatto", prompt="upbeat synthwave intro")

result = provider.invoke(music_step)  # or tts_step

Chat — chat / achat

OpenAI-wire-compatible. Any model NIM currently serves works as a plain string — no enumeration.

from genblaze_nvidia import chat

resp = chat(
    "nvidia/nemotron-4-340b-instruct",
    prompt="Summarize the Cosmos world foundation model in one sentence.",
)
print(resp.text)

The async variant is achat:

import asyncio
from genblaze_nvidia import achat

async def main():
    r = await achat("meta/llama-3.3-70b-instruct", prompt="hi")
    print(r.text)

asyncio.run(main())

Structured outputs

Pass a Pydantic class to response_format= and the JSON Schema is generated automatically (NIM, OpenAI, GMICloud all speak the same json_schema envelope).

from pydantic import BaseModel
from genblaze_nvidia import chat

class Summary(BaseModel):
    title: str
    key_points: list[str]

resp = chat(
    "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning",
    prompt="Summarize: NVIDIA shipped a 30B-A3B omnimodal model.",
    response_format=Summary,
)
obj = Summary.model_validate_json(resp.text)  # parse and validate in one step (Pydantic v2)
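
For readers unfamiliar with the json_schema envelope, the sketch below shows its general OpenAI-style shape using only the standard library. json_schema_envelope is a hypothetical helper, and the schema is a simplified hand-written stand-in for what Pydantic's Summary.model_json_schema() would produce; the exact payload genblaze-nvidia sends may differ.

```python
def json_schema_envelope(name, schema, strict=True):
    """Wrap a JSON Schema in an OpenAI-style response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }


# Simplified stand-in for the schema of the Summary model above.
summary_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "key_points"],
}

response_format = json_schema_envelope("Summary", summary_schema)
```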

Chat as a Pipeline step — NvidiaChatProvider

Drop NIM chat into a Pipeline alongside generation steps. Multimodal input flows through step.inputs[Asset] — each asset becomes the right OpenAI-vision content block based on its media_type. Verified against the Nemotron 3 Nano Omni model card on build.nvidia.com.

from genblaze_core import Asset, Pipeline
from genblaze_nvidia import NvidiaChatProvider

# Eyes-and-ears: image + audio + video in one step.
inputs = [
    Asset(url="https://example.com/scene.png",  media_type="image/png"),
    Asset(url="https://example.com/voice.wav",  media_type="audio/wav"),
    Asset(url="https://example.com/clip.mp4",   media_type="video/mp4"),
]

pipe = Pipeline().step(
    NvidiaChatProvider(reasoning=False),  # turn off thinking for a fast perception pass
    model="nvidia/nemotron-3-nano-omni-30b-a3b-reasoning",
    prompt="Describe what's happening across these inputs.",
    external_inputs=inputs,  # caller-held Assets seeded directly into step.inputs
)
result = pipe.run()
print(result.steps[-1].assets[0].metadata["text"])
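
The media_type dispatch mentioned above might look roughly like the following. This is a sketch under assumptions: content_block is a hypothetical function, image_url is the standard OpenAI vision block, and the audio_url and video_url block names follow a convention used by some OpenAI-compatible servers that NvidiaChatProvider may or may not share.

```python
def content_block(url, media_type):
    """Pick a chat content block from the asset's top-level media type."""
    kind = media_type.split("/", 1)[0]
    if kind == "image":
        return {"type": "image_url", "image_url": {"url": url}}
    if kind == "audio":
        return {"type": "audio_url", "audio_url": {"url": url}}  # assumed block name
    if kind == "video":
        return {"type": "video_url", "video_url": {"url": url}}  # assumed block name
    # Anything else (for example application/pdf) has no matching block.
    raise ValueError(f"unsupported media type: {media_type!r}")
```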

reasoning is tri-state: None (default) lets the server pick based on the model checkpoint, while True or False overrides that choice explicitly via extra_body["chat_template_kwargs"]["enable_thinking"]. Tuning fields like media_io_kwargs={"video": {"fps": 3.0}} and mm_processor_kwargs={"max_num_tiles": 3} are passed through to NIM untouched.
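
A minimal sketch of the tri-state mapping, assuming a hypothetical helper named reasoning_extra_body. The key detail is testing against None explicitly, so that False is sent as an override rather than treated as "unset".

```python
def reasoning_extra_body(reasoning):
    """Translate the tri-state flag into an extra_body fragment."""
    if reasoning is None:
        return {}  # send nothing; the server decides per checkpoint
    return {"chat_template_kwargs": {"enable_thinking": bool(reasoning)}}
```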

PDFs are not natively supported — Nemotron Omni processes documents as multi-page image sequences upstream, so callers must rasterize pages client-side and pass each one as Asset(media_type="image/png").

Models

genblaze-nvidia ships pattern-keyed ModelFamily rules. Each family encodes the per-line param shape (SDXL's text_prompts, Cosmos's width/height/fps, etc.), and any slug fitting the pattern works the day NIM ships it. The audio, video, and image endpoints declare DiscoverySupport.PARTIAL: slug liveness is confirmed via the empty-payload-POST probe attached to each family, so a retired slug like the historical nvidia/riva-tts surfaces as NOT_FOUND at preflight rather than as a mid-pipeline 404. Chat declares DiscoverySupport.NATIVE and reads the integrate.api.nvidia.com/v1/models catalog directly.

Modality  Family pattern(s)                                         Example slugs
Video     ^nvidia/cosmos-                                           nvidia/cosmos-1.0-7b-diffusion-text2world, .../video2world, nvidia/cosmos-2.0-diffusion-*
Image     ^stabilityai/stable-diffusion, ^black-forest-labs/flux    stable-diffusion-xl, stable-diffusion-3-5-{large,large-turbo,medium}, flux.1-{schnell,dev}
Audio     ^nvidia/fugatto, ^nvidia/(?:magpie-tts|riva-tts|maxine-)  nvidia/fugatto, nvidia/magpie-tts-multilingual, nvidia/maxine-voice-font
Chat      n/a (NATIVE discovery)                                    Any NIM chat model id
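
To show what pattern-keyed matching means in practice, the sketch below routes a slug to a modality using the regexes from the table. FAMILY_PATTERNS and modality_for are hypothetical names; the real ModelFamily objects also carry param shapes and liveness probes that are not modelled here.

```python
import re

# Patterns copied from the table above; the dict layout is illustrative.
FAMILY_PATTERNS = {
    "video": [r"^nvidia/cosmos-"],
    "image": [r"^stabilityai/stable-diffusion", r"^black-forest-labs/flux"],
    "audio": [r"^nvidia/fugatto", r"^nvidia/(?:magpie-tts|riva-tts|maxine-)"],
}


def modality_for(slug):
    """Return the first modality whose family pattern matches, else None."""
    for modality, patterns in FAMILY_PATTERNS.items():
        if any(re.match(pattern, slug) for pattern in patterns):
            return modality
    return None
```

A pattern match only says which family's param shape applies. It does not prove the slug is live, which is why the preflight probe exists.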

Pricing is not shipped — register a strategy from docs/reference/pricing-recipes.md when one is published for the model line you use. Until then, step.cost_usd is None.

To list the live chat catalog at runtime, query the OpenAI-compatible /v1/models endpoint:

import os

import httpx

r = httpx.get(
    "https://integrate.api.nvidia.com/v1/models",
    headers={"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"},
    timeout=30.0,
)
r.raise_for_status()  # fail on 401/429 here instead of on a missing "data" key
for m in r.json()["data"]:
    print(m["id"])

Error handling

NIM returns safety refusals as HTTP 400 with Nemoguard / safety markers in the body. map_nvidia_error classifies these as CONTENT_POLICY (non-retryable) instead of INVALID_INPUT — pipelines don't burn retries on a deterministic refusal.

HTTP / message          ProviderErrorCode
401, 403                AUTH_FAILURE
404                     MODEL_ERROR
429                     RATE_LIMIT
400 with safety marker  CONTENT_POLICY
400 plain               INVALID_INPUT
5xx                     SERVER_ERROR
transport timeout       TIMEOUT
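
The table above can be read as a small decision function. The sketch below is not map_nvidia_error itself: classify_error, the SAFETY_MARKERS strings, and the UNKNOWN fallback are all assumptions made to illustrate the ordering, in particular that the safety check runs before a plain 400 is labelled INVALID_INPUT.

```python
SAFETY_MARKERS = ("nemoguard", "safety")  # assumed markers; the real list may differ


def classify_error(status, body="", timed_out=False):
    """Map an HTTP status (or a transport timeout) to an error code name."""
    if timed_out:
        return "TIMEOUT"
    if status in (401, 403):
        return "AUTH_FAILURE"
    if status == 404:
        return "MODEL_ERROR"
    if status == 429:
        return "RATE_LIMIT"
    if status == 400:
        lowered = body.lower()
        # A deterministic refusal should not be retried, so check it first.
        if any(marker in lowered for marker in SAFETY_MARKERS):
            return "CONTENT_POLICY"
        return "INVALID_INPUT"
    if status is not None and 500 <= status <= 599:
        return "SERVER_ERROR"
    return "UNKNOWN"  # not in the table; placeholder for unmapped cases
```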

Download files

Source Distribution

genblaze_nvidia-0.3.0.tar.gz (39.1 kB)

Uploaded Source

Built Distribution

genblaze_nvidia-0.3.0-py3-none-any.whl (37.5 kB)

Uploaded Python 3

File details

Details for the file genblaze_nvidia-0.3.0.tar.gz.

File metadata

  • Download URL: genblaze_nvidia-0.3.0.tar.gz
  • Size: 39.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genblaze_nvidia-0.3.0.tar.gz
Algorithm Hash digest
SHA256 7f60a08573a58aaff1dc119d777f12317c33d878042c7000eeb95cce7150f461
MD5 e6670c3cce5810263993ff30fd45307e
BLAKE2b-256 aa1a29faf75de625e606d369d869ac03aeb5fb87fb394060ae5d9ed98cb31791

Provenance

The following attestation bundles were made for genblaze_nvidia-0.3.0.tar.gz:

Publisher: release.yml on backblaze-labs/genblaze

File details

Details for the file genblaze_nvidia-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: genblaze_nvidia-0.3.0-py3-none-any.whl
  • Size: 37.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genblaze_nvidia-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 98c59abc794ff2a24e289a6a296297b1ea825b20f62a4844a7bdf688fb80fd15
MD5 636878aa9dd90950103ee18ca0e9209a
BLAKE2b-256 5af9cabf2322d2c0d87708bffebd582947e138d8489f00179ca72409743a79e2

Provenance

The following attestation bundles were made for genblaze_nvidia-0.3.0-py3-none-any.whl:

Publisher: release.yml on backblaze-labs/genblaze
