
graphn — Python SDK


The official Python SDK for Graphn. Import any HuggingFace LLM into your workspace, get an OpenAI-compatible inference endpoint, and call it from Python in a handful of lines — without standing up a single GPU yourself.

v0.1.0 scope — This release covers custom-model import and inference end-to-end. A lot of the broader Graphn platform (agents, knowledge bases, workflows, evals, datasets, guardrails, billing, full BYO-inference CRUD, etc.) is not yet exposed through this SDK. Those surfaces will be added in subsequent minor releases as their HTTP APIs stabilize. See Scope below for the exact list.

import graphn

with graphn.Client() as c:
    model = c.custom_models.create(
        name="my-llama",
        huggingface_model_id="Qwen/Qwen3-0.6B",
        weight_source="huggingface",
    )
    c.custom_models.wait_until_ready(model.id)

    resp = c.chat.completions.create(
        model=model.qualified_name,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)

That's it. Cold-start handling, retries, OpenAI-compatible serialization, and the custom:<id> addressing convention are all taken care of for you.
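The addressing convention is simple enough to sketch: a custom model's qualified name is its id behind a `custom:` prefix. The helper below is illustrative only — the SDK already exposes this as `model.qualified_name`:

```python
def qualified_name(model_id: str) -> str:
    # "cm_abc123" -> "custom:cm_abc123" — the custom:<id> convention
    # used when addressing imported models at the gateway.
    return f"custom:{model_id}"
```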

Install

pip install graphn

Requires Python 3.10+. Tested on 3.10, 3.11, 3.12, 3.13.

Authentication

The SDK reads credentials from the environment by default:

export GRAPHN_API_KEY=gn_...           # required
export GRAPHN_WORKSPACE_ID=ws_...      # required
export GRAPHN_BASE_URL=https://cp.graphn.ai      # optional
export GRAPHN_INFERENCE_URL=https://model.graphn.ai  # optional

Or pass them explicitly:

client = graphn.Client(api_key="gn_...", workspace_id="ws_...")

Get an API key from the Graphn dashboard.

Scope

What's in the box (v0.1.0)

| Module | What it does |
| --- | --- |
| `client.custom_models` | Import HF models, list, get, refresh, wake, delete, `wait_until_ready`, validate |
| `client.secrets` | CRUD for workspace secrets (HuggingFace tokens, etc.) |
| `client.chat.completions` | OpenAI-compatible chat completions, streaming + non-streaming, with auto-wake on cold start |
| `client.models` | List models served by the gateway |
| `client.tts` | Text-to-speech: list voices, synthesize |
| `client.imported_models` | Discover and probe BYO inference endpoints (read-only, no full CRUD yet) |

Both graphn.Client and graphn.AsyncClient exist with identical APIs.

What's not in the box yet

The Graphn platform is broader than what's exposed here. The following surfaces exist on the platform but do not have SDK coverage in v0.1.0 — file an issue on the SDK repo to vote on what you need next:

  • Agents — defining, running, and inspecting agent workflows
  • Knowledge bases / RAG — corpus management, retrieval, indexing
  • Workflows — long-running Temporal-backed orchestration
  • Evals & datasets — eval suites, dataset upload, run results
  • Guardrails — policy authoring and inference-time enforcement
  • Imported models (BYO inference) — full CRUD; only listing and probing are exposed today
  • Usage & billing — usage stats, GPU-hour reporting beyond the read-only client.custom_models.gpu_hours() helper
  • Workspace / member / API-key administration

Until they land here, those endpoints can be hit via raw HTTP using your gn_... API key. The control plane is documented at graphn.ai/docs/api and the OpenAPI 3.1 spec is mirrored at voltagepark/graphn-openapi.
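A minimal sketch of hitting those endpoints over raw HTTP, assuming only what's documented here (bearer `gn_` key plus the `X-Workspace-Id` header; exact paths live in the API docs):

```python
import json
import urllib.request

CONTROL_PLANE = "https://cp.graphn.ai"

def graphn_headers(api_key: str, workspace_id: str) -> dict[str, str]:
    # Same auth scheme the SDK uses: bearer API key plus workspace header.
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Workspace-Id": workspace_id,
        "Accept": "application/json",
    }

def get_json(path: str, api_key: str, workspace_id: str):
    # `path` is whatever endpoint the API docs give you; this sketch
    # only shows the auth wiring, not real routes.
    req = urllib.request.Request(
        CONTROL_PLANE + path,
        headers=graphn_headers(api_key, workspace_id),
    )
    with urllib.request.urlopen(req) as resp:  # performs a network call
        return json.load(resp)
```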

The 80% recipe: import a model and chat with it

import graphn

with graphn.Client() as c:
    # 1. Import the model. Use a workspace secret for gated HF repos.
    model = c.custom_models.create(
        name="my-llama",
        huggingface_model_id="meta-llama/Llama-3.1-8B-Instruct",
        weight_source="huggingface",
        hf_token_secret_id="sec_...",  # optional, only for gated models
    )

    # 2. Wait for the deployment to be live.
    c.custom_models.wait_until_ready(model.id, timeout=1800)

    # 3. Chat. The first call will cold-start the model — the SDK
    #    transparently calls wake() and retries until it serves.
    resp = c.chat.completions.create(
        model=model.qualified_name,   # "custom:cm_..."
        messages=[{"role": "user", "content": "Tell me a joke."}],
        wake_timeout=600,             # max time to wait for cold start
    )
    print(resp.choices[0].message.content)

Streaming

stream = c.chat.completions.create(
    model=model.qualified_name,
    messages=[{"role": "user", "content": "Count to ten."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Async

import asyncio
import graphn

async def main() -> None:
    async with graphn.AsyncClient() as c:
        async for m in c.custom_models.list():
            print(m.id, m.name, m.status)

        resp = await c.chat.completions.create(
            model="custom:cm_abc123",
            messages=[{"role": "user", "content": "Hi!"}],
        )
        print(resp.choices[0].message.content)

asyncio.run(main())

Cold starts and auto-wake

Graphn custom models default to scale-to-zero: a model with no traffic for cooldown_seconds is descheduled, and the first request afterwards has to wait for the gateway to spin up a fresh replica (typically 60–600 seconds depending on weight size).

Without help, the first chat request after a cold period returns:

503 Service Unavailable: Model is scaled to zero and is now warming up.

The SDK detects this, calls POST /custom-models/{id}/wake to nudge the autoscaler, and retries with exponential backoff until the model serves or wake_timeout (default 180s) elapses. You don't have to do anything, but the knobs are there if you want them:

# Disable auto-wake — you handle the 503 yourself.
c.chat.completions.create(model=..., messages=[...], auto_wake=False)

# Give the warm-up more headroom (e.g. for large models).
c.chat.completions.create(model=..., messages=[...], wake_timeout=900)
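With `auto_wake=False` you own the retry loop. A rough sketch of the shape of what the SDK does internally — the real implementation differs, and `create_fn`/`wake_fn` are placeholders, not SDK API:

```python
import itertools
import time

def backoff_delays(base: float = 1.0, cap: float = 30.0):
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    for attempt in itertools.count():
        yield min(base * (2 ** attempt), cap)

def call_with_manual_wake(create_fn, wake_fn, wake_timeout: float = 180.0):
    # `create_fn` stands in for the chat call and raises while the model
    # is still warming up; `wake_fn` nudges the autoscaler.
    deadline = time.monotonic() + wake_timeout
    wake_fn()
    for delay in backoff_delays():
        try:
            return create_fn()
        except Exception:
            if time.monotonic() + delay > deadline:
                raise
            time.sleep(delay)
```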

See docs/cold-starts.md for the full story.

Drop-in for openai

The chat path is OpenAI-compatible all the way down — under the hood we delegate to the official openai Python SDK, configured against the Graphn gateway. So tools, structured outputs, multi-modal inputs, function calling, etc. all work out of the box.

If you already have OpenAI-shaped code and just want to point it at a Graphn model:

from openai import OpenAI

client = OpenAI(
    api_key="gn_...",
    base_url="https://model.graphn.ai/v1",
    default_headers={"X-Workspace-Id": "ws_..."},
)
resp = client.chat.completions.create(
    model="custom:cm_...",
    messages=[{"role": "user", "content": "Hello!"}],
)

The reason to use graphn.Client instead is everything around the chat call: lifecycle management, secrets, auto-wake, typed responses, and a stable URL contract.
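Since the wire format is OpenAI's, function calling works by passing a standard tools spec through the same client. A sketch — `get_weather` is a hypothetical tool for illustration, not a Graphn feature:

```python
# An OpenAI-style function-calling tool spec. The gateway forwards it
# unchanged, so any OpenAI-shaped tool definition should pass through.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# resp = client.chat.completions.create(
#     model="custom:cm_...",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=[weather_tool],
# )
```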

More examples

See the examples/ directory in the repository for runnable end-to-end scripts.

Configuration reference

| Argument | Default | Notes |
| --- | --- | --- |
| `api_key` | `$GRAPHN_API_KEY` | Bearer token starting with `gn_`. Required. |
| `workspace_id` | `$GRAPHN_WORKSPACE_ID` | Path parameter + `X-Workspace-Id` header. Required. |
| `base_url` | `https://cp.graphn.ai` | Control-plane host. |
| `inference_url` | `https://model.graphn.ai` | Inference / OpenAI-compatible host. |
| `timeout` | `60.0` | Per-request HTTPX timeout (seconds). |
| `max_retries` | `2` | Retries on connect failures, 429, and 5xx. |
| `default_headers` | `{}` | Extra headers added to every request. |

Generating clients in other languages

The OpenAPI 3.1 spec is the source of truth; it is mirrored at voltagepark/graphn-openapi (see Scope above).

Point your favorite generator at it. We test against openapi-generator 6.0+, openapi-python-client 0.21+, and oapi-codegen 2.0+.

Contributing

git clone https://github.com/voltagepark/graphn-sdk-python
cd graphn-sdk-python
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

ruff check src tests
pytest

Regenerate the typed transport from the upstream spec after a spec change:

./scripts/regenerate.sh

See CHANGELOG.md for release notes.

License

Apache 2.0 — see LICENSE.
