
Official Python SDK for the Graphn API: custom-model lifecycle, secrets, and OpenAI-compatible inference.


graphn — Python SDK


The official Python SDK for Graphn. Import any LLM — from HuggingFace or your own S3 bucket — into your workspace, get an OpenAI-compatible inference endpoint, and call it from Python in a handful of lines, without standing up a single GPU yourself.

v0.1.x scope — This release line covers custom-model import (HuggingFace + S3) and OpenAI-compatible inference end-to-end. A lot of the broader Graphn platform (agents, knowledge bases, workflows, evals, datasets, guardrails, billing, full BYO-inference CRUD, etc.) is not yet exposed through this SDK. Those surfaces will be added in subsequent minor releases as their HTTP APIs stabilize. See Scope below for the exact list.

import graphn

with graphn.Client() as c:
    model = c.custom_models.create(
        name="my-qwen",
        huggingface_model_id="Qwen/Qwen3-0.6B",
        weight_source="huggingface",
    )
    c.custom_models.wait_until_ready(model.id)

    resp = c.chat.completions.create(
        model=model.qualified_name,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)

That's it. Cold-start, retry, OpenAI-compatible serialization, and the custom:<id> addressing convention are all handled for you.
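The addressing convention itself is simple: the gateway-facing name is just the model ID behind a custom: prefix. A toy illustration (the client builds this for you as model.qualified_name):

```python
def qualify(model_id: str) -> str:
    """Build the gateway-facing name for a custom model ID."""
    return f"custom:{model_id}"

def is_custom(qualified_name: str) -> bool:
    """Check whether a gateway model name refers to a custom model."""
    return qualified_name.startswith("custom:")

print(qualify("cm_abc123"))           # custom:cm_abc123
print(is_custom("custom:cm_abc123"))  # True
```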

Install

pip install graphn

Requires Python 3.10+. Tested on 3.10, 3.11, 3.12, 3.13.

Authentication

The SDK reads credentials from the environment by default:

export GRAPHN_API_KEY=gn_...           # required
export GRAPHN_WORKSPACE_ID=ws_...      # required
export GRAPHN_BASE_URL=https://cp.graphn.ai      # optional
export GRAPHN_INFERENCE_URL=https://model.graphn.ai  # optional

Or pass them explicitly:

client = graphn.Client(api_key="gn_...", workspace_id="ws_...")

Get an API key from the Graphn dashboard.

Scope

What's in the box (v0.1.x)

  • client.custom_models: import models from HuggingFace, S3 presigned URLs, or S3 + IAM role; list, get, refresh, wake, delete, wait_until_ready, validate
  • client.secrets: CRUD for workspace secrets (HuggingFace tokens, etc.)
  • client.chat.completions: OpenAI-compatible chat completions, streaming and non-streaming, with auto-wake on cold start
  • client.models: list models served by the gateway
  • client.tts: text-to-speech (list voices, synthesize)
  • client.imported_models: discover and probe BYO inference endpoints (read-only; no full CRUD yet)

Both graphn.Client and graphn.AsyncClient exist with identical APIs.

What's not in the box yet

The Graphn platform is broader than what's exposed here. The following surfaces exist on the platform but do not have SDK coverage in v0.1.x — file an issue on the SDK repo to vote on what you need next:

  • Agents — defining, running, and inspecting agent workflows
  • Knowledge bases / RAG — corpus management, retrieval, indexing
  • Workflows — long-running Temporal-backed orchestration
  • Evals & datasets — eval suites, dataset upload, run results
  • Guardrails — policy authoring and inference-time enforcement
  • Imported models (BYO inference) — full CRUD; only listing and probing are exposed today
  • Usage & billing — usage stats, GPU-hour reporting beyond the read-only client.custom_models.gpu_hours() helper
  • Workspace / member / API-key administration

Until they land here, those endpoints can be hit via raw HTTP using your gn_... API key. The control plane is documented at graphn.ai/docs/api and the OpenAPI 3.1 spec is mirrored at voltagepark/graphn-openapi.

The 80% recipe: import a model and chat with it

import graphn

with graphn.Client() as c:
    # 1. Import the model. Use a workspace secret for gated HF repos.
    model = c.custom_models.create(
        name="my-llama",
        huggingface_model_id="meta-llama/Llama-3.1-8B-Instruct",
        weight_source="huggingface",
        hf_token_secret_id="sec_...",  # optional, only for gated models
    )

    # 2. Wait for the deployment to be live.
    c.custom_models.wait_until_ready(model.id, timeout=1800)

    # 3. Chat. The first call will cold-start the model — the SDK
    #    transparently calls wake() and retries until it serves.
    resp = c.chat.completions.create(
        model=model.qualified_name,   # "custom:cm_..."
        messages=[{"role": "user", "content": "Tell me a joke."}],
        wake_timeout=600,             # max time to wait for cold start
    )
    print(resp.choices[0].message.content)

Importing from S3

If your weights aren't on HuggingFace — fine-tunes, internal models, licensed checkpoints — import them straight from S3. Two flavors:

Presigned URL (no AWS credentials shared with Graphn):

model = c.custom_models.create(
    name="my-finetune",
    weight_source="s3_presigned",
    s3_url="https://my-bucket.s3.amazonaws.com/llama-3.1-8b/?X-Amz-Algorithm=...",
    gpu_count=1,
)

Generate the URL with aws s3 presign s3://my-bucket/llama-3.1-8b/ or the AWS SDK; Graphn pulls weights through the URL on import. The URL only needs to be live for the import window, not for the model's lifetime.

IAM role assumption (for buckets you control, longer-lived credentials):

model = c.custom_models.create(
    name="my-finetune",
    weight_source="s3_assume_role",
    s3_url="s3://my-bucket/llama-3.1-8b/",
    s3_role_arn="arn:aws:iam::123456789012:role/GraphnImport",
    gpu_count=1,
)

The role's trust policy must allow Graphn's importer principal to sts:AssumeRole; ask support for the principal ARN to put in your trust policy. Graphn re-assumes on every import / refresh, so rotating credentials underneath is safe.
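As a sketch, the trust policy looks roughly like this, with the placeholder principal standing in for the ARN you get from support:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<graphn-importer-principal-arn>" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The role itself also needs read access to the weights (s3:GetObject and s3:ListBucket on the relevant bucket and prefix).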

Everything past the create call — wait_until_ready, chat completions, auto-wake, qualified_name addressing — is identical regardless of weight source. See examples/import_from_s3.py for an end-to-end runnable script.

Streaming

stream = c.chat.completions.create(
    model=model.qualified_name,
    messages=[{"role": "user", "content": "Count to ten."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Async

import asyncio
import graphn

async def main() -> None:
    async with graphn.AsyncClient() as c:
        async for m in c.custom_models.list():
            print(m.id, m.name, m.status)

        resp = await c.chat.completions.create(
            model="custom:cm_abc123",
            messages=[{"role": "user", "content": "Hi!"}],
        )
        print(resp.choices[0].message.content)

asyncio.run(main())

Cold starts and auto-wake

Graphn custom models default to scale-to-zero: a model with no traffic for cooldown_seconds is descheduled, and the first request afterwards has to wait for the gateway to spin up a fresh replica (typically 60–600 seconds depending on weight size).

Without help, the first chat request after a cold period returns:

503 Service Unavailable: Model is scaled to zero and is now warming up.

The SDK detects this, calls POST /custom-models/{id}/wake to nudge the autoscaler, and retries with exponential backoff until the model serves or wake_timeout (default 180s) elapses. You don't have to do anything, but the knobs are there if you want them:

# Disable auto-wake — you handle the 503 yourself.
c.chat.completions.create(model=..., messages=[...], auto_wake=False)

# Give the warm-up more headroom (e.g. for large models).
c.chat.completions.create(model=..., messages=[...], wake_timeout=900)

See docs/cold-starts.md for the full story.
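The pattern the SDK applies is roughly wake-then-poll with exponential backoff. A self-contained sketch of that pattern, with a stub gateway standing in for a real model (not the SDK's actual code):

```python
import time

def call_with_auto_wake(send, wake, wake_timeout: float = 180.0,
                        base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry `send()` through a cold start: on 503, nudge the autoscaler
    via `wake()` once, then back off exponentially until the model serves
    or the deadline passes."""
    deadline = time.monotonic() + wake_timeout
    delay = base_delay
    woken = False
    while True:
        status, body = send()
        if status != 503:
            return body
        if not woken:
            wake()  # e.g. POST /custom-models/{id}/wake
            woken = True
        if time.monotonic() >= deadline:
            raise TimeoutError("model did not wake within wake_timeout")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)

# Stub gateway: serves after two cold responses.
attempts = {"n": 0}
def fake_send():
    attempts["n"] += 1
    return (503, None) if attempts["n"] < 3 else (200, "pong")

print(call_with_auto_wake(fake_send, wake=lambda: None, base_delay=0.01))  # pong
```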

Drop-in for openai

The chat path is OpenAI-compatible all the way down — under the hood we delegate to the official openai Python SDK, configured against the Graphn gateway. So tools, structured outputs, multi-modal inputs, function calling, etc. all work out of the box.

If you already have OpenAI-shaped code and just want to point it at a Graphn model:

from openai import OpenAI

client = OpenAI(
    api_key="gn_...",
    base_url="https://model.graphn.ai/v1",
    default_headers={"X-Workspace-Id": "ws_..."},
)
resp = client.chat.completions.create(
    model="custom:cm_...",
    messages=[{"role": "user", "content": "Hello!"}],
)

The reason to use graphn.Client instead is everything around the chat call: lifecycle management, secrets, auto-wake, typed responses, and a stable URL contract.

More examples

See the examples/ directory in the repository for runnable end-to-end scripts.

Configuration reference

  • api_key (default: $GRAPHN_API_KEY): Bearer token starting with gn_. Required.
  • workspace_id (default: $GRAPHN_WORKSPACE_ID): Sent as a path parameter and as the X-Workspace-Id header. Required.
  • base_url (default: https://cp.graphn.ai): Control plane host.
  • inference_url (default: https://model.graphn.ai): Inference / OpenAI-compatible host.
  • timeout (default: 60.0): Per-request HTTPX timeout, in seconds.
  • max_retries (default: 2): Retries on connect failures, 429, and 5xx responses.
  • default_headers (default: {}): Extra headers added to every request.

Generating clients in other languages

The OpenAPI 3.1 spec is the source of truth. It's mirrored at voltagepark/graphn-openapi.

Point your favorite generator at it. We test against openapi-generator 6.0+, openapi-python-client 0.21+, and oapi-codegen 2.0+.

Contributing

git clone https://github.com/voltagepark/graphn-sdk-python
cd graphn-sdk-python
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

ruff check src tests
pytest

Regenerate the typed transport from the upstream spec after a spec change:

./scripts/regenerate.sh

See CHANGELOG.md for release notes.

License

Apache 2.0 — see LICENSE.
