Source-available, self-hostable AI observability — scope every LLM call in production

scopecall

Python SDK for ScopeCall — source-available, self-hostable AI cost and workflow observability.

Wraps the OpenAI and Anthropic Python clients so every LLM call shows up in your ScopeCall dashboard with cost, latency, prompt-version, and workflow-tree attribution — without routing traffic through a proxy.


Install

pip install scopecall-py

# Or with provider extras (recommended — pins to a known-good lower bound):
pip install "scopecall-py[openai]"
pip install "scopecall-py[anthropic]"
pip install "scopecall-py[all]"

The PyPI package is named scopecall-py (a Supabase-style language suffix), but the Python import name is plain scopecall. You run pip install scopecall-py and then write from scopecall import init.

Python 3.10+ required.


Quick start

import scopecall
from openai import OpenAI

# Initialize once at app startup.
sdk = scopecall.init(
    api_key="sc_live_xxx",                       # from your ScopeCall dashboard
    endpoint="http://localhost:8080/v1/ingest",  # required: self-hosted ingest URL
)

# Wrap the OpenAI client — every chat.completions.create call is now traced.
openai_client = sdk.instrument(OpenAI())

with sdk.trace("support-agent", user_id="user_123") as ctx:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )

# Traces appear in your dashboard within seconds.

No hosted-Cloud default yet. A managed default endpoint will return when ScopeCall Cloud is live. Until then, init() requires endpoint to be set explicitly when using api_key — fail-fast is safer than silently sending events to a domain that doesn't exist.


Configuration

sdk = scopecall.init(
    api_key="sc_live_xxx",                       # required (or use debug=True / output=<path>)
    endpoint="http://localhost:8080/v1/ingest",  # required when using api_key
    environment="production",                    # optional; defaults to "production"
    capture_content=True,                        # optional; record prompts/completions (default True)
    redact_pii=True,                             # optional; PII redaction (default True)
    batch_size=50,                               # optional; events per HTTP batch
    max_retries=3,                               # optional; retry attempts on transient failure
    flush_interval=5.0,                          # optional; seconds between auto-flush
    debug=False,                                 # optional; route events to stdout instead of HTTP
)
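To make the interaction between batch_size and flush_interval concrete, here is a minimal sketch of a buffer that flushes when either limit is reached. It is an illustration only, not the SDK's implementation: BatchBuffer, send and clock are hypothetical names, and this sketch only checks the timer when an event is added, whereas the SDK flushes from a background thread.

```python
import time


class BatchBuffer:
    """Hypothetical buffer: flush on batch_size events or after flush_interval seconds."""

    def __init__(self, send, batch_size=50, flush_interval=5.0, clock=time.monotonic):
        self._send = send                  # callable that receives a list of events
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock                # injectable so the timing can be tested
        self._events = []
        self._last_flush = clock()

    def add(self, event):
        self._events.append(event)
        interval_elapsed = self._clock() - self._last_flush >= self._flush_interval
        if len(self._events) >= self._batch_size or interval_elapsed:
            self.flush()

    def flush(self):
        if self._events:
            batch, self._events = self._events, []
            self._send(batch)
        self._last_flush = self._clock()


# Demo with a fake clock so no real waiting is needed.
now = [0.0]
sent = []
buf = BatchBuffer(sent.append, batch_size=3, flush_interval=5.0, clock=lambda: now[0])
for i in range(3):
    buf.add(i)       # the third add reaches batch_size and sends one batch
buf.add(3)           # buffered: neither limit reached yet
now[0] = 6.0         # pretend 6 seconds have passed
buf.add(4)           # interval elapsed, so the two pending events are sent
```

A smaller batch_size or flush_interval gets events to the dashboard sooner at the cost of more HTTP requests.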

Other transport modes:

# Console mode — pretty-prints events to stdout. Useful during integration.
sdk = scopecall.init(debug=True)

# File mode — appends NDJSON events to a path. Useful for offline capture.
sdk = scopecall.init(output="/var/log/scopecall.ndjson")

# Disabled mode — no-op SDK that swallows every call. Useful in tests.
sdk = scopecall.init(disabled=True)
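Because file mode writes NDJSON (one JSON object per line), a captured file can be inspected with the standard library alone. The sketch below assumes each event carries kind, model and latency_ms keys, using the field names listed under "What gets captured"; the exact on-disk schema is not documented here, so treat those key names as assumptions. summarize_events is a hypothetical helper, not part of the SDK.

```python
import json
from collections import defaultdict


def summarize_events(lines):
    """Return {model: (call_count, mean_latency_ms)} for LLM events.

    `lines` is any iterable of NDJSON lines, e.g. an open file object.
    Key names ("kind", "model", "latency_ms") are assumptions.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("kind") != "llm":
            continue  # skip workflow spans
        entry = totals[event.get("model", "unknown")]
        entry[0] += 1
        entry[1] += float(event.get("latency_ms", 0))
    return {model: (count, total / count) for model, (count, total) in totals.items()}


sample_lines = [
    '{"kind": "llm", "model": "gpt-4o-mini", "latency_ms": 100}',
    '{"kind": "llm", "model": "gpt-4o-mini", "latency_ms": 300}',
    '{"kind": "workflow", "latency_ms": 450}',
    '',
]
summary = summarize_events(sample_lines)
```

For a real capture, pass an open file instead of the sample list, for example the handle returned by open("/var/log/scopecall.ndjson", encoding="utf-8").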

Anthropic

import scopecall
import anthropic

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)

anthropic_client = sdk.instrument(anthropic.Anthropic(), provider="anthropic")

msg = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Streaming works the same way — pass stream=True and iterate. TTFT (time to first token) is captured automatically; output content is assembled from content_block_delta events; final token counts come from the message_delta event Anthropic emits near end-of-stream.
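The assembly described above can be sketched without the SDK. The example below walks a list of plain dicts shaped like Anthropic's streaming events (content_block_delta carrying a text_delta, message_delta carrying usage.output_tokens) and derives the output text, TTFT and output token count. It is a simplified illustration, not the SDK's code: the real client yields typed event objects rather than dicts, and summarize_stream is a hypothetical name.

```python
import time


def summarize_stream(events):
    """Assemble text, TTFT and output tokens from Anthropic-style stream events."""
    started = time.perf_counter()
    ttft_ms = None
    parts = []
    output_tokens = None
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                if ttft_ms is None:
                    # First text token seen: record time to first token.
                    ttft_ms = (time.perf_counter() - started) * 1000.0
                parts.append(delta["text"])
        elif event["type"] == "message_delta":
            # Final usage arrives near end-of-stream.
            output_tokens = event.get("usage", {}).get("output_tokens", output_tokens)
    return {
        "output_text": "".join(parts),
        "ttft_ms": ttft_ms,
        "output_tokens": output_tokens,
    }


sample = [
    {"type": "message_start"},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 2}},
    {"type": "message_stop"},
]
summary = summarize_stream(sample)
```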


Async

Both AsyncOpenAI and AsyncAnthropic are first-class — instrument() auto-detects async vs sync from the client and wraps accordingly. No separate API.

import asyncio
import scopecall
from openai import AsyncOpenAI

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)
client = sdk.instrument(AsyncOpenAI())

async def main():
    # Use asyncio.gather so this snippet runs on Python 3.10 (the SDK's
    # lower bound). asyncio.TaskGroup is 3.11+; if you're on 3.11 or
    # later it's a cleaner choice for structured concurrency.
    await asyncio.gather(*(
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Hello {i}"}],
        )
        for i in range(3)
    ))

asyncio.run(main())

contextvars propagate the active sdk.trace() context across await and asyncio.create_task(), so concurrent calls inside the same trace get the right parent_span_id automatically.
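The propagation relies on standard asyncio behavior: every task copies the current context when it is created. The stdlib-only sketch below demonstrates that behavior with a plain ContextVar standing in for the trace context; current_trace and fake_llm_call are hypothetical names, and no SDK code is involved.

```python
import asyncio
import contextvars

# Stand-in for the SDK's trace context (hypothetical name).
current_trace = contextvars.ContextVar("current_trace", default=None)


async def fake_llm_call(i):
    await asyncio.sleep(0)              # yield to the event loop like a real call
    return (i, current_trace.get())     # what the call "sees" as its trace


async def main():
    current_trace.set("trace-A")
    # gather() wraps each coroutine in a task, which copies the current context.
    gathered = await asyncio.gather(*(fake_llm_call(i) for i in range(3)))

    task = asyncio.create_task(fake_llm_call(99))
    current_trace.set("trace-B")        # changed after the task was created
    spawned = await task                # the task still sees its own copy
    return gathered, spawned


gathered, spawned = asyncio.run(main())
```

The spawned task reports trace-A even though the variable was changed afterwards, which is why concurrent calls started inside a trace stay attached to it.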


Workflow tracing

The sdk.trace(name) block emits a synthetic workflow span when it exits, so the ScopeCall dashboard can render the parent → child structure of multi-call agents:

with sdk.trace("rag-question", user_id=user_id, session_id=session_id):
    # 1) retrieve documents (could itself be an LLM call)
    docs = retriever.retrieve(question)

    # 2) call the LLM with the retrieved context
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context:\n{docs}"},
            {"role": "user", "content": question},
        ],
    )

In the dashboard's trace tree, that block renders as:

rag-question                          (workflow span)
└── chat.completions.create           (LLM span)

Nested traces work too — the inner block inherits trace_id, gets its own span_id, and sets parent_span_id to the outer block.

Streaming + workflow latency

When a streaming response is iterated AFTER the enclosing sdk.trace() block has exited (the common pattern with FastAPI's StreamingResponse, where the route handler returns and the iterator runs later), the SDK still attaches the child LLM event to the workflow span correctly — context is snapshotted when .create() is called, not when the stream is consumed.

But the workflow span's latency only covers what's inside the with block. If you want workflow latency to reflect the full streaming duration, keep the trace block open across the iteration:

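from fastapi.responses import StreamingResponse

# Illustrative route: app, sdk, ChatRequest and an instrumented async
# openai_client are assumed to be set up as in the FastAPI section.
@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    async def event_source():
        # Keep the trace open across the iteration so workflow latency covers the stream.
        with sdk.trace("chat-api", user_id=req.user_id):
            stream = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=req.messages,
                stream=True,
            )
            async for chunk in stream:
                # StreamingResponse needs str/bytes, so send the text delta, not the chunk object.
                if chunk.choices and chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content

    return StreamingResponse(event_source(), media_type="text/event-stream")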

The runnable FastAPI example below uses exactly this shape.
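The snapshot-at-create behavior described in this section can be illustrated with the standard library. In this sketch the trace context is read eagerly when the stream is created, so iterating later, after the context has been reset, still yields the original value. active_trace and create_stream are hypothetical names; this models the idea rather than reproducing the SDK.

```python
import contextvars

# Stand-in for the active sdk.trace() context (hypothetical name).
active_trace = contextvars.ContextVar("active_trace", default=None)


def create_stream(chunks):
    """Return a lazy stream that remembers the trace active at creation time."""
    snapshot = active_trace.get()   # captured now, not when iterated

    def iterate():
        for chunk in chunks:
            yield snapshot, chunk

    return iterate()


token = active_trace.set("chat-api")   # entering the trace block
stream = create_stream(["a", "b"])
active_trace.reset(token)              # the trace block has exited

consumed = list(stream)                # iteration happens afterwards
```

Note that create_stream is an ordinary function rather than a generator, which is what makes the capture happen at call time.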


Per-call metadata

Set defaults SDK-wide on init(), then override per-trace:

import os

import scopecall

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
    default_feature="chat",                       # every call tagged "chat"
    default_user_id="anonymous",
    default_prompt_version=os.getenv("DEPLOY_SHA"),  # auto-tag with commit hash
)

# Per-call overrides win over defaults; nested-trace inheritance fills
# the gap for prompt_version (trace > parent > default > None).
with sdk.trace("billing-agent", user_id=user.id, prompt_version="refund-v3"):
    ...

Prompt-version tracking

Tag each sdk.trace() with a prompt_version. The ScopeCall Prompts page surfaces cost / latency / error-rate per version — ship a new prompt, see whether output tokens went up:

PROMPT_V = "refund-policy-v7"
# PROMPT_V_TEXT (used below) is the system-prompt text for this version, defined elsewhere.

with sdk.trace("support-agent", prompt_version=PROMPT_V):
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PROMPT_V_TEXT},
            {"role": "user", "content": question},
        ],
    )

Nested traces inherit the parent's prompt_version. Passing prompt_version=None on a child span does not clear the inherited value, because None counts as unset and does not override. If you need to distinguish the child, use a different scope name instead.
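The precedence for prompt_version (trace > parent > default > None) amounts to taking the first value that is set. A minimal sketch, assuming None means "not set" at every level; resolve_prompt_version is a hypothetical helper written for illustration, not an SDK function.

```python
def resolve_prompt_version(explicit=None, parent=None, default=None):
    """Return the first non-None value in precedence order, else None."""
    for candidate in (explicit, parent, default):
        if candidate is not None:
            return candidate
    return None


# Explicit value on the trace wins.
from_trace = resolve_prompt_version("refund-v3", "refund-policy-v7", "abc123")
# No explicit value (or an explicit None): the parent's value is inherited.
from_parent = resolve_prompt_version(None, "refund-policy-v7", "abc123")
# No trace or parent value: fall back to the init() default.
from_default = resolve_prompt_version(None, None, "abc123")
```

Under this model an explicit None is indistinguishable from an omitted argument, which is why it cannot be used to clear an inherited value.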


Manual instrumentation (LangChain, LlamaIndex, custom)

If you're calling an LLM through a framework that wraps the underlying client (LangChain, LlamaIndex, CrewAI, your own gateway), instrument() can't see through to the raw call. Use sdk.record_llm_call() to emit events manually — same wire format, same trace-context chaining:

with sdk.trace("rag-answer"):
    docs = retriever.retrieve(q)         # your code, not instrumented

    # ... call your custom LLM wrapper ...
    sdk.record_llm_call(
        model="gpt-4o-mini",
        provider="openai",
        input_tokens=1234,
        output_tokens=567,
        latency_ms=842,
        input_text=prompt,
        output_text=answer,
        finish_reason="stop",
    )

record_llm_call reads the current sdk.trace() context to set parent_span_id and inherit feature / user / session / prompt_version. PII redaction (redact_pii=True) applies to manual calls too — input and output run through the same scrubber the auto-instrumented path uses.
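When recording manually you have to measure latency yourself. One way to do that is sketched below: a small timing helper wraps the framework call, and the result is mapped onto the keyword arguments shown above. timed_call and fake_wrapper are hypothetical stand-ins, and the token counts here come from the fake reply; in practice they would come from whatever usage data your framework exposes.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, latency_ms) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = int((time.perf_counter() - start) * 1000)
    return result, latency_ms


def fake_wrapper(prompt):
    # Hypothetical stand-in for a framework call that hides the raw client.
    return {"text": prompt.upper(), "input_tokens": 3, "output_tokens": 3, "finish_reason": "stop"}


prompt = "hello there friend"
reply, latency_ms = timed_call(fake_wrapper, prompt)

# Keyword names match the record_llm_call example above.
fields = {
    "model": "gpt-4o-mini",
    "provider": "openai",
    "input_tokens": reply["input_tokens"],
    "output_tokens": reply["output_tokens"],
    "latency_ms": latency_ms,
    "input_text": prompt,
    "output_text": reply["text"],
    "finish_reason": reply["finish_reason"],
}
# With an initialized SDK, inside an sdk.trace() block:
#     sdk.record_llm_call(**fields)
```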

For deeper sub-step instrumentation (e.g. "retrieve" and "rerank" as separate visible spans), nest sdk.trace() blocks rather than reaching for a sub-span helper. Each nested trace block emits its own workflow span and chains correctly:

with sdk.trace("rag-answer"):
    with sdk.trace("retrieve"):
        docs = retriever.retrieve(q)
    with sdk.trace("generate"):
        sdk.record_llm_call(...)

FastAPI

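import os
from contextlib import asynccontextmanager

import scopecall
from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel

sdk: scopecall.ScopeCallSDK
client: AsyncOpenAI


class ChatRequest(BaseModel):
    # Illustrative request model (an assumption): only the fields the /chat handler reads.
    user_id: str
    session_id: str
    messages: list[dict]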


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize the SDK once at startup; close on shutdown so the
    background flush thread drains pending events before exit."""
    global sdk, client
    sdk = scopecall.init(
        api_key=os.environ["SCOPECALL_API_KEY"],
        endpoint=os.environ.get(
            "SCOPECALL_ENDPOINT", "http://localhost:8080/v1/ingest"
        ),
        environment=os.environ.get("ENV", "production"),
        default_prompt_version=os.environ.get("DEPLOY_SHA"),
    )
    client = sdk.instrument(AsyncOpenAI())
    yield
    sdk.close(timeout=5.0)


app = FastAPI(lifespan=lifespan)


@app.post("/chat")
async def chat(req: ChatRequest):
    with sdk.trace("chat-api", user_id=req.user_id, session_id=req.session_id):
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=req.messages,
        )
        return {"reply": response.choices[0].message.content}

A runnable version of this example lives in examples/fastapi/.


What gets captured

Every traced LLM call captures:

Field                                 Description
model                                 Canonical model name (e.g. gpt-4o-mini, claude-3-5-sonnet-20241022)
provider                              openai or anthropic
input_tokens                          Prompt token count
output_tokens                         Completion token count
cache_read_tokens                     OpenAI prompt cache hits / Anthropic cache_read_input_tokens
cost_usd                              Computed server-side from the bundled pricing table
latency_ms                            End-to-end latency
ttft_ms                               Time to first token (streaming only)
finish_reason                         stop / length / tool_calls / end_turn (Anthropic)
status                                success / error / timeout / rate_limited
error_message                         Error detail on failure
input_text                            Full prompt (redacted per your PII config)
output_text                           Full completion (redacted per your PII config)
tool_calls                            Tool-use blocks as JSON (Anthropic)
prompt_version                        Per-trace label from sdk.trace() or config; powers the Prompts page
feature_name / user_id / session_id   From sdk.trace() or init() defaults
kind                                  llm for provider calls, workflow for sdk.trace() blocks

PII redaction

When redact_pii=True (the default), input_text and output_text pass through a regex-based scrubber before leaving the process. The same scrubber runs on auto-instrumented chat.completions.create / messages.create calls AND on manual sdk.record_llm_call(...) — the policy is the same regardless of how the event was generated.

Pattern                        Replacement
Email                          [EMAIL]
Credit card (Luhn-validated)   [CARD]
SSN                            [SSN]
IPv4                           [IP]
Phone                          [PHONE]
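To show what a regex-based scrubber with Luhn validation looks like in practice, here is a simplified sketch covering three of the patterns in the table. The regular expressions are illustrative assumptions, not the SDK's actual patterns, and SSN and phone handling are omitted for brevity. scrub and luhn_valid are hypothetical names.

```python
import re

# Illustrative patterns only; the SDK's real expressions may differ.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def scrub(text):
    """Replace emails, Luhn-valid card numbers and IPv4 addresses with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Only digit runs that pass the Luhn check are treated as card numbers.
    text = CARD_RE.sub(lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(), text)
    text = IPV4_RE.sub("[IP]", text)
    return text


scrubbed = scrub(
    "Mail bob@example.com from 10.0.0.1, card 4111 1111 1111 1111, order 1234 5678 9012 3456"
)
```

The Luhn check is what keeps ordinary 16-digit identifiers, such as the order number in the sample, from being redacted as cards.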

Add custom patterns via the public helper on the SDK:

sdk.add_redaction_pattern(
    "UUID",
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
)
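It can help to check a custom pattern locally with the standard re module before registering it. The sketch below runs the UUID pattern above against sample strings. How add_redaction_pattern compiles the regex (for example, which flags it applies) is not documented here, so this only shows what the pattern itself matches.

```python
import re

UUID_PATTERN = r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"

lower = "request 123e4567-e89b-12d3-a456-426614174000 failed"
upper = lower.upper()

lower_hits = re.findall(UUID_PATTERN, lower)
upper_hits = re.findall(UUID_PATTERN, upper)                     # uppercase hex is not matched
upper_hits_ci = re.findall(UUID_PATTERN, upper, re.IGNORECASE)   # matched when case is ignored
```

If your logs contain uppercase UUIDs, one option that does not depend on flags is to widen the character classes to [0-9a-fA-F] in the pattern itself.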

To disable redaction entirely (rarely a good idea outside dev), pass redact_pii=False.


Providers

Provider                                                      Status
OpenAI (chat.completions.create): sync + async + streaming    ✅ v0.2.0
Anthropic (messages.create): sync + async + streaming         ✅ v0.2.0
Google Gemini                                                 🔜 v0.3
LangChain (via manual API today; native bridge planned)       🔜 v0.3
LlamaIndex (via manual API today)                             🔜 v0.3

For unsupported providers / frameworks, use sdk.record_llm_call(...) to emit events directly — the wire format is the same.


Migrating from scopecall v0.1.x

v0.1 used module-level globals (scopecall.init() then scopecall.trace(...)). v0.2 returns an instance from init().

The two changes most likely to break callers:

# v0.1 (old)
scopecall.init(api_key="...")               # module-level
with scopecall.trace(feature="x"):
    ...

# v0.2 (new)
sdk = scopecall.init(api_key="...",                # endpoint REQUIRED now
                     endpoint="http://localhost:8080/v1/ingest")
with sdk.trace("x"):                               # name is positional
    ...

Other notable changes:

  • endpoint is required when api_key is set (no silent default to https://ingest.scopecall.com because Cloud isn't live yet).
  • Removed dependency on Traceloop / OpenLLMetry.
  • Native OpenAI + Anthropic instrumentation (sync + async + streaming) via sdk.instrument(client).
  • New manual API: sdk.record_llm_call(...) and sdk.add_redaction_pattern(name, regex).
  • LLMEvent wire format adds kind, prompt_version, input_cost_usd, output_cost_usd, finish_reason, cache_read_tokens, tool_calls, and others to match the TS SDK parity contract.

Self-hosted setup

See the main repo README for the full Docker Compose quickstart that brings up the Rust ingest, Rust processor, ClickHouse, Postgres, Redpanda, Go API, and Next.js dashboard.


License

BUSL-1.1 — free for any internal use; not for resale as a managed service. Converts to Apache 2.0 on May 26, 2031.
