Source-available, self-hostable AI observability — scope every LLM call in production

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

v-scopecall

These details have not been verified by PyPI

Project links

Project description

scopecall

Python SDK for ScopeCall — source-available, self-hostable AI cost and workflow observability.

Wraps the OpenAI and Anthropic Python clients so every LLM call shows up in your ScopeCall dashboard with cost, latency, prompt-version, and workflow-tree attribution — without routing traffic through a proxy.

Install

pip install scopecall-py

# Or with provider extras (recommended — pins to a known-good lower bound):
pip install "scopecall-py[openai]"
pip install "scopecall-py[anthropic]"
pip install "scopecall-py[all]"

The PyPI package is named scopecall-py (Supabase-style language suffix), but the Python import name is just scopecall: install with pip install scopecall-py, then use from scopecall import init.

Python 3.10+ required.

Quick start

import scopecall
from openai import OpenAI

# Initialize once at app startup.
sdk = scopecall.init(
    api_key="sc_live_xxx",                       # from your ScopeCall dashboard
    endpoint="http://localhost:8080/v1/ingest",  # required: self-hosted ingest URL
)

# Wrap the OpenAI client — every chat.completions.create call is now traced.
openai_client = sdk.instrument(OpenAI())

with sdk.trace("support-agent", user_id="user_123") as ctx:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )

# Traces appear in your dashboard within seconds.

No hosted-Cloud default yet. A managed default endpoint will return when ScopeCall Cloud is live. Until then, init() requires endpoint to be set explicitly when using api_key — fail-fast is safer than silently sending events to a domain that doesn't exist.

Configuration

sdk = scopecall.init(
    api_key="sc_live_xxx",                       # required (or use debug=True / output=<path>)
    endpoint="http://localhost:8080/v1/ingest",  # required when using api_key
    environment="production",                    # optional; defaults to "production"
    capture_content=True,                        # optional; record prompts/completions (default True)
    redact_pii=True,                             # optional; PII redaction (default True)
    batch_size=50,                               # optional; events per HTTP batch
    max_retries=3,                               # optional; retry attempts on transient failure
    flush_interval=5.0,                          # optional; seconds between auto-flush
    debug=False,                                 # optional; route events to stdout instead of HTTP
)

Other transport modes:

# Console mode — pretty-prints events to stdout. Useful during integration.
sdk = scopecall.init(debug=True)

# File mode — appends NDJSON events to a path. Useful for offline capture.
sdk = scopecall.init(output="/var/log/scopecall.ndjson")

# Disabled mode — no-op SDK that swallows every call. Useful in tests.
sdk = scopecall.init(disabled=True)
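
One way to pick between these transport modes is to derive the init() keyword arguments from environment variables. The sketch below is illustrative only: the helper name scopecall_init_kwargs and the variables SCOPECALL_DISABLED, SCOPECALL_DEBUG and SCOPECALL_OUTPUT are assumptions made for this example, not part of the SDK. Only the keyword names (api_key, endpoint, debug, output, disabled) come from the options shown above.

```python
from typing import Any, Mapping


def scopecall_init_kwargs(env: Mapping[str, str]) -> dict[str, Any]:
    """Build keyword arguments for scopecall.init() from an environment mapping.

    SCOPECALL_DISABLED, SCOPECALL_DEBUG and SCOPECALL_OUTPUT are hypothetical
    variable names chosen for this sketch.
    """
    if env.get("SCOPECALL_DISABLED") == "1":
        return {"disabled": True}                    # no-op SDK, e.g. in tests
    if env.get("SCOPECALL_DEBUG") == "1":
        return {"debug": True}                       # console mode
    if "SCOPECALL_OUTPUT" in env:
        return {"output": env["SCOPECALL_OUTPUT"]}   # NDJSON file mode
    api_key = env.get("SCOPECALL_API_KEY")
    endpoint = env.get("SCOPECALL_ENDPOINT")
    if not api_key or not endpoint:
        # Same fail-fast stance as the SDK: never guess an endpoint.
        raise ValueError("set both SCOPECALL_API_KEY and SCOPECALL_ENDPOINT")
    return {"api_key": api_key, "endpoint": endpoint}


# Usage (requires scopecall-py to be installed):
#   import os, scopecall
#   sdk = scopecall.init(**scopecall_init_kwargs(os.environ))
```

Keeping the mode selection in one function means tests can pass a plain dict instead of touching the real environment.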

Anthropic

import scopecall
import anthropic

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)

anthropic_client = sdk.instrument(anthropic.Anthropic(), provider="anthropic")

msg = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Streaming works the same way — pass stream=True and iterate. TTFT (time to first token) is captured automatically; output content is assembled from content_block_delta events; final token counts come from the message_delta event Anthropic emits near end-of-stream.
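
To make the streaming description concrete, here is a minimal sketch of how TTFT, output text, and the final token count could be derived from a stream of events. It is not the SDK's implementation: fake_stream and summarize_stream are hypothetical names, and the events are plain dicts that only loosely mirror the shape of Anthropic's content_block_delta and message_delta events.

```python
import time
from typing import Any, Iterable, Iterator


def fake_stream() -> Iterator[dict[str, Any]]:
    """Stand-in for a streamed response (no network call is made)."""
    yield {"type": "message_start"}
    yield {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}
    yield {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}
    yield {"type": "message_delta", "usage": {"output_tokens": 2}}
    yield {"type": "message_stop"}


def summarize_stream(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    start = time.perf_counter()
    ttft_ms = None
    parts: list[str] = []
    output_tokens = None
    for event in events:
        if event["type"] == "content_block_delta":
            if ttft_ms is None:
                # First content delta marks time to first token.
                ttft_ms = (time.perf_counter() - start) * 1000.0
            parts.append(event["delta"].get("text", ""))
        elif event["type"] == "message_delta":
            # Final token count arrives near end-of-stream.
            output_tokens = event["usage"]["output_tokens"]
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "output_text": "".join(parts),
        "output_tokens": output_tokens,
        "ttft_ms": ttft_ms,
        "latency_ms": latency_ms,
    }


summary = summarize_stream(fake_stream())
```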

Async

Both AsyncOpenAI and AsyncAnthropic are first-class — instrument() auto-detects async vs sync from the client and wraps accordingly. No separate API.

import asyncio
import scopecall
from openai import AsyncOpenAI

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)
client = sdk.instrument(AsyncOpenAI())

async def main():
    # Use asyncio.gather so this snippet runs on Python 3.10 (the SDK's
    # lower bound). asyncio.TaskGroup is 3.11+; if you're on 3.11 or
    # later it's a cleaner choice for structured concurrency.
    await asyncio.gather(*(
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Hello {i}"}],
        )
        for i in range(3)
    ))

asyncio.run(main())

contextvars propagate the active sdk.trace() context across await and asyncio.create_task(), so concurrent calls inside the same trace get the right parent_span_id automatically.

Workflow tracing

The sdk.trace(name) block emits a synthetic workflow span when it exits, so the ScopeCall dashboard can render the parent → child structure of multi-call agents:

with sdk.trace("rag-question", user_id=user_id, session_id=session_id):
    # 1) retrieve documents (could itself be an LLM call)
    docs = retriever.retrieve(question)

    # 2) call the LLM with the retrieved context
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context:\n{docs}"},
            {"role": "user", "content": question},
        ],
    )

In the dashboard's trace tree, that block renders as:

rag-question                          (workflow span)
└── chat.completions.create           (LLM span)

Nested traces work too — the inner block inherits trace_id, gets its own span_id, and sets parent_span_id to the outer block.

Streaming + workflow latency

When a streaming response is iterated AFTER the enclosing sdk.trace() block has exited (the common pattern with FastAPI's StreamingResponse, where the route handler returns and the iterator runs later), the SDK still attaches the child LLM event to the workflow span correctly — context is snapshotted when .create() is called, not when the stream is consumed.

But the workflow span's latency only covers what's inside the with block. If you want workflow latency to reflect the full streaming duration, keep the trace block open across the iteration:

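from fastapi.responses import StreamingResponse

# app, ChatRequest and openai_client (an instrumented AsyncOpenAI) are
# assumed to be defined elsewhere; the route path is illustrative.
@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    async def event_source():
        # The trace block stays open until the stream is fully consumed,
        # so workflow latency covers the whole streaming duration.
        with sdk.trace("chat-api", user_id=req.user_id):
            stream = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=req.messages,
                stream=True,
            )
            async for chunk in stream:
                # StreamingResponse needs str or bytes, so forward only the text delta.
                if chunk.choices and chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content

    return StreamingResponse(event_source(), media_type="text/event-stream")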

The FastAPI example below keeps the LLM call inside the trace block in the same way, although its handler returns a complete (non-streaming) response.
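
The snapshot behaviour described in this section (context captured when .create() is called, not when the stream is consumed) can be illustrated without the SDK. In the sketch below, active_span, create_stream and recorded are hypothetical names; the point is only that the parent is read eagerly while the iteration happens later.

```python
import contextvars
from typing import Iterator, Optional

active_span: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "active_span", default=None
)
recorded: list[dict[str, Optional[str]]] = []


def create_stream(chunks: list[str]) -> Iterator[str]:
    """Snapshot the parent eagerly, then return a lazy iterator."""
    parent_at_call = active_span.get()   # captured when create_stream() is called

    def iterate() -> Iterator[str]:
        yield from chunks
        # Runs once the stream is exhausted, possibly after the trace block exited.
        recorded.append(
            {"parent_span_id": parent_at_call, "active_at_finish": active_span.get()}
        )

    return iterate()


token = active_span.set("workflow-span")   # entering the trace block
stream = create_stream(["a", "b"])
active_span.reset(token)                   # the trace block has exited

text = "".join(stream)                     # the stream is consumed afterwards
```

The recorded event still points at the workflow span even though no span is active by the time the stream finishes.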

Per-call metadata

Set defaults SDK-wide on init(), then override per-trace:

import os

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
    default_feature="chat",                       # every call tagged "chat"
    default_user_id="anonymous",
    default_prompt_version=os.getenv("DEPLOY_SHA"),  # auto-tag with commit hash
)

# Per-call overrides win over defaults; nested-trace inheritance fills
# the gap for prompt_version (trace > parent > default > None).
with sdk.trace("billing-agent", user_id=user.id, prompt_version="refund-v3"):
    ...

Prompt-version tracking

Tag each sdk.trace() with a prompt_version. The ScopeCall Prompts page surfaces cost / latency / error-rate per version — ship a new prompt, see whether output tokens went up:

PROMPT_V = "refund-policy-v7"

with sdk.trace("support-agent", prompt_version=PROMPT_V):
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PROMPT_V_TEXT},
            {"role": "user", "content": question},
        ],
    )

Nested traces inherit the parent's prompt_version. Passing prompt_version=None on a child trace does not clear it: None does not override the inherited value. To separate a child's metrics, give it a different scope name instead.
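
The precedence order for prompt_version (trace > parent > default > None) amounts to picking the first value that is set. A minimal sketch, assuming None always means "not set"; resolve_prompt_version is a hypothetical helper, not an SDK function:

```python
from typing import Optional


def resolve_prompt_version(
    trace_value: Optional[str],
    parent_value: Optional[str],
    default_value: Optional[str],
) -> Optional[str]:
    """Return the first non-None value in precedence order."""
    for candidate in (trace_value, parent_value, default_value):
        if candidate is not None:
            return candidate
    return None


explicit = resolve_prompt_version("refund-v3", "refund-v2", "abc123")
inherited = resolve_prompt_version(None, "refund-v2", "abc123")
fallback = resolve_prompt_version(None, None, "abc123")
unset = resolve_prompt_version(None, None, None)
```

Under this rule an explicit None on a child falls through to the parent's value, which matches the inheritance behaviour described above.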

Manual instrumentation (LangChain, LlamaIndex, custom)

If you're calling an LLM through a framework that wraps the underlying client (LangChain, LlamaIndex, CrewAI, your own gateway), instrument() can't see through to the raw call. Use sdk.record_llm_call() to emit events manually — same wire format, same trace-context chaining:

with sdk.trace("rag-answer"):
    docs = retriever.retrieve(q)         # your code, not instrumented

    # ... call your custom LLM wrapper ...
    sdk.record_llm_call(
        model="gpt-4o-mini",
        provider="openai",
        input_tokens=1234,
        output_tokens=567,
        latency_ms=842,
        input_text=prompt,
        output_text=answer,
        finish_reason="stop",
    )

record_llm_call reads the current sdk.trace() context to set parent_span_id and inherit feature / user / session / prompt_version. PII redaction (redact_pii=True) applies to manual calls too — input and output run through the same scrubber the auto-instrumented path uses.

For deeper sub-step instrumentation (e.g. "retrieve" and "rerank" as separate visible spans), nest sdk.trace() blocks rather than reaching for a sub-span helper. Each nested trace block emits its own workflow span and chains correctly:

with sdk.trace("rag-answer"):
    with sdk.trace("retrieve"):
        docs = retriever.retrieve(q)
    with sdk.trace("generate"):
        sdk.record_llm_call(...)
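
When recording calls manually, latency has to be measured by the caller. The sketch below wraps an arbitrary LLM callable, times it, and forwards the result to a recorder using the keyword names shown for sdk.record_llm_call() above. timed_llm_call, fake_record and fake_llm are hypothetical helpers for illustration; in real code the record argument would be sdk.record_llm_call, and token counts would come from your framework's response.

```python
import time
from typing import Any, Callable


def timed_llm_call(
    call: Callable[[str], str],
    prompt: str,
    record: Callable[..., Any],
    **fields: Any,
) -> str:
    """Run call(prompt), measure latency, then hand the result to record()."""
    start = time.perf_counter()
    answer = call(prompt)
    latency_ms = int((time.perf_counter() - start) * 1000)
    record(latency_ms=latency_ms, input_text=prompt, output_text=answer, **fields)
    return answer


# Stand-ins so the sketch runs without the SDK or a network call.
captured: list[dict[str, Any]] = []


def fake_record(**event: Any) -> None:
    captured.append(event)


def fake_llm(prompt: str) -> str:
    return prompt.upper()


answer = timed_llm_call(
    fake_llm,
    "hello",
    fake_record,
    model="gpt-4o-mini",
    provider="openai",
    input_tokens=1,
    output_tokens=1,
    finish_reason="stop",
)
```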

FastAPI

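import os
from contextlib import asynccontextmanager

import scopecall
from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel

sdk: scopecall.ScopeCallSDK
client: AsyncOpenAI


class ChatRequest(BaseModel):
    # Request shape assumed from the fields the handler reads below.
    user_id: str
    session_id: str
    messages: list[dict[str, str]]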


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize the SDK once at startup; close on shutdown so the
    background flush thread drains pending events before exit."""
    global sdk, client
    sdk = scopecall.init(
        api_key=os.environ["SCOPECALL_API_KEY"],
        endpoint=os.environ.get(
            "SCOPECALL_ENDPOINT", "http://localhost:8080/v1/ingest"
        ),
        environment=os.environ.get("ENV", "production"),
        default_prompt_version=os.environ.get("DEPLOY_SHA"),
    )
    client = sdk.instrument(AsyncOpenAI())
    yield
    sdk.close(timeout=5.0)


app = FastAPI(lifespan=lifespan)


@app.post("/chat")
async def chat(req: ChatRequest):
    with sdk.trace("chat-api", user_id=req.user_id, session_id=req.session_id):
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=req.messages,
        )
        return {"reply": response.choices[0].message.content}

A runnable version of this example lives in examples/fastapi/.

What gets captured

Every traced LLM call captures:

Field	Description
`model`	Canonical model name (e.g. `gpt-4o-mini`, `claude-3-5-sonnet-20241022`)
`provider`	`openai` or `anthropic`
`input_tokens`	Prompt token count
`output_tokens`	Completion token count
`cache_read_tokens`	OpenAI prompt cache hits / Anthropic `cache_read_input_tokens`
`cost_usd`	Computed server-side from the bundled pricing table
`latency_ms`	End-to-end latency
`ttft_ms`	Time to first token (streaming only)
`finish_reason`	`stop` / `length` / `tool_calls` / `end_turn` (Anthropic)
`status`	`success` / `error` / `timeout` / `rate_limited`
`error_message`	Error detail on failure
`input_text`	Full prompt (redacted per your PII config)
`output_text`	Full completion
`tool_calls`	Tool-use blocks as JSON (Anthropic)
`prompt_version`	Per-trace label from `sdk.trace()` or config — powers the Prompts page
`feature_name` / `user_id` / `session_id`	From `sdk.trace()` or `init()` defaults
`kind`	`llm` for provider calls, `workflow` for `sdk.trace()` blocks
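
Events captured in file mode can be summarized offline. The following sketch totals tokens per model, assuming each NDJSON line is a flat JSON object whose keys match the field names in the table above. That layout is an assumption for illustration; the actual wire format may differ, and tokens_by_model is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Iterable


def tokens_by_model(lines: Iterable[str]) -> dict[str, dict[str, int]]:
    """Sum input/output tokens per model, skipping workflow spans."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("kind") != "llm":
            continue                      # workflow spans carry no token counts
        bucket = totals[event["model"]]
        bucket["input_tokens"] += event.get("input_tokens") or 0
        bucket["output_tokens"] += event.get("output_tokens") or 0
    return dict(totals)


sample = [
    '{"kind": "llm", "model": "gpt-4o-mini", "input_tokens": 10, "output_tokens": 5}',
    '{"kind": "workflow", "feature_name": "rag-question"}',
    '{"kind": "llm", "model": "gpt-4o-mini", "input_tokens": 7, "output_tokens": 3}',
]
totals = tokens_by_model(sample)
```

With a real capture file, pass the open file object instead of the sample list.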

PII redaction

When redact_pii=True (the default), input_text and output_text pass through a regex-based scrubber before leaving the process. The same scrubber runs on auto-instrumented chat.completions.create / messages.create calls AND on manual sdk.record_llm_call(...) — the policy is the same regardless of how the event was generated.

Pattern	Replacement
Email	`[EMAIL]`
Credit card (Luhn-validated)	`[CARD]`
SSN	`[SSN]`
IPv4	`[IP]`
Phone	`[PHONE]`
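
To show what "Luhn-validated" means in practice, here is a small regex-based scrubber covering two of the patterns in the table. It is a sketch under assumptions, not the SDK's scrubber: the regular expressions and the redact function are written for this example and are deliberately simple.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    def card_sub(match: re.Match[str]) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Only replace digit runs that are plausible card numbers.
        return "[CARD]" if luhn_ok(digits) else match.group()

    text = CARD_RE.sub(card_sub, text)
    return EMAIL_RE.sub("[EMAIL]", text)


scrubbed = redact(
    "Card 4111 1111 1111 1111, mail a.b@example.com, order 1234 5678 9012 3456"
)
```

The Luhn check is what keeps the 16-digit order number intact while the test card number is replaced.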

Add custom patterns via the public helper on the SDK:

sdk.add_redaction_pattern(
    "UUID",
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
)
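
It is worth testing a custom pattern with Python's re module before registering it. The check below shows that the UUID pattern above matches lowercase hex only. Whether add_redaction_pattern accepts flags is not stated here, so the case-insensitive variant is shown with re.IGNORECASE purely as a standalone check, assuming the SDK compiles patterns with Python's re.

```python
import re

UUID_PATTERN = r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"

lower = "id=123e4567-e89b-12d3-a456-426614174000"
upper = lower.upper()

matches_lower = re.search(UUID_PATTERN, lower) is not None
matches_upper = re.search(UUID_PATTERN, upper) is not None
matches_upper_ci = re.search(UUID_PATTERN, upper, re.IGNORECASE) is not None
```

If uppercase UUIDs appear in your prompts, widening the character classes to [0-9a-fA-F] is one way to cover them without relying on flags.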

To disable redaction entirely (rarely a good idea outside dev), pass redact_pii=False.

Providers

Provider	Status
OpenAI (`chat.completions.create`) — sync + async + streaming	✅ v0.2.0
Anthropic (`messages.create`) — sync + async + streaming	✅ v0.2.0
Google Gemini	🔜 v0.3
LangChain (via manual API today; native bridge planned)	🔜 v0.3
LlamaIndex (via manual API today)	🔜 v0.3

For unsupported providers / frameworks, use sdk.record_llm_call(...) to emit events directly — the wire format is the same.

Migrating from `scopecall` v0.1.x

v0.1 used module-level globals (scopecall.init() then scopecall.trace(...)). v0.2 returns an instance from init().

The two changes most likely to break callers:

# v0.1 (old)
scopecall.init(api_key="...")               # module-level
with scopecall.trace(feature="x"):
    ...

# v0.2 (new)
sdk = scopecall.init(api_key="...",                # endpoint REQUIRED now
                     endpoint="http://localhost:8080/v1/ingest")
with sdk.trace("x"):                               # name is positional
    ...

Other notable changes:

- endpoint is required when api_key is set (no silent default to https://ingest.scopecall.com because Cloud isn't live yet).
- Removed dependency on Traceloop / OpenLLMetry.
- Native OpenAI + Anthropic instrumentation (sync + async + streaming) via sdk.instrument(client).
- New manual API: sdk.record_llm_call(...) and sdk.add_redaction_pattern(name, regex).
- LLMEvent wire format adds kind, prompt_version, input_cost_usd, output_cost_usd, finish_reason, cache_read_tokens, tool_calls, and others to match the TS SDK parity contract.

Self-hosted setup

See the main repo README for the full Docker Compose quickstart that brings up the Rust ingest, Rust processor, ClickHouse, Postgres, Redpanda, Go API, and Next.js dashboard.

License

BUSL-1.1 — free for any internal use; not for resale as a managed service. Converts to Apache 2.0 on May 26, 2031.

Release history

0.2.1

Jun 4, 2026

This version

0.2.0

Jun 3, 2026

Download files

Source Distribution

scopecall_py-0.2.0.tar.gz (52.8 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

scopecall_py-0.2.0-py3-none-any.whl (45.6 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file scopecall_py-0.2.0.tar.gz.

File metadata

Download URL: scopecall_py-0.2.0.tar.gz
Upload date: Jun 3, 2026
Size: 52.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scopecall_py-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`1c64463317250adb681fe9f2ff77f2c9e032b38c63e693621d08eb753d6a674f`
MD5	`bb8b6b83c25725daf1cc8e890661dec1`
BLAKE2b-256	`67725b6c6a56f74385c7a620d6285d13a9b0d33c8440ea4a39127e364cc22296`

Provenance

The following attestation bundles were made for scopecall_py-0.2.0.tar.gz:

Publisher: publish-python.yml on scopecall/scopecall

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scopecall_py-0.2.0.tar.gz
- Subject digest: 1c64463317250adb681fe9f2ff77f2c9e032b38c63e693621d08eb753d6a674f
- Sigstore transparency entry: 1710508814
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: scopecall/scopecall@6e3c58a6d3bf8bde4d00fe3d209f286044cc6b9b
- Branch / Tag: refs/tags/python-v0.2.0
- Owner: https://github.com/scopecall
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@6e3c58a6d3bf8bde4d00fe3d209f286044cc6b9b
- Trigger Event: push

File details

Details for the file scopecall_py-0.2.0-py3-none-any.whl.

File metadata

Download URL: scopecall_py-0.2.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 45.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scopecall_py-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`32468774b4b5bf95df0d8d6873c12ba8fc3355ee8f0c5747e2b2e13e6716f6dd`
MD5	`9fd7eece479811a04e01decf5bc68aad`
BLAKE2b-256	`e8a63d1d4fd89ee3056a245c7f0cbb18f131804dc62a8c8c6c268efc97532313`

Provenance

The following attestation bundles were made for scopecall_py-0.2.0-py3-none-any.whl:

Publisher: publish-python.yml on scopecall/scopecall

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scopecall_py-0.2.0-py3-none-any.whl
- Subject digest: 32468774b4b5bf95df0d8d6873c12ba8fc3355ee8f0c5747e2b2e13e6716f6dd
- Sigstore transparency entry: 1710508825
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: scopecall/scopecall@6e3c58a6d3bf8bde4d00fe3d209f286044cc6b9b
- Branch / Tag: refs/tags/python-v0.2.0
- Owner: https://github.com/scopecall
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yml@6e3c58a6d3bf8bde4d00fe3d209f286044cc6b9b
- Trigger Event: push
