Source-available, self-hostable AI observability — scope every LLM call in production
scopecall
Python SDK for ScopeCall — source-available, self-hostable AI cost and workflow observability.
Wraps the OpenAI and Anthropic Python clients so every LLM call shows up in your ScopeCall dashboard with cost, latency, prompt-version, and workflow-tree attribution — without routing traffic through a proxy.
Install
pip install scopecall-py
# Or with provider extras (recommended — pins to a known-good lower bound):
pip install "scopecall-py[openai]"
pip install "scopecall-py[anthropic]"
pip install "scopecall-py[all]"
The PyPI package is named scopecall-py (Supabase-style language suffix), but the Python import name is just scopecall: you pip install scopecall-py and then from scopecall import init.
Python 3.10+ required.
Quick start
import scopecall
from openai import OpenAI

# Initialize once at app startup.
sdk = scopecall.init(
    api_key="sc_live_xxx",  # from your ScopeCall dashboard
    endpoint="http://localhost:8080/v1/ingest",  # required: self-hosted ingest URL
)

# Wrap the OpenAI client; every chat.completions.create call is now traced.
openai_client = sdk.instrument(OpenAI())

with sdk.trace("support-agent", user_id="user_123") as ctx:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
# Traces appear in your dashboard within seconds.
No hosted-Cloud default yet. A managed default endpoint will return when ScopeCall Cloud is live. Until then,
init() requires endpoint to be set explicitly when using api_key; failing fast is safer than silently sending events to a domain that doesn't exist.
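The fail-fast rule can be pictured with a small validation sketch. This is not the SDK's code: resolve_transport, its return labels, and the order in which the modes are checked are assumptions made for illustration; only the parameter names (api_key, endpoint, debug, output, disabled) come from this README.

```python
def resolve_transport(api_key=None, endpoint=None, debug=False,
                      output=None, disabled=False):
    """Pick a transport label; refuse an api_key that has no endpoint.

    Hypothetical helper: the check order below is an assumption.
    """
    if disabled:
        return "disabled"
    if debug:
        return "console"
    if output is not None:
        return "file"
    if api_key is None:
        raise ValueError(
            "set api_key, debug=True, output=<path>, or disabled=True"
        )
    if not endpoint:
        # Fail fast instead of guessing an ingest URL that may not exist.
        raise ValueError("endpoint is required when api_key is set")
    return "http"
```

Raising at init() time surfaces the misconfiguration at startup rather than as silently dropped events later.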
Configuration
sdk = scopecall.init(
    api_key="sc_live_xxx",  # required (or use debug=True / output=<path>)
    endpoint="http://localhost:8080/v1/ingest",  # required when using api_key
    environment="production",  # optional; defaults to "production"
    capture_content=True,  # optional; record prompts/completions (default True)
    redact_pii=True,  # optional; PII redaction (default True)
    batch_size=50,  # optional; events per HTTP batch
    max_retries=3,  # optional; retry attempts on transient failure
    flush_interval=5.0,  # optional; seconds between auto-flush
    debug=False,  # optional; route events to stdout instead of HTTP
)
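To see how batch_size and flush_interval might interact, here is a minimal sketch of a size-or-time batcher. EventBatcher, its send callback, and the injectable clock are hypothetical names; the sketch checks the interval only when an event is added, which is simpler than a background flush thread and is not a description of the SDK's internals.

```python
import time


class EventBatcher:
    """Buffer events; flush when the batch is full or the interval elapsed."""

    def __init__(self, send, batch_size=50, flush_interval=5.0,
                 clock=time.monotonic):
        self._send = send                  # callable receiving a list of events
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock                # injectable for deterministic tests
        self._buffer = []
        self._last_flush = clock()

    def add(self, event):
        self._buffer.append(event)
        full = len(self._buffer) >= self._batch_size
        stale = self._clock() - self._last_flush >= self._flush_interval
        if full or stale:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))  # hand over a copy of the batch
            self._buffer.clear()
        self._last_flush = self._clock()
```

A larger batch_size means fewer HTTP requests; a shorter flush_interval bounds how long a quiet process holds events before sending them.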
Other transport modes:
# Console mode — pretty-prints events to stdout. Useful during integration.
sdk = scopecall.init(debug=True)
# File mode — appends NDJSON events to a path. Useful for offline capture.
sdk = scopecall.init(output="/var/log/scopecall.ndjson")
# Disabled mode — no-op SDK that swallows every call. Useful in tests.
sdk = scopecall.init(disabled=True)
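File mode output can be inspected offline with a few lines of standard-library Python. The sketch below assumes each NDJSON line is one JSON object that uses the field names listed under "What gets captured" (kind, model, input_tokens, output_tokens); summarize_ndjson is a hypothetical helper, not part of the SDK.

```python
import json
from collections import defaultdict


def summarize_ndjson(lines):
    """Sum token counts per model from an iterable of NDJSON lines."""
    totals = defaultdict(
        lambda: {"calls": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("kind", "llm") != "llm":
            continue  # skip workflow spans; they carry no token counts here
        row = totals[event.get("model", "unknown")]
        row["calls"] += 1
        row["input_tokens"] += event.get("input_tokens") or 0
        row["output_tokens"] += event.get("output_tokens") or 0
    return dict(totals)
```

Because an open file iterates line by line, the path from the file-mode example can be passed straight in: with open("/var/log/scopecall.ndjson") as f: summarize_ndjson(f).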
Anthropic
import scopecall
import anthropic

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)
anthropic_client = sdk.instrument(anthropic.Anthropic(), provider="anthropic")

msg = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
Streaming works the same way — pass stream=True and iterate. TTFT
(time to first token) is captured automatically; output content is
assembled from content_block_delta events; final token counts come
from the message_delta event Anthropic emits near end-of-stream.
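The stream handling described above can be sketched over plain dictionaries shaped like Anthropic's streaming events. This is an illustration, not the SDK's implementation: consume_stream is a hypothetical name, the real Anthropic client yields typed objects rather than dicts, and reading input_tokens from message_start is an addition based on Anthropic's event format rather than on this README.

```python
import time


def consume_stream(events, clock=time.perf_counter):
    """Assemble text, TTFT, and token counts from stream event dicts."""
    start = clock()
    ttft_ms = None
    parts = []
    input_tokens = None
    output_tokens = None
    finish_reason = None
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            input_tokens = event["message"]["usage"].get("input_tokens")
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                if ttft_ms is None:
                    # First text fragment marks time to first token.
                    ttft_ms = (clock() - start) * 1000.0
                parts.append(delta["text"])
        elif kind == "message_delta":
            # Final output token count arrives near end-of-stream.
            output_tokens = event["usage"].get("output_tokens")
            finish_reason = event["delta"].get("stop_reason")
    return {
        "output_text": "".join(parts),
        "ttft_ms": ttft_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "finish_reason": finish_reason,
    }
```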
Async
Both AsyncOpenAI and AsyncAnthropic are first-class — instrument()
auto-detects async vs sync from the client and wraps accordingly. No
separate API.
import asyncio
import scopecall
from openai import AsyncOpenAI

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
)
client = sdk.instrument(AsyncOpenAI())

async def main():
    # Use asyncio.gather so this snippet runs on Python 3.10 (the SDK's
    # lower bound). asyncio.TaskGroup is 3.11+; if you're on 3.11 or
    # later it's a cleaner choice for structured concurrency.
    await asyncio.gather(*(
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"Hello {i}"}],
        )
        for i in range(3)
    ))

asyncio.run(main())
contextvars propagate the active sdk.trace() context across
await and asyncio.create_task(), so concurrent calls inside the
same trace get the right parent_span_id automatically.
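The propagation described above is standard contextvars behaviour and can be checked without the SDK. In this sketch, current_span and fake_llm_call are stand-ins invented for illustration: tasks created by asyncio.gather and asyncio.create_task run in a copy of the caller's context, so they observe the value set before they were spawned.

```python
import asyncio
import contextvars

# Stand-in for the SDK's active-trace state (hypothetical name).
current_span = contextvars.ContextVar("current_span", default=None)


async def fake_llm_call(name):
    await asyncio.sleep(0)  # yield to the event loop like a real network call
    return (name, current_span.get())


async def main():
    current_span.set("span-outer")  # what entering a trace block would do
    gathered = await asyncio.gather(fake_llm_call("a"), fake_llm_call("b"))
    task = asyncio.create_task(fake_llm_call("c"))
    return list(gathered) + [await task]


results = asyncio.run(main())
```

Every entry in results carries "span-outer", which is the property that lets concurrent calls pick up the same parent.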
Workflow tracing
The sdk.trace(name) block emits a synthetic workflow span when it
exits, so the ScopeCall dashboard can render the parent → child
structure of multi-call agents:
with sdk.trace("rag-question", user_id=user_id, session_id=session_id):
    # 1) retrieve documents (could itself be an LLM call)
    docs = retriever.retrieve(question)
    # 2) call the LLM with the retrieved context
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context:\n{docs}"},
            {"role": "user", "content": question},
        ],
    )
In the dashboard's trace tree, that block renders as:
rag-question (workflow span)
└── chat.completions.create (LLM span)
Nested traces work too — the inner block inherits trace_id,
gets its own span_id, and sets parent_span_id to the outer block.
Streaming + workflow latency
When a streaming response is iterated AFTER the enclosing
sdk.trace() block has exited (the common pattern with FastAPI's
StreamingResponse, where the route handler returns and the iterator
runs later), the SDK still attaches the child LLM event to the
workflow span correctly — context is snapshotted when
.create() is called, not when the stream is consumed.
But the workflow span's latency only covers what's inside the
with block. If you want workflow latency to reflect the full
streaming duration, keep the trace block open across the iteration:
async def event_source():
    with sdk.trace("chat-api", user_id=req.user_id):
        stream = await openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True,
        )
        async for chunk in stream:
            yield chunk

# In the enclosing route handler:
return StreamingResponse(event_source(), media_type="text/event-stream")
The runnable FastAPI example below uses exactly this shape.
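The snapshot-at-create behaviour described in this section can be illustrated with a toy stream wrapper. TracedStream, trace, and _active_span are hypothetical names; the point is only that the parent is captured when the stream object is built, so it survives even when iteration happens after the block has exited.

```python
import contextvars
from contextlib import contextmanager

_active_span = contextvars.ContextVar("active_span", default=None)


@contextmanager
def trace(name):
    token = _active_span.set(name)
    try:
        yield
    finally:
        _active_span.reset(token)


class TracedStream:
    def __init__(self, chunks):
        self._chunks = chunks
        # Snapshot the parent when the stream is created, not when consumed.
        self.parent_span = _active_span.get()
        self.live_span_at_end = "unset"

    def __iter__(self):
        for chunk in self._chunks:
            yield chunk
        # A real wrapper would emit its event here using self.parent_span;
        # the live context may already be empty by this point.
        self.live_span_at_end = _active_span.get()


with trace("chat-api"):
    stream = TracedStream(["Hel", "lo"])

text = "".join(stream)  # consumed after the trace block has exited
```

After this runs, stream.parent_span still names the enclosing block while the live context is empty, which is why attribution works but the block's own latency does not include the iteration.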
Per-call metadata
Set defaults SDK-wide on init(), then override per-trace:
import os

sdk = scopecall.init(
    api_key="sc_live_xxx",
    endpoint="http://localhost:8080/v1/ingest",
    default_feature="chat",  # every call tagged "chat"
    default_user_id="anonymous",
    default_prompt_version=os.getenv("DEPLOY_SHA"),  # auto-tag with commit hash
)

# Per-call overrides win over defaults; nested-trace inheritance fills
# the gap for prompt_version (trace > parent > default > None).
with sdk.trace("billing-agent", user_id=user.id, prompt_version="refund-v3"):
    ...
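The precedence order in the comment above (trace > parent > default > None) amounts to "first value that is set wins". A minimal sketch, with resolve_prompt_version as a hypothetical helper name:

```python
def resolve_prompt_version(trace_value=None, parent_value=None,
                           default_value=None):
    """Return the first value that is set, in trace > parent > default order."""
    for candidate in (trace_value, parent_value, default_value):
        if candidate is not None:
            return candidate
    return None
```

Under this rule an explicit None behaves like "not set", so it falls through to the parent or the default instead of clearing the value.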
Prompt-version tracking
Tag each sdk.trace() with a prompt_version. The ScopeCall Prompts
page surfaces cost / latency / error-rate per version — ship a new
prompt, see whether output tokens went up:
PROMPT_V = "refund-policy-v7"

with sdk.trace("support-agent", prompt_version=PROMPT_V):
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PROMPT_V_TEXT},
            {"role": "user", "content": question},
        ],
    )
Nested traces inherit the parent's prompt_version. Passing
prompt_version=None on a child trace does not clear the inherited value,
because None does not override; use a different scope name instead.
Manual instrumentation (LangChain, LlamaIndex, custom)
If you're calling an LLM through a framework that wraps the underlying
client (LangChain, LlamaIndex, CrewAI, your own gateway), instrument()
can't see through to the raw call. Use sdk.record_llm_call() to emit
events manually — same wire format, same trace-context chaining:
with sdk.trace("rag-answer"):
    docs = retriever.retrieve(q)  # your code, not instrumented
    # ... call your custom LLM wrapper ...
    sdk.record_llm_call(
        model="gpt-4o-mini",
        provider="openai",
        input_tokens=1234,
        output_tokens=567,
        latency_ms=842,
        input_text=prompt,
        output_text=answer,
        finish_reason="stop",
    )
record_llm_call reads the current sdk.trace() context to set
parent_span_id and inherit feature / user / session / prompt_version.
PII redaction (redact_pii=True) applies to manual calls too — input
and output run through the same scrubber the auto-instrumented path
uses.
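In practice the values passed to record_llm_call have to be measured around your own wrapper. The sketch below times a call and collects the keyword arguments shown above. timed_llm_call is a hypothetical helper, and it assumes your wrapper returns a dict with text, input_tokens, output_tokens, and finish_reason keys; adapt it to whatever your framework actually returns.

```python
import time


def timed_llm_call(call, *, model, provider, prompt):
    """Run call(prompt) and build keyword arguments for record_llm_call."""
    start = time.perf_counter()
    result = call(prompt)  # assumed to return a dict, see the note above
    latency_ms = round((time.perf_counter() - start) * 1000)
    fields = {
        "model": model,
        "provider": provider,
        "input_tokens": result["input_tokens"],
        "output_tokens": result["output_tokens"],
        "latency_ms": latency_ms,
        "input_text": prompt,
        "output_text": result["text"],
        "finish_reason": result["finish_reason"],
    }
    return result["text"], fields
```

Inside an sdk.trace() block, the returned dict can then be forwarded as sdk.record_llm_call(**fields).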
For deeper sub-step instrumentation (e.g. "retrieve" and "rerank" as
separate visible spans), nest sdk.trace() blocks rather than reaching
for a sub-span helper. Each nested trace block emits its own
workflow span and chains correctly:
with sdk.trace("rag-answer"):
    with sdk.trace("retrieve"):
        docs = retriever.retrieve(q)
    with sdk.trace("generate"):
        sdk.record_llm_call(...)
FastAPI
import os
from contextlib import asynccontextmanager

import scopecall
from fastapi import FastAPI
from openai import AsyncOpenAI

sdk: scopecall.ScopeCallSDK
client: AsyncOpenAI

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize the SDK once at startup; close on shutdown so the
    background flush thread drains pending events before exit."""
    global sdk, client
    sdk = scopecall.init(
        api_key=os.environ["SCOPECALL_API_KEY"],
        endpoint=os.environ.get(
            "SCOPECALL_ENDPOINT", "http://localhost:8080/v1/ingest"
        ),
        environment=os.environ.get("ENV", "production"),
        default_prompt_version=os.environ.get("DEPLOY_SHA"),
    )
    client = sdk.instrument(AsyncOpenAI())
    yield
    sdk.close(timeout=5.0)

app = FastAPI(lifespan=lifespan)

# ChatRequest is your own request model with user_id, session_id, and messages.
@app.post("/chat")
async def chat(req: ChatRequest):
    with sdk.trace("chat-api", user_id=req.user_id, session_id=req.session_id):
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=req.messages,
        )
    return {"reply": response.choices[0].message.content}
A runnable version of this example lives in
examples/fastapi/.
What gets captured
Every traced LLM call captures:
| Field | Description |
|---|---|
| model | Canonical model name (e.g. gpt-4o-mini, claude-3-5-sonnet-20241022) |
| provider | openai or anthropic |
| input_tokens | Prompt token count |
| output_tokens | Completion token count |
| cache_read_tokens | OpenAI prompt cache hits / Anthropic cache_read_input_tokens |
| cost_usd | Computed server-side from the bundled pricing table |
| latency_ms | End-to-end latency |
| ttft_ms | Time to first token (streaming only) |
| finish_reason | stop / length / tool_calls / end_turn (Anthropic) |
| status | success / error / timeout / rate_limited |
| error_message | Error detail on failure |
| input_text | Full prompt (redacted per your PII config) |
| output_text | Full completion |
| tool_calls | Tool-use blocks as JSON (Anthropic) |
| prompt_version | Per-trace label from sdk.trace() or config; powers the Prompts page |
| feature_name / user_id / session_id | From sdk.trace() or init() defaults |
| kind | llm for provider calls, workflow for sdk.trace() blocks |
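Since cost_usd is derived from token counts and a pricing table, the arithmetic can be sketched as follows. The model name and per-million-token prices below are invented placeholders, not ScopeCall's bundled table, and compute_cost is a hypothetical helper; the sketch also ignores any cache discount, since the README does not say how cached tokens are priced.

```python
from decimal import Decimal

# Placeholder prices in USD per one million tokens (made up for the example).
PRICES_PER_MILLION = {
    "example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


def compute_cost(model, input_tokens, output_tokens, prices=PRICES_PER_MILLION):
    """Return input, output, and total cost, or None for an unknown model."""
    rate = prices.get(model)
    if rate is None:
        return None
    million = Decimal(1_000_000)
    input_cost = rate["input"] * input_tokens / million
    output_cost = rate["output"] * output_tokens / million
    return {
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "cost_usd": input_cost + output_cost,
    }
```

Decimal is used so that many small per-call amounts sum without binary floating-point drift.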
PII redaction
When redact_pii=True (the default), input_text and output_text
pass through a regex-based scrubber before leaving the process. The
same scrubber runs on auto-instrumented chat.completions.create /
messages.create calls AND on manual sdk.record_llm_call(...) —
the policy is the same regardless of how the event was generated.
| Pattern | Replacement |
|---|---|
| Email | [EMAIL] |
| Credit card (Luhn-validated) | [CARD] |
| SSN | [SSN] |
| IPv4 | [IP] |
| Phone | [PHONE] |
Add custom patterns via the public helper on the SDK:
sdk.add_redaction_pattern(
    "UUID",
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
)
To disable redaction entirely (rarely a good idea outside dev), pass
redact_pii=False.
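For intuition, a regex-based scrubber with Luhn validation might look like the sketch below. The regular expressions here are simplified assumptions and not the SDK's actual patterns, scrub is a hypothetical function, and the phone pattern is left out for brevity.

```python
import re


def _luhn_ok(digits):
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
_CARD = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def _redact_card(match):
    digits = re.sub(r"\D", "", match.group())
    # Only digit runs that pass the checksum are treated as card numbers.
    return "[CARD]" if _luhn_ok(digits) else match.group()


def scrub(text, extra_patterns=()):
    """Replace matches with [NAME] tokens; extra_patterns are (name, regex)."""
    text = _CARD.sub(_redact_card, text)
    for name, pattern in list(_PATTERNS) + list(extra_patterns):
        text = pattern.sub(f"[{name}]", text)
    return text
```

The Luhn check keeps long digit runs such as order numbers from being redacted as cards, at the cost of missing mistyped card numbers.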
Providers
| Provider | Status |
|---|---|
| OpenAI (chat.completions.create): sync + async + streaming | ✅ v0.2.0 |
| Anthropic (messages.create): sync + async + streaming | ✅ v0.2.0 |
| Google Gemini | 🔜 v0.3 |
| LangChain (via manual API today; native bridge planned) | 🔜 v0.3 |
| LlamaIndex (via manual API today) | 🔜 v0.3 |
For unsupported providers / frameworks, use sdk.record_llm_call(...)
to emit events directly — the wire format is the same.
Migrating from scopecall v0.1.x
v0.1 used module-level globals (scopecall.init() then
scopecall.trace(...)). v0.2 returns an instance from init().
The two changes most likely to break callers:
# v0.1 (old)
scopecall.init(api_key="...")  # module-level
with scopecall.trace(feature="x"):
    ...

# v0.2 (new)
sdk = scopecall.init(
    api_key="...",
    endpoint="http://localhost:8080/v1/ingest",  # endpoint REQUIRED now
)
with sdk.trace("x"):  # name is positional
    ...
Other notable changes:
- endpoint is required when api_key is set (no silent default to https://ingest.scopecall.com because Cloud isn't live yet).
- Removed dependency on Traceloop / OpenLLMetry.
- Native OpenAI + Anthropic instrumentation (sync + async + streaming) via sdk.instrument(client).
- New manual API: sdk.record_llm_call(...) and sdk.add_redaction_pattern(name, regex).
- LLMEvent wire format adds kind, prompt_version, input_cost_usd, output_cost_usd, finish_reason, cache_read_tokens, tool_calls, and others to match the TS SDK parity contract.
Self-hosted setup
See the main repo README for the full Docker Compose quickstart that brings up the Rust ingest, Rust processor, ClickHouse, Postgres, Redpanda, Go API, and Next.js dashboard.
License
BUSL-1.1 — free for any internal use; not for resale as a managed service. Converts to Apache 2.0 on May 26, 2031.