SpanForge — AI lifecycle and governance platform (RFC-0001 SPANFORGE)

Project description

spanforge

The AI Compliance Platform for Agentic Systems.
Ship AI applications that are auditable, regulator-ready, and privacy-safe — from day one.

Built on RFC-0001 — the SpanForge AI Compliance Standard for agentic AI systems.

Python 3.9+ PyPI spanforge RFC-0001 92% test coverage 3331 tests Version 2.1.0 Zero dependencies Documentation MIT license


The problem

You're building AI applications in a world where regulators are catching up fast. The EU AI Act is in force. GDPR applies to every LLM that touches personal data. SOC 2 auditors want evidence that your AI systems are governed. And your team is stitching together ad-hoc logs, hoping they'll hold up in an audit.

spanforge solves this. It is a compliance-first platform — not a monitoring add-on — that gives every AI action in your stack a cryptographically signed, privacy-safe, regulator-ready record.


What spanforge does

Compliance & Regulatory Mapping

  • Map telemetry to EU AI Act, GDPR, SOC 2, HIPAA, ISO 42001, NIST AI RMF clauses automatically
  • Generate HMAC-signed evidence packages with gap analysis
  • Track consent boundaries, HITL oversight, model registry governance, and explainability coverage
  • Produce audit-ready attestations with model owner, risk tier, and status metadata

Privacy & Audit Infrastructure

  • PII redaction — detect and strip sensitive data before it leaves your app
  • HMAC audit chains — tamper-evident, blockchain-style event signing
  • GDPR subject erasure — right-to-erasure with tombstone events that preserve chain integrity
  • Air-gapped deployment — runs fully offline with zero egress

Governance & Controls

  • Consent boundary monitoring — consent.granted, consent.revoked, consent.violation events
  • Human-in-the-loop hooks — hitl.queued, hitl.reviewed, hitl.escalated, hitl.timeout events
  • Model registry — register, deprecate, retire models; attestations auto-warn on ungoverned models
  • Explainability tracking — measure what % of AI decisions have explanations attached

Developer Experience

  • Zero required dependencies — pure Python 3.9+ stdlib
  • One-line setup — spanforge.configure() and you're compliant
  • Auto-instrumentation — patch OpenAI, Anthropic, LangChain, CrewAI, and more
  • 18 CLI commands — compliance checks, PII scans, audit-chain verification, all CI-ready

How it compares

spanforge is the only open-standard, zero-dependency AI compliance platform. Other tools are monitoring platforms that bolt on compliance as an afterthought. spanforge is compliance infrastructure that happens to capture the telemetry needed to prove it.

Capability spanforge LangSmith Langfuse OpenLLMetry Arize Phoenix
Regulatory framework mapping (EU AI Act, GDPR, SOC 2…)
HMAC-signed evidence packages & attestations
Consent boundary monitoring
Human-in-the-loop compliance events
Model registry with risk-tier governance
Explainability coverage metrics
Built-in PII redaction
Tamper-proof audit chain
GDPR subject erasure (right-to-erasure)
Works fully offline / air-gapped Self-host Partial Self-host
Open schema standard (RFC-driven) Partial
Zero required dependencies
OTLP export (any OTel backend)
MIT license, no call-home Partial

Bottom line: Others help you watch your AI. spanforge helps you govern it.


Install

pip install spanforge

Requires Python 3.9+. Zero mandatory dependencies.

Optional extras

pip install "spanforge[openai]"       # OpenAI auto-instrumentation
pip install "spanforge[langchain]"    # LangChain callback handler
pip install "spanforge[crewai]"       # CrewAI callback handler
pip install "spanforge[http]"         # Webhook + OTLP export
pip install "spanforge[datadog]"      # Datadog APM + metrics
pip install "spanforge[kafka]"        # Kafka EventStream source
pip install "spanforge[pydantic]"     # Pydantic v2 model layer
pip install "spanforge[otel]"         # OpenTelemetry SDK integration
pip install "spanforge[jsonschema]"   # Strict JSON Schema validation
pip install "spanforge[llamaindex]"   # LlamaIndex event handler
pip install "spanforge[all]"          # everything above

Quick start — compliance in 5 minutes

1. Configure and instrument

import spanforge

spanforge.configure(
    service_name="my-agent",
    signing_key="your-org-secret",      # HMAC audit chain — tamper-proof
    redaction_policy="gdpr",            # PII stripped before export
    exporter="jsonl",
    endpoint="audit.jsonl",
)

Every event your app emits is now signed, PII-redacted, and stored — with zero per-call boilerplate.

2. Trace AI decisions

with spanforge.start_trace("loan-approval-agent") as trace:
    with trace.llm_call("gpt-4o", temperature=0.2) as span:
        decision = call_llm(prompt)
        span.set_token_usage(input=512, output=200, total=712)
        span.set_status("ok")

3. Generate compliance evidence

from spanforge.core.compliance_mapping import ComplianceMappingEngine

engine = ComplianceMappingEngine()
package = engine.generate_evidence_package(
    model_id="gpt-4o",
    framework="eu_ai_act",
    from_date="2026-01-01",
    to_date="2026-03-31",
    audit_events=events,
)

print(package.attestation.coverage_pct)            # e.g. 87.5%
print(package.attestation.explanation_coverage_pct) # e.g. 75.0%
print(package.attestation.model_risk_tier)          # e.g. "high"
print(package.gap_report)                           # what's missing

Or from the CLI:

spanforge compliance generate \
  --model gpt-4o \
  --framework eu_ai_act \
  --from 2026-01-01 --to 2026-03-31 \
  audit.jsonl

4. Hand to your auditor

The evidence package contains:

  • Clause mappings — which telemetry events satisfy which regulatory clauses
  • Gap analysis — which clauses lack evidence and need attention
  • HMAC-signed attestation — cryptographic proof the evidence hasn't been tampered with
  • Model governance metadata — owner, risk tier, status, warnings for deprecated/retired models
  • Explanation coverage — percentage of AI decisions with explainability records
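Put together, a serialised evidence package looks roughly like the following. This is an illustrative shape only: the field names are taken from the attestation examples elsewhere in this README, but the real schema may nest or name things differently.

```python
# Hypothetical serialised form of an evidence package (field names from the
# attestation examples in this README; the actual schema may differ).
package = {
    "framework": "eu_ai_act",
    "attestation": {
        "coverage_pct": 87.5,
        "explanation_coverage_pct": 75.0,
        "model_owner": "ml-platform",
        "model_risk_tier": "high",
        "model_status": "active",
        "model_warnings": [],
    },
    "clause_mappings": {"Art. 14": ["hitl.*", "consent.*"]},
    "gap_report": ["Art. 13: no explanation.* events in period"],
}

assert 0.0 <= package["attestation"]["coverage_pct"] <= 100.0
```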

Regulatory framework coverage

The ComplianceMappingEngine maps your telemetry events to specific regulatory clauses:

| Framework | Clause | Mapped events | What it proves |
|---|---|---|---|
| GDPR | Art. 22 | consent.*, hitl.* | Automated decisions have consent + human oversight |
| GDPR | Art. 25 | llm.redact.*, consent.* | Privacy by design — PII handled before export |
| EU AI Act | Art. 13 | explanation.* | AI decisions are transparent and explainable |
| EU AI Act | Art. 14 | hitl.*, consent.* | Human oversight of high-risk AI |
| EU AI Act | Annex IV.5 | llm.guard.*, llm.audit.*, hitl.* | Technical documentation — safety + oversight |
| SOC 2 | CC6.1 | llm.audit.*, llm.trace.*, model_registry.* | Logical access controls + model governance |
| NIST AI RMF | MAP 1.1 | llm.trace.*, llm.eval.*, model_registry.*, explanation.* | Risk identification and mapping |
| HIPAA | §164.312 | llm.redact.*, llm.audit.* | PHI access controls and audit |
| ISO 42001 | A.5–A.10 | Full event set | AI management system controls |

Compliance event types

spanforge defines purpose-built event types for AI governance — these aren't afterthought log messages, they are first-class compliance primitives:

| Category | Event types | Purpose |
|---|---|---|
| Consent | consent.granted, consent.revoked, consent.violation | Track user consent for automated processing |
| Human-in-the-Loop | hitl.queued, hitl.reviewed, hitl.escalated, hitl.timeout | Prove human oversight of AI decisions |
| Model Registry | model_registry.registered, model_registry.deprecated, model_registry.retired | Govern model lifecycle and risk |
| Explainability | explanation.generated | Attach explanations to AI decisions |
| Guardrails | llm.guard.* | Safety classifier outputs and block decisions |
| PII | llm.redact.* | Audit trail of what PII was found and removed |
| Audit | llm.audit.* | Access logs and chain-of-custody records |
| Traces | llm.trace.* | Model calls, tokens, latency, cost |

Core capabilities

Tamper-proof audit chains

Every event is HMAC-SHA256 signed and chained to its predecessor — the same principle as certificate chains. Alter one event and the entire chain breaks.

from spanforge.signing import AuditStream, verify_chain

stream = AuditStream(org_secret="your-secret")
for event in events:
    stream.append(event)

result = verify_chain(stream.events, org_secret="your-secret")
assert result.valid  # any tampering → False
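The chaining principle itself is plain HMAC computed over each event plus the previous signature: altering any event invalidates every signature after it. A stdlib-only sketch of the idea (illustrative only, not spanforge's actual wire format):

```python
import hashlib
import hmac
import json

def sign_chain(events, secret):
    """Sign each event over its JSON form plus the previous signature."""
    prev_sig, signed = b"", []
    for event in events:
        msg = json.dumps(event, sort_keys=True).encode() + prev_sig
        sig = hmac.new(secret, msg, hashlib.sha256).hexdigest()
        signed.append({"event": event, "sig": sig})
        prev_sig = sig.encode()
    return signed

def verify(signed, secret):
    """Recompute every signature; any edit breaks the chain from that point on."""
    prev_sig = b""
    for entry in signed:
        msg = json.dumps(entry["event"], sort_keys=True).encode() + prev_sig
        expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"].encode()
    return True

chain = sign_chain([{"type": "llm.trace"}, {"type": "llm.audit"}], b"secret")
assert verify(chain, b"secret")
chain[0]["event"]["type"] = "tampered"   # alter history...
assert not verify(chain, b"secret")      # ...and verification fails
```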

PII redaction

Strip personal data before events leave your application boundary. Deep scanning includes Luhn and Verhoeff checksum validation for credit-card and Aadhaar numbers, SSN range validation (_is_valid_ssn), calendar validation for dates of birth (_is_valid_date), and built-in patterns for date_of_birth and street addresses.

from spanforge.redact import RedactionPolicy, Sensitivity

policy = RedactionPolicy(min_sensitivity=Sensitivity.PII, redacted_by="policy:gdpr-v1")
result = policy.apply(event)
# All PII fields → "[REDACTED by policy:gdpr-v1]"
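The Luhn validation mentioned above is what lets a scanner tell a real card number from a random 16-digit string, cutting false positives. A stdlib sketch of the checksum (illustrative; spanforge's internal validator may differ):

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from doubles above 9, and require a total divisible by 10."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:   # shorter than any real card number
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

assert luhn_valid("4111 1111 1111 1111")       # well-known test card number
assert not luhn_valid("4111 1111 1111 1112")   # one digit off: checksum fails
```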

Model registry governance

Register models with ownership and risk metadata. Attestations automatically warn when models are deprecated, retired, or unregistered.

from spanforge.model_registry import ModelRegistry

registry = ModelRegistry()
registry.register("gpt-4o", owner="ml-platform", risk_tier="high")
registry.deprecate("gpt-3.5-turbo", reason="Successor available")

# Evidence packages now include:
#   model_owner: "ml-platform"
#   model_risk_tier: "high"
#   model_status: "active"
#   model_warnings: []  (or ["model 'gpt-3.5-turbo' is deprecated"])

Explainability tracking

Measure what percentage of your AI decisions have explanations attached:

from spanforge.explain import generate_explanation

explanation = generate_explanation(
    decision_event_id="evt_01HX...",
    method="feature_importance",
    content="Top factors: credit_score (0.42), income (0.31)...",
)
# explanation_coverage_pct in attestations = explained / total decisions

GDPR subject erasure

Right-to-erasure with tombstone events that preserve audit chain integrity:

spanforge audit erase audit.jsonl --subject-id user123
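The tombstone trick works if the chain signs a hash of the payload rather than the payload itself: erasure then replaces the subject's data with a tombstone that retains the original payload hash, so the personal data disappears while downstream signatures still verify. A conceptual sketch under that assumption (not spanforge's actual event format):

```python
import hashlib
import json

def payload_hash(payload) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_event(payload, subject_id):
    # Assume the audit chain signs over payload_hash, never the raw payload.
    return {"subject_id": subject_id, "payload": payload,
            "payload_hash": payload_hash(payload)}

def erase(event):
    """Replace personal data with a tombstone but keep the signed hash."""
    return {"subject_id": event["subject_id"], "payload": None,
            "tombstone": True, "payload_hash": event["payload_hash"]}

ev = make_event({"prompt": "Call me at 555-867-5309"}, subject_id="user123")
original_hash = ev["payload_hash"]
ev = erase(ev)
assert ev["payload"] is None                 # personal data is gone
assert ev["payload_hash"] == original_hash   # signed material is unchanged
```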

Auto-instrumentation

Patch supported providers once — compliance data flows automatically:

# Instrument all installed providers in one call
import spanforge.auto
spanforge.auto.setup()

# Or patch individually
from spanforge.integrations import openai as sf_openai
sf_openai.patch()    # every OpenAI call → signed, redacted, compliant
sf_openai.unpatch()  # restore original behaviour

Supported providers: OpenAI, Anthropic, Ollama, Groq, Together AI

Supported frameworks: LangChain, LlamaIndex, CrewAI


Using spanforge alongside OpenTelemetry

spanforge is not an OTel replacement. OTel handles performance monitoring. spanforge adds the compliance layer OTel cannot provide — audit chains, PII redaction, consent tracking, and regulator-ready attestations.

# Your existing OTel pipeline stays untouched
from opentelemetry.sdk.trace import TracerProvider
provider = TracerProvider()

# Add spanforge's compliance layer alongside it
import spanforge
spanforge.configure(mode="otel_passthrough")

# Dual-stream: OTel for monitoring, spanforge for compliance
spanforge.configure(exporters=["otel_passthrough", "jsonl"], endpoint="audit.jsonl")

Export

Ship compliance events to any backend:

from spanforge.stream import EventStream
from spanforge.export.jsonl import JSONLExporter
from spanforge.export.otlp import OTLPExporter
from spanforge.export.datadog import DatadogExporter
from spanforge.export.grafana import GrafanaLokiExporter
from spanforge.export.cloud import CloudExporter

stream = EventStream(events)

await stream.drain(JSONLExporter("audit.jsonl"))                    # local file
await stream.drain(OTLPExporter("http://collector:4318/v1/traces")) # OTel collector
await stream.drain(DatadogExporter(service="my-app"))               # Datadog APM
await stream.drain(GrafanaLokiExporter(url="http://loki:3100"))     # Grafana Loki
await stream.drain(CloudExporter(api_key="sf_live_xxx"))            # spanforge Cloud

Fan-out routing for compliance alerting:

from spanforge.export.webhook import WebhookExporter

# Route guardrail violations to Slack
await stream.route(
    WebhookExporter("https://hooks.slack.com/your-webhook"),
    predicate=lambda e: e.event_type == "llm.guard.output.blocked",
)

CLI

18 commands — all CI-pipeline ready:

# Compliance
spanforge compliance generate --model gpt-4o --framework eu_ai_act \
  --from 2026-01-01 --to 2026-03-31 events.jsonl
spanforge compliance check evidence.json
spanforge compliance validate-attestation evidence.json

# Audit chain
spanforge audit-chain events.jsonl             # verify chain integrity
spanforge audit erase events.jsonl --subject-id user123  # GDPR erasure
spanforge audit rotate-key events.jsonl        # key rotation
spanforge audit verify --input events.jsonl    # verify integrity

# Privacy
spanforge scan events.jsonl --fail-on-match    # CI-gate PII scan

# Validation
spanforge check                                # end-to-end health check
spanforge check-compat events.json             # v2.0 compatibility
spanforge validate events.jsonl                # JSON Schema validation

# Analysis
spanforge stats events.jsonl                   # counts, tokens, cost
spanforge inspect <EVENT_ID> events.jsonl      # pretty-print one event
spanforge cost events.jsonl                    # token spend report

# Schema management
spanforge migrate events.jsonl --sign          # v1→v2 migration
spanforge list-deprecated                      # deprecated event types
spanforge migration-roadmap                    # v2 migration plan
spanforge check-consumers                      # consumer compatibility

# Viewer
spanforge serve                                # local SPA trace viewer
spanforge ui                                   # standalone HTML viewer

Event namespaces

Every event carries a typed payload. The built-in namespaces:

| Prefix | Dataclass | What it records |
|---|---|---|
| consent.* | | User consent grants, revocations, violations |
| hitl.* | | Human-in-the-loop review, escalation, timeout |
| model_registry.* | | Model registration, deprecation, retirement |
| explanation.* | | Explainability records for AI decisions |
| llm.trace.* | SpanPayload | Model calls — tokens, latency, cost (frozen v2) |
| llm.guard.* | GuardPayload | Safety classifier outputs, block decisions |
| llm.redact.* | RedactPayload | PII audit — what was found and removed |
| llm.audit.* | | Access logs and chain-of-custody |
| llm.eval.* | EvalScenarioPayload | Scores, labels, evaluator identity |
| llm.cost.* | CostPayload | Per-call cost in USD |
| llm.cache.* | CachePayload | Cache hit/miss, backend, TTL |
| llm.prompt.* | PromptPayload | Prompt template version, rendered text |
| llm.fence.* | FencePayload | Topic constraints, allow/block lists |
| llm.diff.* | DiffPayload | Prompt/response delta between events |
| llm.template.* | TemplatePayload | Template registry metadata |

Architecture

spanforge/
├── core/
│   └── compliance_mapping.py  ← ComplianceMappingEngine, evidence packages, attestations
├── compliance/                ← Programmatic compliance test suite
├── signing.py                 ← HMAC audit chains, key management, multi-tenant KeyResolver
├── redact.py                  ← PII detection + redaction policies
├── model_registry.py          ← Model lifecycle governance
├── explain.py                 ← Explainability records
├── consent.py                 ← Consent boundary events
├── hitl.py                    ← Human-in-the-loop events
├── governance.py              ← Policy-based event gating
├── event.py                   ← Event envelope
├── types.py                   ← EventType enum (consent.*, hitl.*, model_registry.*, explanation.*, llm.*)
├── config.py                  ← configure() / get_config()
├── _span.py                   ← Span, AgentRun, AgentStep context managers
├── _trace.py                  ← Trace + start_trace()
├── _tracer.py                 ← Top-level tracing entry point
├── _stream.py                 ← Internal dispatch: sample → redact → sign → export
├── _store.py                  ← TraceStore ring buffer
├── _hooks.py                  ← HookRegistry (lifecycle hooks)
├── _server.py                 ← HTTP server (/traces, /compliance/summary)
├── _cli.py                    ← 18 CLI sub-commands
├── cost.py                    ← CostTracker, BudgetMonitor, @budget_alert
├── cache.py                   ← SemanticCache, @cached decorator
├── retry.py                   ← @retry, FallbackChain, CircuitBreaker
├── toolsmith.py               ← @tool, ToolRegistry
├── lint/                      ← AST-based instrumentation linter (AO001–AO005)
├── export/                    ← JSONL, OTLP, Webhook, Datadog, Grafana Loki, Cloud
├── integrations/              ← OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Ollama, Groq, Together
├── namespaces/                ← Typed payload dataclasses
└── migrate.py                 ← Schema migration (v1 → v2)

What is inside the box

| Module | What it does | For whom |
|---|---|---|
| **Compliance & Governance** | | |
| spanforge.compliance | ComplianceMappingEngine maps telemetry to regulatory frameworks (EU AI Act, ISO 42001, NIST AI RMF, GDPR, SOC 2, HIPAA). Generates evidence packages with HMAC-signed attestations. Consent, HITL, model registry, and explainability events integrated into clause mappings. Attestations include model owner, risk tier, status, warnings, and explanation_coverage_pct. Also: programmatic v2.0 compatibility checks — no pytest required. | Compliance / legal / platform teams |
| spanforge.signing | HMAC-SHA256 event signing, tamper-evident audit chains, key strength validation, key expiry checks, environment-isolated key derivation, multi-tenant KeyResolver protocol, and AsyncAuditStream | Security / compliance teams |
| spanforge.redact | PII detection, sensitivity levels, redaction policies, deep scan_payload() with Luhn / Verhoeff / SSN-range / date-calendar validation, built-in date_of_birth and address patterns, and contains_pii() / assert_redacted() with raw string scanning | Data privacy / GDPR teams |
| spanforge.governance | Policy-based event gating — block prohibited types, warn on deprecated usage, enforce custom rules | Platform / compliance teams |
| **Instrumentation & Tracing** | | |
| spanforge.event | The core Event envelope — the one structure all tools share | Everyone |
| spanforge.types | All built-in event types — compliance events (consent.*, hitl.*, model_registry.*, explanation.*) and telemetry events (llm.trace.*, llm.guard.*, etc.) | Everyone |
| spanforge._span | Span, AgentRun, AgentStep context managers. contextvars-based async/thread-safe propagation. async with, span.add_event(), span.set_timeout_deadline() | App developers |
| spanforge._trace | Trace + start_trace() — high-level tracing entry point; accumulates child spans | App developers |
| spanforge.config | configure() and get_config() — signing key, redaction policy, exporters, sample rate | Everyone |
| **Export & Integration** | | |
| spanforge.export | Ship events to JSONL, HTTP webhooks, OTLP collectors, Datadog APM, Grafana Loki, or spanforge Cloud | Infra / compliance teams |
| spanforge.stream | Fan-out router — one drain() call reaches multiple backends; Kafka source | Platform engineers |
| spanforge.integrations | Auto-instrumentation for OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Groq, Ollama, Together | App developers |
| spanforge.auto | setup() auto-patches all installed LLM integrations; teardown() cleanly unpatches | App developers |
| **Developer Tools** | | |
| spanforge.cost | CostTracker, BudgetMonitor, @budget_alert — track and alert on token spend | App developers / FinOps |
| spanforge.cache | SemanticCache + @cached — deduplicate LLM calls via cosine similarity; InMemoryBackend, SQLiteBackend, RedisBackend | App developers / FinOps |
| spanforge.retry | @retry, FallbackChain, CircuitBreaker, CostAwareRouter — resilient LLM routing with compliance events | App developers / SREs |
| spanforge.toolsmith | @tool + ToolRegistry — register functions as typed tools; render JSON schemas for function-calling APIs | App developers |
| spanforge.lint | AST-based instrumentation linter; AO001–AO005 codes; flake8 plugin; CLI | All teams / CI |
| spanforge.debug | print_tree(), summary(), visualize() — terminal tree, stats dict, HTML Gantt timeline | App developers |
| spanforge.metrics | aggregate() — success rates, latency percentiles, token totals, cost breakdowns | Analytics engineers |
| spanforge.testing | MockExporter, capture_events(), assert_event_schema_valid(), trace_store() | Test authors |
| spanforge.validate | JSON Schema validation against the published v2.0 schema | All teams |
| spanforge.namespaces | Typed payload dataclasses for all built-in event namespaces | Tool authors |
| spanforge.models | Optional Pydantic v2 models for validated schemas | API / backend teams |
| spanforge.consumer | Declare schema-namespace dependencies; fail fast at startup if version requirements are not met | Platform teams |
| spanforge.deprecations | Per-event-type deprecation notices at runtime | Library maintainers |
| spanforge._hooks | Lifecycle hooks: @hooks.on_llm_call, @hooks.on_tool_call, @hooks.on_agent_start (sync + async) | App developers / platform |
| spanforge._store | TraceStore ring buffer — get_trace(), list_tool_calls(), list_llm_calls() | Platform / tooling engineers |
| spanforge._cli | 18 CLI sub-commands: compliance, audit, scan, validate, stats, serve, ui, and more | DevOps / CI teams |

Quality

  • 3 331 tests passing (10 skipped) — unit, integration, property-based (Hypothesis), performance benchmarks
  • ≥ 92 % line and branch coverage — 90 % minimum enforced in CI
  • Zero required dependencies — entire core runs on Python stdlib
  • Typed — full py.typed marker; mypy + pyright clean
  • Frozen v2 trace schema — llm.trace.* payload fields never break between minor releases
  • Async-safe — contextvars-based context propagation across asyncio, threads, and executors

Development

git clone https://github.com/veerarag1973/spanforge.git
cd spanforge
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
pytest                      # 3 331 tests
Code quality
ruff check . && ruff format .
mypy spanforge
pytest --cov                # >=90% required
Build docs
pip install -e ".[docs]"
cd docs && sphinx-build -b html . _build/html

Versioning

spanforge implements RFC-0001 (AI Compliance Standard for Agentic AI Systems). Current schema version: 2.0.

This project follows Semantic Versioning. The llm.trace.* namespace is additionally frozen at v2 — even major releases won't remove fields from SpanPayload, AgentRunPayload, or AgentStepPayload.

See docs/changelog.md for the full version history.


Contributing

Contributions welcome — see the Contributing Guide. All new code must maintain ≥ 90 % coverage. Run ruff and mypy before submitting.


Community

Topics: ai-compliance ai-governance eu-ai-act gdpr soc2 audit-trail pii-redaction hmac-signing llm-governance python


License

MIT — free for personal and commercial use.


Built for teams that take AI governance seriously.
Docs · Quickstart · API Reference · Discussions · Report a bug

spanforge

The reference implementation of the spanforge Standard.
A lightweight Python SDK that gives your AI applications a common, structured way to record, sign, redact, and export events — with zero mandatory dependencies.

spanforge (RFC-0001) is the open event-schema standard for compliance and governance of agentic AI systems.

Python 3.9+ PyPI spanforge RFC-0001 92% test coverage 3162 tests Version 1.0.0 Zero dependencies Documentation MIT license


What is this?

spanforge is the reference implementation of RFC-0001 — the open event-schema standard for compliance and governance of agentic AI systems.

spanforge defines a structured, typed event envelope that every LLM-adjacent instrumentation tool can emit and every compliance backend can consume. It covers the full lifecycle: event envelopes, agent span hierarchies, token and cost models, HMAC audit chains, PII redaction, OTLP-compatible export, and schema governance.

Think of spanforge as a universal receipt format for your AI application. Every time your app calls a language model, makes a decision, redacts private data, or checks a guardrail — this library gives that action a consistent, structured record that any tool in your stack can read.


Why use it?

Without a shared schema, every team invents its own log format. With spanforge and the RFC-0001 standard it implements, your logs, dashboards, compliance reports, and monitoring tools all speak the same language — automatically.

| Without spanforge | With spanforge |
|---|---|
| Each service logs events differently | Every event follows the same structure |
| Hard to audit who saw what data | Built-in HMAC signing creates a tamper-proof audit trail |
| PII scattered across logs | First-class PII redaction before data leaves your app |
| Vendor-specific telemetry | OpenTelemetry-compatible — works with any monitoring stack |
| No way to check compatibility | CLI + programmatic compliance checks in CI |
| Complex integration glue | Zero required dependencies — just pip install |

How spanforge compares

spanforge is the only open-schema, zero-dependency AI compliance platform. Everything else either requires a hosted backend, imposes a proprietary event format, or has mandatory heavy dependencies.

Feature spanforge LangSmith Langfuse OpenLLMetry Arize Phoenix
Open schema standard (RFC-driven) Partial
Zero required dependencies
Works fully offline / air-gapped Self-host only Partial Self-host only
HMAC tamper-proof audit chain
First-class PII redaction (built-in)
OTLP export (any OTel backend)
MIT license (self-hosted, no call-home) Partial
Python 3.9+ (no Pydantic required)
CLI-first compliance checks
Schema versioning & migration tools

Bottom line: Use spanforge when you need a standard rather than a service — especially in regulated, offline, or multi-vendor environments.


Install

pip install spanforge
import spanforge

Requires Python 3.9 or later. No other packages are required for core usage.

Note: The PyPI distribution is named spanforge. The Python import name remains spanforge.

Optional extras

pip install "spanforge[jsonschema]"   # strict JSON Schema validation
pip install "spanforge[openai]"       # OpenAI auto-instrumentation (patch/unpatch)
pip install "spanforge[http]"         # Webhook + OTLP export
pip install "spanforge[pydantic]"     # Pydantic v2 model layer
pip install "spanforge[otel]"         # OpenTelemetry SDK integration
pip install "spanforge[kafka]"        # EventStream.from_kafka() via kafka-python
pip install "spanforge[langchain]"    # LangChain callback handler
pip install "spanforge[llamaindex]"   # LlamaIndex event handler
pip install "spanforge[crewai]"       # CrewAI callback handler
pip install "spanforge[datadog]"      # Datadog APM + metrics exporter
pip install "spanforge[all]"          # everything above

Using SpanForge alongside OpenTelemetry

SpanForge does not replace your OTel setup. It adds the compliance layer OTel cannot provide — tamper-proof audit chains, PII redaction, and regulator-ready attestation reports.

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# 1. Set up your existing OTel pipeline as normal
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))

# 2. Add SpanForge's compliance layer in one line
import spanforge
spanforge.configure(mode="otel_passthrough")

# 3. Use SpanForge spans — OTel + HMAC audit chain + PII redaction all active
with spanforge.Tracer().span("retrieve_docs") as s:
    s.set_attribute("user_query", "What is our refund policy?")

For dual-stream export (OTel bridge + local audit log):

spanforge.configure(exporters=["otel_passthrough", "jsonl"], endpoint="audit.jsonl")

Five-minute tour

1 — Trace an LLM call with the span API

import spanforge

spanforge.configure(exporter="console", service_name="my-agent")

with spanforge.span("call-llm") as span:
    span.set_model(model="gpt-4o", system="openai")
    result = call_llm(prompt)                          # your LLM call here
    span.set_token_usage(input=512, output=128, total=640)
    span.set_status("ok")

The context manager automatically records start/end times, parent-child span relationships, and emits a structured event when it exits.
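That behaviour can be pictured as a small timing wrapper that records on entry and emits on exit (a conceptual sketch, not spanforge's actual Span class):

```python
import time
from contextlib import contextmanager

@contextmanager
def toy_span(name, emit=print):
    """Record start/end times and emit one structured event on exit."""
    record = {"name": name, "status": "ok"}
    start = time.monotonic()
    try:
        yield record
    except Exception:
        record["status"] = "error"
        raise
    finally:
        record["duration_ms"] = (time.monotonic() - start) * 1000
        emit(record)

events = []
with toy_span("call-llm", emit=events.append) as span:
    span["tokens"] = 640
assert events[0]["name"] == "call-llm" and events[0]["status"] == "ok"
```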


1c — Use the high-level Trace API (new in 2.0)

import spanforge

spanforge.configure(exporter="console", service_name="my-agent")

with spanforge.start_trace("research-agent") as trace:
    with trace.llm_call("gpt-4o", temperature=0.7) as span:
        result = call_llm(prompt)
        span.set_token_usage(input=512, output=200, total=712)
        span.set_status("ok")
        span.add_event("tool_selected", {"name": "web_search"})

    with trace.tool_call("web_search") as span:
        output = run_search(query)
        span.set_status("ok")

# Inspect the trace in the terminal
trace.print_tree()
# ─ Agent Run: research-agent  [1.2s]
#  ├─ LLM Call: gpt-4o  [0.8s]  in=512 out=200 tokens  $0.0034
#  └─ Tool Call: web_search  [0.4s]  ok

print(trace.summary())
# {'trace_id': '...', 'agent_name': 'research-agent', 'span_count': 3, ...}

The Trace object works with async with too:

async with spanforge.start_trace("async-agent") as trace:
    async with trace.llm_call("gpt-4o") as span:
        response = await async_call_llm(prompt)
        span.set_status("ok")

1b — Auto-instrument the OpenAI client (zero boilerplate)

from spanforge.integrations import openai as openai_integration
import openai, spanforge

# One-time setup: patch the OpenAI SDK
openai_integration.patch()

spanforge.configure(exporter="console", service_name="my-agent")

client = openai.OpenAI()

with spanforge.tracer.span("chat-gpt4o") as span:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    # span.token_usage, span.cost, and span.model are now populated automatically

patch() wraps every client.chat.completions.create() call (sync and async) so that token_usage, cost, and model are auto-populated on the active span from the API response — no per-call boilerplate required.

# Restore original behaviour when you're done
openai_integration.unpatch()

2 — Record a raw event

from spanforge import Event, EventType, Tags

event = Event(
    event_type=EventType.TRACE_SPAN_COMPLETED,
    source="my-app@1.0.0",          # who emitted this
    org_id="org_acme",              # your organisation
    payload={
        "model": "gpt-4o",
        "prompt_tokens": 512,
        "completion_tokens": 128,
        "latency_ms": 340.5,
    },
    tags=Tags(env="production"),
)

event.validate()         # raises if structure is invalid
print(event.to_json())   # compact JSON string, ready to store or ship

Every event gets a ULID (a time-sortable unique ID) automatically — no need to generate one yourself.
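The time-sortable property comes from putting a fixed-width timestamp at the front of the ID, so plain string comparison orders events chronologically. A simplified sketch of the idea (not the exact ULID algorithm, which uses a 48-bit timestamp in Crockford base32):

```python
import secrets
import time

def toy_ulid() -> str:
    """Fixed-width millisecond timestamp (12 hex chars) + 80 random bits."""
    ts = int(time.time() * 1000)
    return f"{ts:012x}{secrets.token_hex(10)}"

a = toy_ulid()
time.sleep(0.002)
b = toy_ulid()
assert a < b            # later event sorts after earlier one
assert len(a) == 32     # fixed width keeps string order == time order
```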


3 — Redact private information before logging

from spanforge import Event, EventType
from spanforge.redact import Redactable, RedactionPolicy, Sensitivity

policy = RedactionPolicy(min_sensitivity=Sensitivity.PII, redacted_by="policy:gdpr-v1")

# Wrap any string that might contain PII
event = Event(
    event_type=EventType.TRACE_SPAN_COMPLETED,
    source="my-app@1.0.0",
    payload={"prompt": Redactable("Call me at 555-867-5309", Sensitivity.PII)},
)
result = policy.apply(event)
# result.event.payload["prompt"] -> "[REDACTED by policy:gdpr-v1]"

Redactable is a string wrapper. You mark fields as sensitive at the point where they are created; the policy decides what to remove before the event is written to any log.

Tip — auto-redact every span: pass redaction_policy=policy to spanforge.configure() and the policy runs automatically inside _dispatch() before any exporter sees the event.


4 — Sign events for tamper-proof audit trails

from spanforge.signing import sign, verify_chain, AuditStream

# Sign a single event
signed = sign(event, org_secret="my-org-secret")

# Or build a chain — every event references the one before it,
# so any gap or modification is immediately detectable.
stream = AuditStream(org_secret="my-org-secret")
for e in events:
    stream.append(e)

result = verify_chain(stream.events, org_secret="my-org-secret")

This is the same principle used in certificate chains and blockchain — each event's signature covers the previous event's signature, so you cannot alter history without breaking the chain.
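The mechanics can be illustrated in a few lines of stdlib Python. This is a conceptual sketch of hash chaining, not spanforge's wire format:

```python
import hashlib
import hmac
import json

def sign_chained(events: list, secret: bytes) -> list:
    """Sign events so each signature covers the event body plus the previous signature."""
    prev_sig, out = "", []
    for e in events:
        body = json.dumps(e, sort_keys=True) + prev_sig
        sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
        out.append({**e, "sig": sig, "prev_sig": prev_sig})
        prev_sig = sig
    return out

def verify(chain: list, secret: bytes) -> bool:
    """Recompute every signature; any edited or removed event breaks the chain."""
    prev_sig = ""
    for e in chain:
        payload = {k: v for k, v in e.items() if k not in ("sig", "prev_sig")}
        body = json.dumps(payload, sort_keys=True) + prev_sig
        expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(e["sig"], expected):
            return False
        prev_sig = e["sig"]
    return True

chain = sign_chained([{"n": 1}, {"n": 2}, {"n": 3}], b"org-secret")
assert verify(chain, b"org-secret")
chain[1]["n"] = 99                       # tamper with history...
assert not verify(chain, b"org-secret")  # ...and verification fails
```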

Tip — auto-sign every span: pass signing_key="your-secret" to spanforge.configure() and every emitted span is signed and chained automatically, with no per-event boilerplate.


5 — Export to anywhere

from spanforge.stream import EventStream
from spanforge.export.jsonl import JSONLExporter
from spanforge.export.webhook import WebhookExporter
from spanforge.export.otlp import OTLPExporter
from spanforge.export.datadog import DatadogExporter
from spanforge.export.grafana import GrafanaLokiExporter
from spanforge.export.cloud import CloudExporter

stream = EventStream(events)

# Write everything to a local file
await stream.drain(JSONLExporter("events.jsonl"))

# Ship to your OpenTelemetry collector
await stream.drain(OTLPExporter("http://otel-collector:4318/v1/traces"))

# Send to Datadog APM (traces + metrics)
await stream.drain(DatadogExporter(
    service="my-app",
    env="production",
    agent_url="http://dd-agent:8126",
    api_key="your-dd-api-key",
))

# Push to Grafana Loki
await stream.drain(GrafanaLokiExporter(
    url="http://loki:3100",
    labels={"app": "my-app", "env": "production"},
))

# Send to spanforge Cloud
await stream.drain(CloudExporter(
    api_key="sf_live_xxx",
    endpoint="https://ingest.getspanforge.com/v1/events",
))

# Fan-out: guard-blocked events -> Slack webhook
await stream.route(
    WebhookExporter("https://hooks.slack.com/your-webhook"),
    predicate=lambda e: e.event_type == "llm.guard.output.blocked",
)

Kafka source

from spanforge.stream import EventStream

# Drain a Kafka topic directly into an EventStream
stream = EventStream.from_kafka(
    topic="llm-events",
    bootstrap_servers="kafka:9092",
    group_id="analytics",
    max_messages=5000,
)
await stream.drain(exporter)

6 — Sync exporters for non-async workflows

from spanforge.exporters.jsonl import SyncJSONLExporter
from spanforge.exporters.console import SyncConsoleExporter

# Log all events to a JSONL file synchronously
exporter = SyncJSONLExporter("events.jsonl")
exporter.export(event)
exporter.close()

# Pretty-print events to the terminal during development
console = SyncConsoleExporter()
console.export(event)

7b — Register lifecycle hooks (new in 2.0)

import spanforge

@spanforge.hooks.on_llm_call
def log_llm(span):
    print(f"LLM called: {span.model}  temp={span.temperature}")

@spanforge.hooks.on_tool_call
def log_tool(span):
    print(f"Tool called: {span.name}")

# Hooks fire automatically for every span of the matching type

7c — Aggregate metrics from a trace file (new in 2.0)

import spanforge
from spanforge.stream import EventStream

events = list(EventStream.from_file("events.jsonl"))
summary = spanforge.metrics.aggregate(events)

print(f"Traces:  {summary.trace_count}")
print(f"Success: {summary.agent_success_rate:.0%}")
print(f"p95 LLM: {summary.llm_latency_ms.p95:.0f} ms")
print(f"Cost:    ${summary.total_cost_usd:.4f}")

7d — Visualize a Gantt timeline (new in 2.0)

from spanforge.debug import visualize

html = visualize(trace.spans, path="trace.html")
# Opens trace.html in a browser — self-contained, no external deps

8a — Semantic cache — skip redundant LLM calls

from spanforge.cache import SemanticCache, InMemoryBackend

cache = SemanticCache(
    backend=InMemoryBackend(max_size=1024),
    similarity_threshold=0.92,   # cosine similarity cutoff
    ttl_seconds=3600,
    namespace="responses",
    emit_events=True,            # emits llm.cache.hit/miss/written events
)

# Or use the @cached decorator on any async function
from spanforge.cache import cached

@cached(threshold=0.92, ttl=3600, emit_events=True)
async def call_llm(prompt: str) -> str:
    # ... real LLM call only on cache miss
    return response

reply = await call_llm("Summarise the spanforge RFC in one sentence.")
# Second call with a semantically identical prompt → instant cache hit, zero tokens spent
reply2 = await call_llm("Give me a one-sentence summary of the spanforge RFC.")
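
What "semantically identical" means depends on the embedding model, but the threshold itself is plain cosine similarity. A toy illustration using bag-of-words vectors as a stand-in for embeddings (not spanforge's cache internals):

```python
import math
from collections import Counter
from typing import Optional

def cosine(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors (toy stand-in for dense embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

cache: dict = {}

def lookup(prompt: str, threshold: float = 0.92) -> Optional[str]:
    """Return the cached response whose prompt clears the similarity threshold."""
    for cached_prompt, response in cache.items():
        if cosine(prompt, cached_prompt) >= threshold:
            return response
    return None

cache["summarise the spanforge rfc"] = "One-sentence summary."
assert lookup("summarise the spanforge rfc") == "One-sentence summary."
assert lookup("what is the weather today") is None
```

A real semantic cache replaces `cosine` over word counts with cosine over embedding vectors, which is why paraphrases can still hit the cache.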

8b — Lint your instrumentation in CI

from spanforge.lint import run_checks

with open("myapp/pipeline.py") as f:
    source = f.read()
errors = run_checks(source, filename="myapp/pipeline.py")

for err in errors:
    print(f"{err.filename}:{err.line}:{err.col}: {err.code} {err.message}")
# myapp/pipeline.py:42:12: AO002 actor_id receives a bare str; wrap with Redactable()

Or run the CLI against a whole directory:

python -m spanforge.lint myapp/
# AO001  Event() missing required field 'payload'     myapp/pipeline.py:17
# AO004  LLM call outside tracer span context         myapp/pipeline.py:53
# 2 errors in 1 file.

# Plug into flake8 / ruff automatically (entry-point registered in pyproject.toml):
flake8 myapp/
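
Under the hood this style of linter walks the Python AST rather than matching text. A minimal sketch of a check in the spirit of AO001 (hypothetical code, not spanforge's actual visitor; it only matches bare `Event(...)` calls):

```python
import ast

def check_event_payload(source: str) -> list:
    """Flag Event(...) calls that pass no 'payload' keyword (AO001-style check)."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "Event"
                and not any(kw.arg == "payload" for kw in node.keywords)):
            errors.append((node.lineno, "Event() missing required field 'payload'"))
    return errors

bad = "e = Event(event_type='x', source='app')"
good = "e = Event(event_type='x', source='app', payload={})"
assert check_event_payload(bad)       # flagged
assert not check_event_payload(good)  # clean
```

Because the check runs on the AST, it works on any syntactically valid file without importing or executing it, which is what makes this safe to run in CI.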

9 — Check compliance and inspect events from the command line

spanforge check                           # end-to-end health check (config → export → trace store)
spanforge check-compat events.json        # v2.0 compatibility checklist
spanforge validate events.jsonl           # JSON Schema validation per event
spanforge audit-chain events.jsonl        # verify HMAC signing chain integrity
spanforge audit check-health events.jsonl # PII scan + chain + egress health check
spanforge audit rotate-key events.jsonl   # rotate signing key & re-sign chain
spanforge audit erase events.jsonl --subject-id user123  # GDPR subject erasure
spanforge audit verify --input events.jsonl              # verify chain integrity
spanforge scan events.jsonl --fail-on-match              # CI-gate PII scan
spanforge migrate events.jsonl --sign                    # v1→v2 schema migration
spanforge inspect <EVENT_ID> events.jsonl # pretty-print a single event
spanforge stats events.jsonl              # summary: counts, tokens, cost, timestamps
spanforge list-deprecated                 # list all deprecated event types
spanforge migration-roadmap [--json]      # v2 migration roadmap
spanforge check-consumers                 # consumer registry compatibility check

Sample output from spanforge check-compat:

CHK-1  All required fields present          (500 / 500 events)
CHK-2  Event types valid                    (500 / 500 events)
CHK-3  Source identifiers well-formed       (500 / 500 events)
CHK-5  Event IDs are valid ULIDs            (500 / 500 events)
All checks passed.

Drop any of these into your CI pipeline to catch schema drift, signing failures, or schema-breaking migrations before they reach production.


10 — SPA Trace Viewer

Browse traces in a local single-page application — no external dependencies:

# Start the HTTP trace viewer server (default port 8888)
spanforge serve

# Or open the standalone HTML viewer in your default browser
spanforge ui

spanforge serve starts a lightweight HTTP server that exposes a /traces JSON API backed by the in-memory TraceStore. The SPA renders agent runs, LLM calls, tool calls, and timing data in a searchable table.

spanforge ui generates a self-contained HTML file from a JSONL export and opens it directly — useful for sharing trace snapshots offline.
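
For a feel of how little such a viewer needs, here is a stdlib-only sketch of a /traces JSON endpoint (illustrative only; spanforge's TraceViewerServer is its own implementation, and the sample trace data is made up):

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for an in-memory TraceStore (hypothetical sample data)
TRACES = [{"trace_id": "def456", "spans": 3, "status": "ok"}]

class TraceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/traces":
            body = json.dumps(TRACES).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging

# To serve on the default viewer port:
# ThreadingHTTPServer(("127.0.0.1", 8888), TraceHandler).serve_forever()
```

The SPA then only needs to fetch /traces and render the JSON; no framework or external dependency is required on either side.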


What is inside the box

Module    What it does    For whom
spanforge.event The core Event envelope — the one structure all tools share Everyone
spanforge.types All built-in event type strings (trace, cost, cache, eval, guard…) Everyone
spanforge.config configure() and get_config() — global SDK configuration Everyone
spanforge._span Span, AgentRun, AgentStep context managers — the runtime tracing API. Uses contextvars for safe async/thread context propagation. Supports async with, span.add_event(), span.set_timeout_deadline() App developers
spanforge._trace Trace object and start_trace() — high-level, imperative tracing entry point; accumulates all child spans App developers
spanforge.debug print_tree(), summary(), visualize() — terminal tree, stats dict, and self-contained HTML Gantt timeline App developers
spanforge.metrics aggregate() and MetricsSummary — compute success rates, latency percentiles, token totals, and cost breakdowns from any Iterable[Event] Data / analytics engineers
spanforge._store TraceStore — in-memory ring buffer; get_trace(), list_tool_calls(), list_llm_calls() Platform / tooling engineers
spanforge._hooks HookRegistry / hooks — global span lifecycle hooks: @hooks.on_llm_call, @hooks.on_tool_call, @hooks.on_agent_start, @hooks.on_agent_end. Async variants: @hooks.on_llm_call_async, @hooks.on_tool_call_async, @hooks.on_agent_start_async, @hooks.on_agent_end_async — fired via asyncio.ensure_future(). App developers / platform
spanforge._cli CLI sub-commands: check, check-compat, validate, audit-chain, audit (erase, rotate-key, check-health, verify), scan, migrate, inspect, stats, list-deprecated, migration-roadmap, check-consumers, compliance, cost, dev, module, serve, init, quickstart, report, ui DevOps / CI teams
spanforge.redact PII detection, sensitivity levels, redaction policies, deep scan_payload() with Luhn / Verhoeff / SSN-range / date-calendar validation, built-in date_of_birth and address patterns, and contains_pii() / assert_redacted() with raw string scanning Data privacy / GDPR teams
spanforge.signing HMAC-SHA256 event signing, tamper-evident audit chains, key strength validation, key expiry checks, environment-isolated key derivation, multi-tenant KeyResolver protocol, and AsyncAuditStream Security / compliance teams
spanforge.compliance Programmatic v2.0 compatibility checks — no pytest required. ComplianceMappingEngine maps telemetry to regulatory frameworks (EU AI Act, ISO 42001, NIST AI RMF, GDPR, SOC 2) and generates evidence packages with HMAC-signed attestations. Consent (consent.*), HITL (hitl.*), model registry (model_registry.*), and explainability (explanation.*) events are integrated into clause mappings. Attestations include model owner/risk-tier/status metadata and explanation_coverage_pct Platform / DevOps / Compliance teams
spanforge.export Ship events to files (JSONL), HTTP webhooks, OTLP collectors, Datadog APM, Grafana Loki, or spanforge Cloud Infra / compliance teams
spanforge.exporters Sync exporters — SyncJSONLExporter and SyncConsoleExporter for non-async code App developers
spanforge.stream Fan-out router — one drain() call reaches multiple backends; Kafka source via from_kafka() Platform engineers
spanforge.validate JSON Schema validation against the published v2.0 schema All teams
spanforge.consumer Declare schema-namespace dependencies; fail fast at startup if version requirements are not met Platform / integration teams
spanforge.governance Policy-based event gating — block prohibited types, warn on deprecated usage, enforce custom rules Platform / compliance teams
spanforge.deprecations Register and surface per-event-type deprecation notices at runtime Library maintainers
spanforge.testing Test utilities: MockExporter, capture_events() context manager, assert_event_schema_valid(), and trace_store() isolated store context manager. Write unit tests for your AI pipeline without real exporters. App developers / test authors
spanforge.auto Integration auto-discovery: spanforge.auto.setup() auto-patches every installed LLM integration (OpenAI, Anthropic, Ollama, Groq, Together AI). setup() must be called explicitly; spanforge.auto.teardown() cleanly unpatches all. App developers
spanforge.integrations Plug-in adapters for OpenAI (auto-instrumentation via patch()), LangChain, LlamaIndex, Anthropic, Groq, Ollama, Together, and CrewAI (SpanForgeCrewAIHandler + patch()). spanforge.integrations._pricing ships a static USD/1M-token pricing table for all current OpenAI models. App developers
spanforge.namespaces Typed payload dataclasses for all 10 built-in event namespaces Tool authors
spanforge.models Optional Pydantic v2 models for teams that prefer validated schemas API / backend teams
spanforge.trace @trace() decorator — wraps sync/async functions, auto-emits span start/end events with timing and error capture. spanforge.export.otlp_bridge converts spans to OTLP proto dicts. App developers
spanforge.cost CostTracker, BudgetMonitor, @budget_alert, emit_cost_event(), cost_summary() — track and alert on token spend across a session App developers / FinOps
spanforge.inspect InspectorSession context manager + inspect_trace() — intercept and record tool call arguments, results, latency, and errors within a trace Platform / debugging
spanforge.toolsmith @tool decorator + ToolRegistry — register functions as typed tools; build_openai_schema() / build_anthropic_schema() render JSON schemas for function-calling APIs App developers
spanforge.retry @retry with exponential back-off, FallbackChain, CircuitBreaker, CostAwareRouter — resilient LLM provider routing with compliance events at each step App developers / SREs
spanforge.cache SemanticCache + @cached decorator — deduplicate LLM calls via cosine-similarity matching; pluggable backends: InMemoryBackend, SQLiteBackend, RedisBackend; emits llm.cache.* events App developers / FinOps
spanforge.lint run_checks(source, filename) — AST-based instrumentation linter; five AO-codes (AO001–AO005); flake8 plugin; python -m spanforge.lint CLI All teams / CI pipelines

Event namespaces

Every event carries a payload — a dictionary whose shape is defined by the event's namespace. The ten built-in namespaces cover everything from raw model traces to safety guardrails:

Namespace prefix Dataclass What it records
llm.trace.* SpanPayload, AgentRunPayload, AgentStepPayload Model call — tokens, latency, finish reason (frozen v2)
llm.cost.* CostPayload Per-call cost in USD
llm.cache.* CachePayload Cache hit/miss, backend, TTL
llm.eval.* EvalScenarioPayload Scores, labels, evaluator identity
llm.guard.* GuardPayload Safety classifier output, block decisions
llm.fence.* FencePayload Topic constraints, allow/block lists
llm.prompt.* PromptPayload Prompt template version, rendered text
llm.redact.* RedactPayload PII audit record — what was found and removed
llm.diff.* DiffPayload Prompt/response delta between two events
llm.template.* TemplatePayload Template registry metadata

For example, building a typed trace payload:

from spanforge.namespaces.trace import SpanPayload
from spanforge import Event

payload = SpanPayload(
    span_name="call-llm",
    span_id="abc123",
    trace_id="def456",
    start_time_ns=1_000_000_000,
    end_time_ns=1_340_000_000,
    status="ok",
)

event = Event(
    event_type="llm.trace.span.completed",
    source="my-app@1.0.0",
    payload=payload.to_dict(),
)

Quality standards

  • 3 331 passing tests (plus 10 skipped) — unit, integration, property-based (Hypothesis), and performance benchmarks
  • 92.84 % line and branch coverage — measured with pytest-cov; 90 % minimum enforced in CI
  • Zero required dependencies — the entire core runs on Python's standard library alone
  • Typed — full py.typed marker; works with mypy and pyright out of the box
  • Frozen v2 trace schemallm.trace.* payload fields will never break between minor releases
  • async-safe context propagationcontextvars-based span stacks work correctly across asyncio tasks, thread pools, and executors
  • Version 1.0.6 adds: spanforge.testing, spanforge.auto, async lifecycle hooks, spanforge check CLI, export retry with back-off, unpatch() / is_patched() for all integrations, frozen payload dataclasses, assert_no_sunset_reached()
  • Version 1.0.7 adds: @trace() decorator, OTLP bridge, CostTracker / BudgetMonitor, InspectorSession, ToolRegistry / @tool, @retry / FallbackChain / CircuitBreaker, SemanticCache / @cached, and spanforge.lint (AO001–AO005, flake8 plugin, CLI)
  • Version 2.0.0 adds: Trace / start_trace(), async with, span.add_event(), print_tree() / summary() / visualize(), sampling controls, metrics.aggregate(), TraceStore, HookRegistry, CrewAI integration

Project structure

spanforge/
├── __init__.py       <- Public API surface (start here)
├── event.py          <- The Event envelope
├── types.py          <- EventType enum  (+ SpanErrorCategory)
├── config.py         <- configure() / get_config() / SpanForgeConfig
│                        (sample_rate, always_sample_errors, include_raw_tool_io,
│                         enable_trace_store, trace_store_size)
├── _span.py          <- Span, AgentRun, AgentStep context managers
│                        (contextvars stacks, async with, add_event,
│                         record_error, set_timeout_deadline)
├── _trace.py         <- Trace class + start_trace()          [NEW in 2.0]
├── _tracer.py        <- Tracer — top-level tracing entry point
├── _stream.py        <- Internal dispatch: sample → redact → sign → export
├── _store.py         <- TraceStore ring buffer                [NEW in 2.0]
├── _hooks.py         <- HookRegistry singleton (hooks)        [NEW in 2.0]
├── _cli.py           <- CLI entry-point (sub-commands: check, check-compat, validate,
│                        audit-chain, audit, scan, migrate, inspect, stats, list-deprecated,
│                        migration-roadmap, check-consumers, compliance, cost, dev, module,
│                        serve, init, quickstart, report, ui)
├── _server.py        <- TraceViewerServer — lightweight HTTP server for /traces endpoint
├── trace.py          <- @trace() decorator + SpanOTLPBridge   [NEW in 1.0.7]
├── cost.py           <- CostTracker, BudgetMonitor, @budget_alert [NEW in 1.0.7]
├── inspect.py        <- InspectorSession, inspect_trace()     [NEW in 1.0.7]
├── toolsmith.py      <- @tool, ToolRegistry, build_openai_schema() [NEW in 1.0.7]
├── retry.py          <- @retry, FallbackChain, CircuitBreaker [NEW in 1.0.7]
├── cache.py          <- SemanticCache, @cached, *Backend      [NEW in 1.0.7]
├── lint/             <- run_checks(), AO001-AO005, flake8 plugin, CLI [NEW in 1.0.7]
│   ├── __init__.py
│   ├── _visitor.py
│   ├── _checks.py
│   ├── _flake8.py
│   └── __main__.py
├── testing.py        <- MockExporter, capture_events(), assert_event_schema_valid(),
│                        trace_store() — test utilities without real exporters [1.0.6]
├── auto.py           <- Integration auto-discovery; setup() / teardown()        [1.0.6]
├── debug.py          <- print_tree, summary, visualize        [NEW in 2.0]
├── metrics.py        <- aggregate(), MetricsSummary, etc.     [NEW in 2.0]
├── signing.py        <- HMAC signing & audit chains
├── redact.py         <- PII redaction
├── validate.py       <- JSON Schema validation
├── consumer.py       <- Consumer registry & schema-version compatibility
├── governance.py     <- Event governance policies
├── deprecations.py   <- Per-event-type deprecation tracking
├── compliance/       <- Compatibility checklist suite
├── core/
│   └── compliance_mapping.py <- ComplianceMappingEngine + evidence packages [Commercial]
├── export/
│   ├── jsonl.py      <- Local file export (async)
│   ├── webhook.py    <- HTTP POST export
│   ├── otlp.py       <- OpenTelemetry export
│   ├── datadog.py    <- Datadog APM traces + metrics
│   ├── grafana.py    <- Grafana Loki export
│   └── cloud.py      <- spanforge Cloud export (thread-safe, batched) [Commercial]
├── exporters/
│   ├── jsonl.py      <- SyncJSONLExporter
│   └── console.py    <- SyncConsoleExporter
├── stream.py         <- EventStream fan-out router (+ Kafka source)
├── integrations/
│   ├── langchain.py  <- LangChain callback handler
│   ├── llamaindex.py <- LlamaIndex event handler
│   ├── openai.py     <- OpenAI tracing wrapper
│   ├── crewai.py     <- CrewAI handler + patch()              [NEW in 2.0]
│   └── ...           (anthropic, groq, ollama, together)
├── namespaces/       <- Typed payload dataclasses
│   ├── trace.py        (SpanPayload + temperature/top_p/max_tokens/error_category,
│   │                    SpanEvent, ToolCall + arguments_raw/result_raw/retry_count)
│   ├── cost.py
│   ├── cache.py
│   └── ...
├── models.py         <- Optional Pydantic v2 models
└── migrate.py        <- Schema migration: v1_to_v2(), migrate_file(), MigrationStats
examples/             <- Runnable sample scripts
├── openai_chat.py    <- OpenAI + JSONL export
├── agent_workflow.py <- Multi-step agent + console exporter
├── langchain_chain.py<- LangChain callback handler
└── secure_pipeline.py<- HMAC signing + PII redaction together

Development setup

git clone https://github.com/veerarag1973/spanforge.git
cd spanforge

python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS / Linux

pip install -e ".[dev]"
pytest                          # run the full test suite

# Code quality commands
ruff check .                    # linting
ruff format .                   # auto-format
mypy spanforge                  # type checking
pytest --cov                    # tests + coverage report (>=90% required)

# Build the docs locally
pip install -e ".[docs]"
cd docs
sphinx-build -b html . _build/html   # open _build/html/index.html

Compatibility and versioning

spanforge implements RFC-0001 (the SpanForge AI Compliance Standard for Agentic AI Systems). The current schema version is 2.0.

This project follows Semantic Versioning:

  • Patch releases (1.0.x) — bug fixes only, fully backwards-compatible
  • Minor releases (1.x.0) — new features, backwards-compatible
  • Major releases (x.0.0) — breaking changes, announced in advance

The llm.trace.* namespace payload schema is additionally frozen at v2: even a major release will not remove or rename fields from SpanPayload, AgentRunPayload, or AgentStepPayload.


Changelog

See docs/changelog.md for the full version history.


Contributing

Contributions are welcome! Please read the Contributing Guide first, then open an issue or pull request.

Key rules:

  • All new code must maintain >= 90 % test coverage
  • Follow the existing Google-style docstrings
  • Run ruff and mypy before submitting

Community

GitHub topics for discoverability: ai-compliance ai-governance llm-tracing opentelemetry pii-redaction audit-trail langchain openai python


License

MIT — free for personal and commercial use.


Made with care for the AI compliance community.
Docs · Quickstart · API Reference · Discussions · Report a bug
