
SpanForge — AI lifecycle and governance platform (RFC-0001 SPANFORGE)


spanforge

The AI Compliance Platform for Agentic Systems.
Ship AI applications that are auditable, regulator-ready, and privacy-safe — from day one.

Built on RFC-0001 — the SpanForge AI Compliance Standard for agentic AI systems.

Python 3.9+ · PyPI: spanforge · RFC-0001 · 91% test coverage · 7049 tests · Version 1.0.2 · Zero dependencies · Documentation · MIT License · Free for any use


The problem

You're building AI applications in a world where regulators are catching up fast. The EU AI Act is in force. GDPR applies to every LLM that touches personal data. SOC 2 auditors want evidence that your AI systems are governed. And your team is stitching together ad-hoc logs, hoping they'll hold up in an audit.

spanforge solves this. It is a compliance-first platform — not a monitoring add-on — that gives every AI action in your stack a cryptographically signed, privacy-safe, regulator-ready record.


If you're a solo developer or early-stage startup

You might think compliance is a later problem — something to worry about when you have a legal team. Here's why it isn't:

  • You'll hit it sooner than you think. The first B2B customer, the first SaaS sign-up from an EU user, the first healthcare or fintech pilot — they'll ask "how do you govern your AI?" If you have no answer, you lose the deal.
  • Retrofitting is expensive. Adding audit trails, PII scrubbing, and signed evidence chains to an existing system takes weeks. Adding them with spanforge from day one takes minutes.
  • It's zero-cost to start. The entire SDK is MIT-licensed, zero dependencies, and works in-memory with no infrastructure. You don't pay anything until you need hosted storage.
  • It de-risks you personally. GDPR fines apply to individuals running services, not just corporations. PII redaction and tamper-proof logs are your protection too.

In short: spanforge is the logging import you should have added on day one — except it also signs your audit trail and maps it to the regulations that will eventually matter to you.

pip install spanforge  # MIT licensed, zero deps
import spanforge
spanforge.configure()  # that's it — you're now compliant-by-default

What spanforge does

Compliance & Regulatory Mapping

  • Map telemetry to EU AI Act, GDPR, SOC 2, HIPAA, ISO 42001, NIST AI RMF clauses automatically
  • Generate HMAC-signed evidence packages with gap analysis
  • Track consent boundaries, HITL oversight, model registry governance, and explainability coverage
  • Produce audit-ready attestations with model owner, risk tier, and status metadata
  • Compliance Evidence Chain (sf-cec) — signed ZIP bundles with regulatory clause maps, DPA generation, and RFC 3161 timestamps for auditor hand-off; spanforge audit cec generate CLI generates CEC bundles without Python code
  • Human-in-the-Loop Workflow Engine (spanforge.workflow) — approval workflows for gate reviews, policy sign-offs, and escalations; full state machine (PENDING → APPROVED / REJECTED → CLOSED) with SLA auto-escalation and role-based action matrix
  • Observability SDK (sf-observe) — span export (OTLP/Datadog/Grafana/Splunk/Elastic), W3C TraceContext, OTel GenAI attrs, sampling strategies, annotation store, and health probes
  • CI/CD Gate Pipeline (sf-gate) — evaluate release quality gates (schema, secrets, performance, PRRI, trust), YAML pipeline engine, artifact store, and blocking trust gate to prevent unsafe releases
  • T.R.U.S.T. Scorecard (sf-trust) — five-pillar trust dimensions (Transparency · Reliability · UserTrust · Security · Traceability), configurable weights, SVG badge, history time-series, and 5 HallucCheck pipeline integrations (score, bias, monitor, risk, benchmark)
  • Enterprise Hardening (sf-enterprise) — multi-tenancy with project-level isolation, data residency enforcement (EU/US/AP/IN), AES-256-GCM encryption at rest, envelope encryption via cloud KMS, mTLS, FIPS 140-2 mode, air-gap offline mode, and container health endpoints
  • Security Review (sf-security) — OWASP API Security Top 10 audit, STRIDE threat modelling, dependency vulnerability scanning, static analysis, and secrets-in-logs detection
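
The approval state machine named in the workflow bullet above (PENDING → APPROVED / REJECTED → CLOSED) can be sketched in plain Python. This is an illustrative model only, not the spanforge.workflow API; `ApprovalState`, `ALLOWED`, and `transition` are hypothetical names, and SLA escalation and role checks are left out.

```python
from enum import Enum


class ApprovalState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    CLOSED = "closed"


# Hypothetical transition table mirroring PENDING -> APPROVED / REJECTED -> CLOSED.
ALLOWED = {
    ApprovalState.PENDING: {ApprovalState.APPROVED, ApprovalState.REJECTED},
    ApprovalState.APPROVED: {ApprovalState.CLOSED},
    ApprovalState.REJECTED: {ApprovalState.CLOSED},
    ApprovalState.CLOSED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


state = transition(ApprovalState.PENDING, ApprovalState.APPROVED)
state = transition(state, ApprovalState.CLOSED)
```

Keeping the allowed moves in one table makes it easy to audit which transitions exist and to reject everything else by default.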

Privacy & Audit Infrastructure

  • Secrets scanning — 20-pattern registry detects API keys, tokens, private keys; SARIF output; pre-commit hook
  • PII redaction — detect and strip sensitive data before it leaves your app. Includes a Presidio NLP backend (spanforge[presidio]) covering 15 entity types (SSN, email, phone, AADHAAR, PAN, UK NI, credit card, IBAN, and more) with ≥ 95% true-positive rate and < 0.5% false-positive rate verified at GA
  • HMAC audit chains — tamper-evident, blockchain-style event signing
  • Audit SDK (sf-audit): sf_audit.append(), schema key registry, T.R.U.S.T. scorecard, GDPR Article 30 RoPA, BYOS cloud routing
  • GDPR subject erasure — right-to-erasure with tombstone events that preserve chain integrity
  • Air-gapped deployment — runs fully offline with zero egress

Governance & Controls

  • Consent boundary monitoring: consent.granted, consent.revoked, consent.violation events
  • Human-in-the-loop hooks: hitl.queued, hitl.reviewed, hitl.escalated, hitl.timeout events
  • Model registry — register, deprecate, retire models; attestations auto-warn on ungoverned models
  • Explainability tracking: sf_explain.explain(response, context) returns a signed ExplainRecord with EU AI Act Article 13/14 clause mapping, decision_drivers, and an HMAC-signed audit entry on every call. @spanforge.governed wraps any callable to auto-explain every model response with zero extra code.

Developer Experience

  • Zero required dependencies — pure Python 3.9+ stdlib
  • One-line setup: spanforge.configure() and you're compliant
  • Integration config: .halluccheck.toml config block, service registry, local fallbacks for all 11 services
  • T.R.U.S.T. Scorecard (sf-trust) — five-pillar trust assessment (Transparency, Reliability, UserTrust, Security, Traceability), SVG badge generation, HallucCheck pipeline integrations
  • Mock library: spanforge.testing_mocks provides 11 drop-in mock service clients plus a mock_all_services() context manager for zero-network unit tests
  • Sandbox mode: [spanforge] sandbox = true routes all service calls to a local in-memory sandbox
  • spanforge doctor — environment diagnostics: config valid, services reachable, patterns loaded, gate YAML valid
  • Auto-instrumentation — patch OpenAI, Anthropic, LangChain, CrewAI, and more; @trace_rag decorator and automatic LlamaIndex/LangChain retriever instrumentation for zero-change RAG tracing
  • Async SDK — every major SDK method now has a non-blocking *_async() variant (scan_async, evaluate_async, build_bundle_async, get_scorecard_async, sso_delegate_session_async) for seamless use in async frameworks
  • User feedback REST endpoint: POST /v1/feedback accepts star/thumbs/Likert ratings and free-text comments (SHA-256 hashed); links to T.R.U.S.T. dimensions
  • spanforge config init / validate — interactive config wizard, schema validation, and connectivity probe for ~/.spanforge/config.yaml
  • spanforge export siem — stream CEF or LEEF lines from a JSONL events file to any SIEM via --format cef|leef; reads stdin or --input FILE
  • SpanForgeLangGraphCallback — LangChain-compatible callback (on_chain_start/end, on_tool_start/end, on_agent_action) that emits typed SpanForge events; no LangGraph runtime required
  • 39 CLI commands — compliance checks, PII scans, secrets scanning, audit-chain verification, event generation, audit log extraction, CEC bundle generation, gap detection, gate policy audit, CI/CD gate pipelines, trust scorecards, config validation, enterprise health, security scanning, doctor diagnostics, all CI-ready

How it compares

spanforge is the only open-standard, zero-dependency AI compliance platform. Other tools are monitoring platforms that bolt on compliance as an afterthought. spanforge is compliance infrastructure that happens to capture the telemetry needed to prove it.

Capability spanforge LangSmith Langfuse OpenLLMetry Arize Phoenix
Regulatory framework mapping (EU AI Act, GDPR, SOC 2…)
HMAC-signed evidence packages & attestations
Consent boundary monitoring
Human-in-the-loop compliance events
Model registry with risk-tier governance
Explainability coverage metrics
Built-in PII redaction
Tamper-proof audit chain
GDPR subject erasure (right-to-erasure)
Works fully offline / air-gapped Self-host Partial Self-host
Open schema standard (RFC-driven) Partial
Zero required dependencies
OTLP export (any OTel backend)
Source-available, no call-home Partial
CI/CD release quality gates (schema, secrets, PRRI, trust gate)

Bottom line: Others help you watch your AI. spanforge helps you govern it.


Install

pip install spanforge

Requires Python 3.9+. Zero mandatory dependencies.

CI also checks documented CLI entrypoints, version consistency, and known stale doc patterns, so drift in the public docs fails fast during PR validation.

Optional extras

pip install "spanforge[openai]"       # OpenAI auto-instrumentation
pip install "spanforge[langchain]"    # LangChain callback handler
pip install "spanforge[crewai]"       # CrewAI callback handler
pip install "spanforge[http]"         # Webhook + OTLP export
pip install "spanforge[datadog]"      # Datadog APM + metrics
pip install "spanforge[kafka]"        # Kafka EventStream source
pip install "spanforge[pydantic]"     # Pydantic v2 model layer
pip install "spanforge[otel]"         # OpenTelemetry SDK integration
pip install "spanforge[jsonschema]"   # Strict JSON Schema validation
pip install "spanforge[llamaindex]"   # LlamaIndex event handler
pip install "spanforge[gemini]"       # Google Gemini auto-instrumentation
pip install "spanforge[bedrock]"      # AWS Bedrock Converse API
pip install "spanforge[presidio]"     # Presidio-powered PII detection
pip install "spanforge[all]"          # everything above

Runtime Governance GA Surface

The GA implementation spine is the runtime-governance control plane:

  • sf_explain for signed runtime explanations — now with ExplainModelType classification (LLM, RAG, MULTI_AGENT, CLASSIFIER, EMBEDDING), configurable retry, and fail-safe emit
  • sf_scope for agent capability enforcement — now with circuit-breaker fail-secure mode and ACTION_CATEGORIES dictionary
  • sf_rbac for role enforcement on sensitive actions — now with STANDARD_ROLE_MATRIX (10 canonical actor types), YAML manifest loading, and JWT claim extraction
  • sf_rag for grounding evidence and thresholds
  • sf_lineage for provenance capture
  • sf_policy for policy activation, replay, simulation, and review
  • sf_operator for trace inspection and signed operator exports
  • sf_enterprise for deployment posture and enterprise evidence packaging

Start here if you want the end-to-end story instead of the full product surface:


Quick start — compliance in 5 minutes

1. Configure and instrument

import spanforge

spanforge.configure(
    service_name="my-agent",
    signing_key="your-org-secret",      # HMAC audit chain — tamper-proof
    redaction_policy="gdpr",            # PII stripped before export
    exporter="jsonl",
    endpoint="audit.jsonl",
)

Every event your app emits is now signed, PII-redacted, and stored — with zero per-call boilerplate.

2. Trace AI decisions

with spanforge.start_trace("loan-approval-agent") as trace:
    with trace.llm_call("gpt-4o", temperature=0.2) as span:
        decision = call_llm(prompt)
        span.set_token_usage(input=512, output=200, total=712)
        span.set_status("ok")

3. Generate compliance evidence

from spanforge.core.compliance_mapping import ComplianceMappingEngine

engine = ComplianceMappingEngine()
package = engine.generate_evidence_package(
    model_id="gpt-4o",
    framework="eu_ai_act",
    from_date="2026-01-01",
    to_date="2026-03-31",
    audit_events=events,
)

print(package.attestation.coverage_pct)            # e.g. 87.5%
print(package.attestation.explanation_coverage_pct) # e.g. 75.0%
print(package.attestation.model_risk_tier)          # e.g. "high"
print(package.gap_report)                           # what's missing

Or from the CLI:

spanforge compliance generate \
  --model-id gpt-4o \
  --framework eu_ai_act \
  --from 2026-01-01 --to 2026-03-31 \
  --events-file audit.jsonl

4. Hand to your auditor

The evidence package contains:

  • Clause mappings — which telemetry events satisfy which regulatory clauses
  • Gap analysis — which clauses lack evidence and need attention
  • HMAC-signed attestation — cryptographic proof the evidence hasn't been tampered with
  • Model governance metadata — owner, risk tier, status, warnings for deprecated/retired models
  • Explanation coverage — percentage of AI decisions with explainability records

5. Package for auditors with sf-cec (v2.0.4+)

Bundle your audit records into a regulator-ready, HMAC-signed ZIP:

from spanforge.sdk import sf_cec

# Build a compliance evidence bundle for Q1 2026
result = sf_cec.build_bundle(
    project_id="my-agent",
    date_range=("2026-01-01", "2026-03-31"),
    frameworks=["eu_ai_act", "iso_42001", "soc2"],
)

print(result.bundle_id)       # sfcec_my-agent_20260401T000000Z_abc123
print(result.zip_path)        # /tmp/sfcec/halluccheck_cec_my-agent_2026-01-01_2026-03-31.zip
print(result.hmac_manifest)   # hmac-sha256:a3f9…
print(result.record_counts)   # {"halluccheck.score.v1": 214, "halluccheck.bias.v1": 87, …}

# Verify bundle integrity before sharing
verify = sf_cec.verify_bundle(result.zip_path)
assert verify.overall_valid

# Generate a GDPR Art. 28 Data Processing Agreement
dpa = sf_cec.generate_dpa(
    project_id="my-agent",
    controller_details={"name": "Acme Corp", "contact": "dpo@acme.com"},
    processor_details={"name": "ML Platform Team"},
)
print(dpa.document_id)  # sfcec-dpa-my-agent-20260401

The ZIP bundle contains:

  • manifest.json — record inventory with HMAC-SHA256 signature
  • clause_map.json — per-framework clause satisfaction (SATISFIED / PARTIAL / GAP)
  • chain_proof.json — audit chain verification result
  • attestation.json — HMAC-signed attestation metadata
  • rfc3161_timestamp.tsr — trusted timestamp stub (RFC 3161)
  • score_records/, bias_reports/, prri_records/, drift_events/, pii_detections/, gate_evaluations/ — NDJSON evidence per schema key
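
To make the signed-manifest idea concrete, here is a minimal standard-library sketch that inventories file digests and signs the inventory with HMAC-SHA256. The manifest layout, the `build_manifest` and `verify_manifest` helpers, and the key are assumptions for illustration, not the actual sf-cec bundle format.

```python
import hashlib
import hmac
import json


def build_manifest(files, key):
    """files maps a bundle path to its bytes; returns a signed manifest dict."""
    inventory = {
        name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())
    }
    body = json.dumps(inventory, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"files": inventory, "signature": "hmac-sha256:" + signature}


def verify_manifest(manifest, files, key):
    """Recompute the manifest from the files and compare signatures in constant time."""
    expected = build_manifest(files, key)
    return hmac.compare_digest(expected["signature"], manifest["signature"])


key = b"demo-key"  # hypothetical signing key
files = {"clause_map.json": b"{}", "attestation.json": b'{"status": "ok"}'}
manifest = build_manifest(files, key)
```

Because the signature covers every file digest, changing any file in the bundle changes the recomputed signature and verification fails.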

6. Observe spans with sf-observe (v2.0.5+)

Export spans to any OTLP-compatible backend, emit structured annotations, and trace LLM calls with OTel GenAI semantic conventions:

from spanforge.sdk import sf_observe

# Emit a span for an LLM call — W3C traceparent + OTel GenAI attrs added automatically
span_id = sf_observe.emit_span(
    "chat.completion",
    {
        "gen_ai.system": "openai",
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 512,
        "gen_ai.usage.output_tokens": 64,
    },
)
print(span_id)  # "a3f1b2c4d5e6f708"

# Mark a model deployment
annotation_id = sf_observe.add_annotation(
    "model_deployed",
    {"model": "gpt-4o", "environment": "production"},
    project_id="my-agent",
)

# Health probe
print(sf_observe.healthy)         # True
print(sf_observe.last_export_at)  # ISO-8601 or None

# Export to any OTLP endpoint per-call
from spanforge.sdk import ReceiverConfig
result = sf_observe.export_spans(
    my_spans,
    receiver_config=ReceiverConfig(
        endpoint="https://otel.collector.example.com/v1/traces",
        headers={"Authorization": "Bearer tok"},
    ),
)
print(result.exported_count, result.backend)

Select backend and sampler via environment:

export SPANFORGE_OBSERVE_BACKEND=otlp          # otlp | datadog | grafana | splunk | elastic | local
export SPANFORGE_OBSERVE_SAMPLER=trace_id_ratio
export SPANFORGE_OBSERVE_SAMPLE_RATE=0.25
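
A trace-ID ratio sampler, in the style of OpenTelemetry's, makes the keep/drop decision a pure function of the trace ID, so every service that sees the same trace reaches the same decision. The sketch below shows the idea; that spanforge's trace_id_ratio sampler works exactly this way is an assumption, and `should_sample` is a hypothetical helper.

```python
def should_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall under rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bound = round(rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound


trace_id = int("4bf92f3577b34da6a3ce929d0e0e4736", 16)  # example 128-bit trace ID
decision = should_sample(trace_id, 0.25)
```

With a rate of 0.25, roughly a quarter of uniformly distributed trace IDs fall below the bound and are kept.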

7. Route alerts with sf-alert (v2.0.6+)

Publish topic-based alerts to Slack, PagerDuty, OpsGenie, Teams, SMS, and custom webhooks — with built-in deduplication, escalation policy, and maintenance-window suppression:

from spanforge.sdk import sf_alert

# Publish a CRITICAL drift alert
result = sf_alert.publish(
    "halluccheck.drift.red",
    {"model": "gpt-4o", "drift_score": 0.91},
    severity="critical",
    project_id="my-agent",
)
print(result.alert_id)    # UUID4
print(result.suppressed)  # True if deduplicated / maintenance window

# Acknowledge to cancel the 15-minute escalation timer
sf_alert.acknowledge(result.alert_id)

# Register a custom topic
sf_alert.register_topic(
    "myapp.pipeline.failed",
    "ML pipeline execution failure",
    "high",
    runbook_url="https://runbooks.example.com/pipeline",
)

Configure sinks via environment variables (zero code required):

export SPANFORGE_ALERT_TEAMS_WEBHOOK=https://xxx.webhook.office.com/...
export SPANFORGE_ALERT_OPSGENIE_KEY=og-key-...
export SPANFORGE_ALERT_DEDUP_SECONDS=300
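
The deduplication window can be pictured as a small time-keyed cache. The sketch below is one plausible reading of a 300-second window, not sf-alert's implementation; `Deduplicator` and the choice of (topic, project) as the key are assumptions.

```python
import time


class Deduplicator:
    """Suppress repeats of the same (topic, project) within a time window."""

    def __init__(self, window_seconds=300, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._last_sent = {}

    def should_send(self, topic, project_id):
        key = (topic, project_id)
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress
        self._last_sent[key] = now
        return True


# Drive the clock by hand so the example is deterministic.
fake_now = [0.0]
dedup = Deduplicator(window_seconds=300, clock=lambda: fake_now[0])
first = dedup.should_send("halluccheck.drift.red", "my-agent")
fake_now[0] = 120.0
second = dedup.should_send("halluccheck.drift.red", "my-agent")
fake_now[0] = 301.0
third = dedup.should_send("halluccheck.drift.red", "my-agent")
```

Injecting the clock keeps the class testable without sleeping.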

8. Enforce release gates with sf-gate (v2.0.7+)

Run YAML-declared quality gates before every release. Block on schema violations, secrets leaks, performance regressions, unsafe PRRI scores, and trust failures — all in a single pipeline command:

from spanforge.sdk import sf_gate

# Run a full YAML gate pipeline — blocks on any FAIL gate
result = sf_gate.run_pipeline("gates/ci-pipeline.yaml")
for g in result.gate_results:
    print(f"[{g.verdict.value}] {g.gate_id}")  # e.g. [PASS] schema-validation

# Evaluate a single gate programmatically
verdict = sf_gate.evaluate("schema-validation", event.to_dict())
print(verdict.verdict)   # GateVerdict.PASS

# Standalone PRRI evaluation
prri = sf_gate.evaluate_prri("my-agent", prri_score=28)
print(prri.verdict)      # PRRIVerdict.GREEN

# Composite trust gate — checks HRI rate, PII, and secrets windows
trust = sf_gate.get_status()
print(trust.healthy)     # True if all thresholds are within bounds

Or from CI directly:

# Runs the pipeline, exits 1 if any blocking gate fails
spanforge gate run gates/ci-pipeline.yaml

# Enforce the composite trust gate as a deployment prerequisite
spanforge gate trust-gate --project-id my-agent

A minimal ci-pipeline.yaml:

version: "1.0"
gates:
  - id: schema-validation
    type: schema_validation
    on_fail: block
  - id: secrets-scan
    type: secrets_scan
    on_fail: block
  - id: prri-check
    type: halluccheck_prri
    params:
      red_threshold: 65
    on_fail: block
  - id: trust-gate
    type: halluccheck_trust
    on_fail: block

9. Unified config & local fallback (v2.0.8+)

Bootstrap all 8 services from a single .halluccheck.toml config block. When a remote service is unreachable, the SDK automatically falls back to a local-mode equivalent — no code changes required:

# .halluccheck.toml
[spanforge]
enabled    = true
project_id = "my-agent"
endpoint   = "https://api.spanforge.example.com"

[spanforge.services]
sf_pii     = true
sf_secrets = true
sf_audit   = true
sf_observe = true

[spanforge.local_fallback]
enabled     = true
max_retries = 3
timeout_ms  = 2000

from spanforge.sdk import load_config_file, validate_config

# Parse, validate, and apply env-var overrides in one call
config = load_config_file()                # auto-discovers .halluccheck.toml
errors = validate_config(config)           # [] when valid
print(config.services.sf_pii)             # True
print(config.local_fallback.timeout_ms)   # 2000

Validate from the CLI:

spanforge config validate                          # auto-discover
spanforge config validate --file .halluccheck.toml # explicit path

When a service is down, fallback activates automatically:

from spanforge.sdk import pii_fallback, secrets_fallback, audit_fallback

# Local regex PII scan (no remote service required)
result = pii_fallback("Contact alice@example.com")
print(result["entities"])  # [{"type": "EMAIL", ...}]

# Local secrets scan
result = secrets_fallback("AKIA1234567890ABCDEF")
print(result["clean"])     # False

# Local HMAC-chained JSONL audit
audit_fallback(
    {"score": 0.92, "model": "gpt-4o"},
    schema_key="halluccheck.score.v1",
)
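
The retry-then-fall-back behaviour can be sketched in a few lines. This is a generic pattern under stated assumptions, not the SDK's internal logic; `call_with_fallback`, `flaky_remote`, and `local_scan` are hypothetical, and the toy local scan only looks for an "@" sign.

```python
def call_with_fallback(remote, local, payload, max_retries=3):
    """Try the remote call up to max_retries times, then use the local equivalent."""
    for _ in range(max_retries):
        try:
            return remote(payload), "remote"
        except (ConnectionError, TimeoutError):
            continue  # service unreachable: retry
    return local(payload), "local"


attempts = []


def flaky_remote(text):
    attempts.append(text)
    raise ConnectionError("service unreachable")


def local_scan(text):
    return {"clean": "@" not in text}


result, source = call_with_fallback(flaky_remote, local_scan, "Contact alice@example.com")
```

Returning the source alongside the result lets callers record whether evidence came from the remote service or the local fallback.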

The ServiceRegistry tracks health for all services and re-checks every 60 s:

from spanforge.sdk import ServiceRegistry

reg = ServiceRegistry.get_instance()
status = reg.status_response()
# {"sf_pii": {"status": "up", "latency_ms": 45, "last_checked_at": "..."}, ...}

10. T.R.U.S.T. Scorecard & HallucCheck pipelines (v2.0.9+)

The T.R.U.S.T. scorecard aggregates five trust dimensions into a single weighted score with colour-band verdicts. Each pillar maps to existing audit telemetry:

Pillar | What it measures | Source
Transparency | Gate pass rate | sf_gate evaluations
Reliability | Hallucination rate | halluccheck.score.v1 records
UserTrust | Bias disparity | halluccheck.bias.v1 records
Security | PII + secrets hygiene | sf_pii / sf_secrets scans
Traceability | Compliance posture | Attestation coverage

Colour bands: green ≥ 80, amber ≥ 60, red < 60.
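
As a worked example of the aggregation, the sketch below computes a weighted mean of five pillar scores and applies the colour bands stated above. The pillar keys, the equal weights, and both helper names are chosen for illustration; the scorecard's real weighting is configurable and may differ.

```python
def overall_score(scores, weights):
    """Weighted mean of per-pillar scores (each 0-100)."""
    total_weight = sum(weights[p] for p in scores)
    return sum(scores[p] * weights[p] for p in scores) / total_weight


def colour_band(score):
    """Apply the bands: green >= 80, amber >= 60, red < 60."""
    if score >= 80:
        return "green"
    if score >= 60:
        return "amber"
    return "red"


scores = {
    "transparency": 90,
    "reliability": 80,
    "user_trust": 70,
    "security": 85,
    "traceability": 75,
}
weights = {pillar: 1.0 for pillar in scores}  # equal weights, for illustration
score = overall_score(scores, weights)
band = colour_band(score)
```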

from spanforge.sdk import sf_trust

# Full scorecard with all five dimensions
scorecard = sf_trust.get_scorecard(project_id="my-agent")
print(scorecard.overall_score)   # 82.5
print(scorecard.colour_band)     # "green"
print(scorecard.reliability)     # TrustDimension(score=90.0, trend="up", ...)

# SVG badge for dashboards / README shields
badge = sf_trust.get_badge(project_id="my-agent")
with open("trust-badge.svg", "w") as f:
    f.write(badge.svg)

# Historical time-series (10 buckets)
history = sf_trust.get_history(project_id="my-agent", buckets=10)
for entry in history:
    print(entry.timestamp, entry.overall)

Five HallucCheck pipeline integrations orchestrate cross-service workflows:

from spanforge.sdk.pipelines import (
    score_pipeline,
    bias_pipeline,
    monitor_pipeline,
    risk_pipeline,
    benchmark_pipeline,
)

# Score pipeline: PII scan → secrets scan → observe span → audit append
result = score_pipeline("The model output to check", model="gpt-4o")
print(result.audit_id, result.details)

# Risk pipeline: audit PRRI record → alert if RED → optional gate → optional CEC bundle
result = risk_pipeline({"verdict": "RED", "prri_score": 75.0}, project_id="my-agent")
print(result.details["verdict"])  # "RED"

From the CLI:

# T.R.U.S.T. scorecard (text table)
spanforge trust scorecard --project-id my-agent

# SVG badge to stdout
spanforge trust badge --project-id my-agent > trust.svg

# Composite trust gate (exit 1 = trust below threshold)
spanforge trust gate --project-id my-agent

11. Test with zero-network mocks (v2.0.11+)

Drop-in mock service clients for every SpanForge SDK service — no network, no configuration, no side-effects:

from spanforge.testing_mocks import mock_all_services

with mock_all_services():
    from spanforge.sdk import sf_pii, sf_audit, sf_gate

    # All calls are local, recorded, and return sensible defaults
    result = sf_pii.scan_text("Contact alice@example.com")
    assert result.clean  # mock returns clean=True by default

    sf_audit.append({"score": 0.92}, schema_key="halluccheck.score.v1")
    assert len(sf_audit.calls) == 1  # inspect recorded calls

    prri = sf_gate.evaluate_prri("my-agent", prri_score=28)
    assert prri.allow  # GREEN by default

Override default returns per-method:

from spanforge.testing_mocks import MockSFPII

mock = MockSFPII()
mock.configure_response("scan_text", {"clean": False, "entities": ["EMAIL"]})
result = mock.scan_text("test")
assert not result["clean"]

Run spanforge doctor for a full environment diagnostic:

spanforge doctor
# ✅ Config valid
# ✅ All 11 services reachable
# ✅ API key not expired
# ✅ PII/secrets patterns loaded
# ✅ Gate YAML valid

Regulatory framework coverage

The ComplianceMappingEngine maps your telemetry events to specific regulatory clauses:

Framework | Clause | Mapped events | What it proves
GDPR | Art. 22 | consent.*, hitl.* | Automated decisions have consent + human oversight
GDPR | Art. 25 | llm.redact.*, consent.* | Privacy by design: PII handled before export
EU AI Act | Art. 13 | explanation.* | AI decisions are transparent and explainable
EU AI Act | Art. 14 | hitl.*, consent.* | Human oversight of high-risk AI
EU AI Act | Annex IV.5 | llm.guard.*, llm.audit.*, hitl.* | Technical documentation: safety + oversight
SOC 2 | CC6.1 | llm.audit.*, llm.trace.*, model_registry.* | Logical access controls + model governance
NIST AI RMF | MAP 1.1 | llm.trace.*, llm.eval.*, model_registry.*, explanation.* | Risk identification and mapping
HIPAA | §164.312 | llm.redact.*, llm.audit.* | PHI access controls and audit
ISO 42001 | A.5–A.10 | Full event set | AI management system controls
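
The mapping can be pictured as wildcard matching of event types against per-clause patterns, with unmatched clauses reported as gaps. The sketch below uses three rows from the table; `CLAUSE_MAP`, `map_clauses`, the sample event names, and the rule that one matching event satisfies a clause are simplifying assumptions rather than the ComplianceMappingEngine's actual logic.

```python
from fnmatch import fnmatchcase

# A few rows from the table above: (framework, clause) -> event-type patterns.
CLAUSE_MAP = {
    ("GDPR", "Art. 22"): ["consent.*", "hitl.*"],
    ("EU AI Act", "Art. 13"): ["explanation.*"],
    ("HIPAA", "§164.312"): ["llm.redact.*", "llm.audit.*"],
}


def map_clauses(event_types):
    """Split clauses into those with at least one matching event and those with none."""
    satisfied, gaps = [], []
    for clause, patterns in CLAUSE_MAP.items():
        hit = any(fnmatchcase(e, p) for e in event_types for p in patterns)
        (satisfied if hit else gaps).append(clause)
    return satisfied, gaps


# Hypothetical event types observed in an audit log.
observed = ["consent.granted", "llm.audit.access", "llm.trace.span.completed"]
satisfied, gaps = map_clauses(observed)
```

Here the EU AI Act Art. 13 clause ends up in `gaps` because no explanation.* event was observed.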

Compliance event types

spanforge defines purpose-built event types for AI governance — these aren't afterthought log messages, they are first-class compliance primitives:

Category | Event types | Purpose
Consent | consent.granted, consent.revoked, consent.violation | Track user consent for automated processing
Human-in-the-Loop | hitl.queued, hitl.reviewed, hitl.escalated, hitl.timeout | Prove human oversight of AI decisions
Model Registry | model_registry.registered, model_registry.deprecated, model_registry.retired | Govern model lifecycle and risk
Explainability | explanation.generated | Attach explanations to AI decisions
Guardrails | llm.guard.* | Safety classifier outputs and block decisions
PII | llm.redact.* | Audit trail of what PII was found and removed
Audit | llm.audit.* | Access logs and chain-of-custody records
Traces | llm.trace.* | Model calls, tokens, latency, cost

Core capabilities

Tamper-proof audit chains

Every event is HMAC-SHA256 signed and chained to its predecessor, the same principle as a hash chain. Alter, remove, or reorder one event and verification of the chain fails.

from spanforge.signing import AuditStream, verify_chain

stream = AuditStream(org_secret="your-secret")
for event in events:
    stream.append(event)

result = verify_chain(stream.events, org_secret="your-secret")
assert result.valid  # any tampering → False

PII redaction

Strip personal data before events leave your application boundary. Deep scanning applies Luhn validation to credit card numbers, Verhoeff validation to Aadhaar numbers, SSN range validation (_is_valid_ssn), and calendar validation for dates of birth (_is_valid_date), and includes built-in patterns for date_of_birth and street address.

from spanforge.redact import RedactionPolicy, Sensitivity

policy = RedactionPolicy(min_sensitivity=Sensitivity.PII, redacted_by="policy:gdpr-v1")
result = policy.apply(event)
# All PII fields → "[REDACTED by policy:gdpr-v1]"
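
Checksum validation is what keeps a card detector from flagging every long digit run. The sketch below pairs a simple regex with the Luhn check; `luhn_valid`, `CARD_RE`, and `redact_cards` are hypothetical helpers and not spanforge's detector.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_cards(text: str, label: str = "[REDACTED]") -> str:
    """Replace digit runs that look like card numbers and pass the Luhn check."""
    return CARD_RE.sub(lambda m: label if luhn_valid(m.group()) else m.group(), text)


clean = redact_cards("card 4111 1111 1111 1111, order 1234567890123456")
```

The well-known test number 4111 1111 1111 1111 passes the checksum and is redacted, while the order number fails it and is left alone.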

Model registry governance

Register models with ownership and risk metadata. Attestations automatically warn when models are deprecated, retired, or unregistered.

from spanforge.model_registry import ModelRegistry

registry = ModelRegistry()
registry.register("gpt-4o", owner="ml-platform", risk_tier="high")
registry.deprecate("gpt-3.5-turbo", reason="Successor available")

# Evidence packages now include:
#   model_owner: "ml-platform"
#   model_risk_tier: "high"
#   model_status: "active"
#   model_warnings: []  (or ["model 'gpt-3.5-turbo' is deprecated"])

Explainability tracking

Measure what percentage of your AI decisions have explanations attached:

from spanforge.explain import generate_explanation

explanation = generate_explanation(
    decision_event_id="evt_01HX...",
    method="feature_importance",
    content="Top factors: credit_score (0.42), income (0.31)...",
)
# explanation_coverage_pct in attestations = explained / total decisions
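
The coverage figure is a simple ratio, shown here as a worked example. The record shape and the `explanation_coverage_pct` helper are assumptions for illustration; a decision with several explanations counts once.

```python
def explanation_coverage_pct(decision_ids, explanations):
    """Percentage of decisions that have at least one explanation attached."""
    if not decision_ids:
        return 0.0
    explained = {e["decision_event_id"] for e in explanations}
    covered = sum(1 for d in decision_ids if d in explained)
    return 100.0 * covered / len(decision_ids)


decisions = ["evt_1", "evt_2", "evt_3", "evt_4"]
explanations = [
    {"decision_event_id": "evt_1", "method": "feature_importance"},
    {"decision_event_id": "evt_2", "method": "feature_importance"},
    {"decision_event_id": "evt_2", "method": "counterfactual"},
    {"decision_event_id": "evt_4", "method": "feature_importance"},
]
pct = explanation_coverage_pct(decisions, explanations)
```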

GDPR subject erasure

Right-to-erasure with tombstone events that preserve audit chain integrity:

spanforge audit erase audit.jsonl --subject-id user123
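
One way erasure can coexist with a signed chain is to sign a digest of each payload rather than the payload itself, so the payload can later be swapped for a tombstone without touching any signature. The sketch below illustrates that design; it is an assumption about how such a scheme could work, not a description of what `spanforge audit erase` does, and all helper names are hypothetical.

```python
import hashlib
import hmac
import json


def _digest(payload):
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def append(chain, payload, secret):
    """Sign the payload digest together with the previous signature."""
    prev = chain[-1]["sig"] if chain else ""
    digest = _digest(payload)
    sig = hmac.new(secret, (digest + prev).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "digest": digest, "prev": prev, "sig": sig})


def erase_subject(chain, subject_id):
    """Swap matching payloads for tombstones; digests and signatures stay untouched."""
    for entry in chain:
        if entry["payload"].get("subject_id") == subject_id:
            entry["payload"] = {"tombstone": True, "reason": "gdpr_erasure"}


def verify(chain, secret):
    prev = ""
    for entry in chain:
        body = (entry["digest"] + prev).encode()
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(entry["sig"], expected):
            return False
        live = not entry["payload"].get("tombstone")
        if live and _digest(entry["payload"]) != entry["digest"]:
            return False  # a non-erased payload was altered
        prev = entry["sig"]
    return True


secret = b"your-secret"
chain = []
append(chain, {"subject_id": "user123", "email": "alice@example.com"}, secret)
append(chain, {"subject_id": "user456", "score": 0.92}, secret)
erase_subject(chain, "user123")
```

After erasure the chain still verifies, the personal data is gone, and the tombstone records that an entry once existed at that position.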

Auto-instrumentation

Patch supported providers once — compliance data flows automatically:

# Instrument all installed providers in one call
import spanforge.auto
spanforge.auto.setup()

# Or patch individually
from spanforge.integrations import openai as sf_openai
sf_openai.patch()    # every OpenAI call → signed, redacted, compliant
sf_openai.unpatch()  # restore original behaviour

Supported providers: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Ollama, Groq, Together AI

Supported frameworks: LangChain, LlamaIndex, CrewAI


Using SpanForge alongside OpenTelemetry

spanforge is not an OTel replacement. OTel handles performance monitoring. spanforge adds the compliance layer OTel cannot provide — audit chains, PII redaction, consent tracking, and regulator-ready attestations.

# Your existing OTel pipeline stays untouched
from opentelemetry.sdk.trace import TracerProvider
provider = TracerProvider()

# Add spanforge's compliance layer alongside it
import spanforge
spanforge.configure(mode="otel_passthrough")

# Dual-stream: OTel for monitoring, spanforge for compliance
spanforge.configure(exporters=["otel_passthrough", "jsonl"], endpoint="audit.jsonl")

Export

Ship compliance events to any backend:

import asyncio

from spanforge.stream import EventStream
from spanforge.export.jsonl import JSONLExporter
from spanforge.export.otlp import OTLPExporter
from spanforge.export.datadog import DatadogExporter
from spanforge.export.grafana import GrafanaLokiExporter
from spanforge.export.cloud import CloudExporter
from spanforge.export.siem_splunk import SplunkHECExporter
from spanforge.export.siem_syslog import SyslogExporter

async def main():
    # drain() is awaited, so it has to run inside a coroutine
    stream = EventStream(events)

    await stream.drain(JSONLExporter("audit.jsonl"))                    # local file
    await stream.drain(OTLPExporter("http://collector:4318/v1/traces")) # OTel collector
    await stream.drain(DatadogExporter(service="my-app"))               # Datadog APM
    await stream.drain(GrafanaLokiExporter(url="http://loki:3100"))     # Grafana Loki
    await stream.drain(CloudExporter(api_key="sf_live_xxx"))            # spanforge Cloud
    await stream.drain(SplunkHECExporter())                             # Splunk HEC (env-var config)
    await stream.drain(SyslogExporter())                                # Syslog/CEF (env-var config)

asyncio.run(main())

# Lightweight CEF/LEEF string formatter (no network, no dependencies)
from spanforge.export.siem import SIEMExporter
exporter = SIEMExporter(format="cef")
for event in events:
    print(exporter.export(event))  # one CEF line per event

# Or via CLI
# spanforge export siem --format leef --input audit.jsonl | logger -n siem.corp.example -P 514
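
For orientation, a CEF line is a pipe-delimited header (version, vendor, product, device version, event class ID, name, severity) followed by key=value extensions. The sketch below formats one event that way; the event dict shape, the vendor and product strings, and the `to_cef` helper are assumptions and may not match what SIEMExporter emits.

```python
def _esc_header(value):
    # Header fields escape backslash and pipe.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _esc_ext(value):
    # Extension values escape backslash, equals sign, and newlines.
    return str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def to_cef(event, vendor="ExampleVendor", product="ExampleProduct", version="1.0"):
    """Format one event dict as a CEF:0 line."""
    header = [
        "CEF:0",
        _esc_header(vendor),
        _esc_header(product),
        _esc_header(version),
        _esc_header(event["event_type"]),
        _esc_header(event.get("name", event["event_type"])),
        str(event.get("severity", 5)),
    ]
    fields = sorted(event.get("fields", {}).items())
    extension = " ".join(f"{k}={_esc_ext(v)}" for k, v in fields)
    return "|".join(header) + "|" + extension


line = to_cef({
    "event_type": "llm.guard.output.blocked",
    "name": "Guardrail block",
    "severity": 8,
    "fields": {"model": "gpt-4o", "reason": "policy=pii"},
})
```

Escaping matters because an unescaped pipe or equals sign in event data would shift every field after it when the SIEM parses the line.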

Fan-out routing for compliance alerting:

from spanforge.export.webhook import WebhookExporter

# Route guardrail violations to Slack (call from inside an async function)
await stream.route(
    WebhookExporter("https://hooks.slack.com/your-webhook"),
    predicate=lambda e: e.event_type == "llm.guard.output.blocked",
)

CLI

38 commands — all CI-pipeline ready:

# Compliance
spanforge compliance generate --model-id gpt-4o --framework eu_ai_act \
  --from 2026-01-01 --to 2026-03-31 --events-file events.jsonl
spanforge compliance check --framework eu_ai_act \
  --from 2026-01-01 --to 2026-03-31 --events-file events.jsonl
spanforge compliance validate-attestation evidence.json
spanforge compliance status --events-file events.jsonl   # compliance summary JSON

# Audit chain
spanforge audit-chain events.jsonl             # verify chain integrity
spanforge audit erase events.jsonl --subject-id user123  # GDPR erasure
spanforge audit rotate-key events.jsonl        # key rotation
spanforge audit verify --input events.jsonl    # verify integrity
spanforge audit extract events.jsonl --type llm.trace.span.completed --since 2026-01-01  # filter & extract
spanforge audit cec generate --project-id my-agent --sign  # CEC compliance bundle ZIP
spanforge audit gap-finder events.jsonl --threshold-minutes 30  # detect time gaps + missing fields

# Privacy & Secrets
spanforge scan events.jsonl --fail-on-match    # CI-gate PII scan
spanforge secrets scan <file>                  # scan file for secrets (exit 0=clean, 1=found)
spanforge secrets scan <file> --format sarif   # SARIF output for GitHub Code Scanning
spanforge secrets scan <file> --redact         # print redacted version to stdout
spanforge secrets set KEY VALUE                # store a secret in local secrets store
spanforge secrets get KEY                      # retrieve a stored secret
spanforge secrets list                         # list stored secret key names
spanforge secrets delete KEY                   # remove a stored secret

# Event generation
spanforge event create --type llm.trace.span.completed --count 10 --format jsonl  # generate test events

# Validation
spanforge check                                # 9-step end-to-end health check (--verbose for timing)
spanforge check-compat events.json             # v2.0 compatibility
spanforge validate events.jsonl                # JSON Schema validation
spanforge validate events.jsonl --report detailed --format json  # detailed report
spanforge validate --dataset training.jsonl                    # Article 10 compliance scan; exits 1 if any clause fails
spanforge validate --dataset training.jsonl --output json      # machine-readable JSON report
spanforge validate --dataset training.jsonl --output pdf       # PDF report (requires pip install spanforge[compliance])

# Configuration
spanforge config init                          # interactive wizard → ~/.spanforge/config.yaml
spanforge config init --non-interactive        # write defaults immediately
spanforge config init --force                  # overwrite existing config
spanforge config validate                      # validate ~/.spanforge/config.yaml
spanforge config validate --config path/to.yaml --check-connectivity  # validate + probe OTLP
spanforge config validate --file path/to.toml  # validate .halluccheck.toml (legacy)

# Development
spanforge dev reset                            # wipe local dev state (trace store, audit chain)
spanforge dev reset --hard                     # also delete ~/.spanforge/config.yaml
spanforge dev reset --dry-run                  # list files that would be removed

# Analysis
spanforge stats events.jsonl                   # counts, tokens, cost
spanforge stats events.jsonl --group-by model --format json  # grouped stats, JSON output
spanforge inspect <EVENT_ID> events.jsonl      # pretty-print one event
spanforge inspect <EVENT_ID> events.jsonl --format csv  # CSV export
spanforge cost events.jsonl                    # token spend report
spanforge cost run --run-id <id> --input events.jsonl  # per-run cost report

# Evaluation
spanforge eval save --input events.jsonl --output dataset.jsonl  # extract eval dataset
spanforge eval run --file dataset.jsonl --scorers faithfulness,pii_leakage  # run scorers

# Migration
spanforge migrate events.jsonl --sign          # v1→v2 migration
spanforge migrate-langsmith export.jsonl       # LangSmith → SpanForge conversion
spanforge list-deprecated                      # deprecated event types
spanforge migration-roadmap                    # v2 migration plan
spanforge check-consumers                      # consumer compatibility

# CI/CD Gate Pipeline
spanforge gate run gates/ci-pipeline.yaml               # run YAML gate pipeline (exit 1 = blocking gate failed)
spanforge gate run gates/ci-pipeline.yaml --format json  # JSON output for CI dashboards
spanforge gate evaluate schema-validation --payload event.json  # evaluate single gate
spanforge gate trust-gate --project-id my-agent         # composite trust gate check
spanforge gate audit events.jsonl --fail-on-violation   # policy audit of gate records (CI gate)

# T.R.U.S.T. Scorecard
spanforge trust scorecard --project-id my-agent         # five-pillar trust scorecard (text table)
spanforge trust badge --project-id my-agent             # SVG badge to stdout
spanforge trust gate --project-id my-agent              # composite trust gate (exit 1 = below threshold)

# Enterprise (Phase 11)
spanforge enterprise status                            # enterprise subsystem status JSON
spanforge enterprise health                            # enterprise health check (all services)

# Security (Phase 11)
spanforge security owasp                               # OWASP API Security Top 10 audit
spanforge security scan                                # full security scan (deps + static + secrets-in-logs)
spanforge security threat-model                        # STRIDE threat model summary
spanforge security audit-logs --path /var/log/myapp/   # secrets-in-logs detection

# Developer Experience (Phase 12)
spanforge doctor                                       # environment diagnostics (config, services, keys, patterns)

# Viewer
spanforge serve                                # local SPA trace viewer
spanforge ui                                   # standalone HTML viewer
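Several of the commands above document a non-zero exit code on failure (for example `spanforge secrets scan` and `spanforge gate run`). A minimal sketch of chaining such commands from a Python CI script follows. `run_gates` is a hypothetical helper written for this example, and the two stand-in commands only mimic a passing and a failing gate so the snippet runs without SpanForge installed.

```python
import subprocess
import sys
from typing import List, Sequence


def run_gates(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order and return those that exited non-zero."""
    failed = []
    for cmd in commands:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


# In a real pipeline the list would hold commands from the section above, e.g.
#   ["spanforge", "secrets", "scan", "app.py"]          # app.py is a placeholder
#   ["spanforge", "gate", "run", "gates/ci-pipeline.yaml"]
# Here two stand-in commands imitate a passing gate and a blocking gate.
stand_ins = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(1)"],
]

failed = run_gates(stand_ins)
print(f"{len(failed)} gate(s) failed")
```

A CI job would typically end with `sys.exit(1 if failed else 0)` so the pipeline stops on the first report that contains a blocking failure.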

Event namespaces

Every event carries a typed payload. The built-in namespaces:

Prefix Dataclass What it records
consent.* ConsentPayload User consent grants, revocations, violations
hitl.* HITLPayload Human-in-the-loop review, escalation, timeout
model_registry.* ModelRegistryEntry Model registration, deprecation, retirement
explanation.* ExplainabilityRecord Explainability records for AI decisions
llm.trace.* SpanPayload Model calls — tokens, latency, cost (frozen v2)
llm.guard.* GuardPayload Safety classifier outputs, block decisions
llm.redact.* RedactPayload PII audit — what was found and removed
llm.audit.* AuditChainPayload Access logs and chain-of-custody
llm.eval.* EvalScenarioPayload Scores, labels, evaluator identity
llm.cost.* CostPayload Per-call cost in USD
llm.cache.* CachePayload Cache hit/miss, backend, TTL
llm.prompt.* PromptPayload Prompt template version, rendered text
llm.fence.* FencePayload Topic constraints, allow/block lists
llm.diff.* DiffPayload Prompt/response delta between events
llm.template.* TemplatePayload Template registry metadata
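To make the prefix convention concrete, here is a small lookup sketch that resolves an event type string to the dataclass name listed in the table above. The mapping is copied from the table; `payload_name` is a hypothetical helper, `consent.granted` is an invented event type used only for illustration, and a real namespace may hold more than one payload class.

```python
from typing import Optional

# Prefix -> dataclass name, as listed in the table above.
PAYLOAD_BY_PREFIX = {
    "consent": "ConsentPayload",
    "hitl": "HITLPayload",
    "model_registry": "ModelRegistryEntry",
    "explanation": "ExplainabilityRecord",
    "llm.trace": "SpanPayload",
    "llm.guard": "GuardPayload",
    "llm.redact": "RedactPayload",
    "llm.audit": "AuditChainPayload",
    "llm.eval": "EvalScenarioPayload",
    "llm.cost": "CostPayload",
    "llm.cache": "CachePayload",
    "llm.prompt": "PromptPayload",
    "llm.fence": "FencePayload",
    "llm.diff": "DiffPayload",
    "llm.template": "TemplatePayload",
}


def payload_name(event_type: str) -> Optional[str]:
    """Return the payload name for the longest matching namespace prefix."""
    parts = event_type.split(".")
    # Try the longest candidate prefix first, always leaving at least one
    # trailing segment so that "consent.*" needs something after "consent".
    for n in range(len(parts) - 1, 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in PAYLOAD_BY_PREFIX:
            return PAYLOAD_BY_PREFIX[prefix]
    return None


print(payload_name("llm.guard.output.blocked"))  # prints GuardPayload
```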

Architecture

spanforge/
+-- core/
│   +-- compliance_mapping.py  — ComplianceMappingEngine, evidence packages, attestations
+-- compliance/                — Programmatic compliance test suite
+-- signing.py                 — HMAC audit chains, key management, multi-tenant KeyResolver
+-- redact.py                  — PII detection + redaction policies
+-- model_registry.py          — Model lifecycle governance
+-- explain.py                 — Explainability records
+-- consent.py                 — Consent boundary events
+-- hitl.py                    — Human-in-the-loop events
+-- governance.py              — Policy-based event gating
+-- event.py                   — Event envelope
+-- types.py                   — EventType enum (consent.*, hitl.*, model_registry.*, explanation.*, llm.*)
+-- config.py                  — configure() / get_config()
+-- _span.py                   — Span, AgentRun, AgentStep context managers
+-- _trace.py                  — Trace + start_trace()
+-- _tracer.py                 — Top-level tracing entry point
+-- _stream.py                 — Internal dispatch: sample → redact → sign → export
+-- _store.py                  — TraceStore ring buffer
+-- _hooks.py                  — HookRegistry (lifecycle hooks)
+-- _server.py                 — HTTP server (/traces, /compliance/summary)
+-- _cli.py                    ← 39 CLI sub-commands
+-- workflow.py                — Human-in-the-Loop Workflow Engine (CORE-15); WorkflowEngine, WorkflowType, state machine, SLA escalation
+-- cost.py                    — CostTracker, BudgetMonitor, @budget_alert
+-- cache.py                   — SemanticCache, @cached decorator
+-- retry.py                   — @retry, FallbackChain, CircuitBreaker
+-- toolsmith.py               — @tool, ToolRegistry
+-- http.py                    — Zero-dependency OpenAI-compatible HTTP client
+-- io.py                      — JSONL read/write/append utilities
+-- plugins.py                 — Entry-point plugin discovery
+-- schema.py                  — Lightweight zero-dependency JSON Schema validator
+-- regression.py              — Pass/fail regression detector
+-- stats.py                   — Percentile, latency summary utilities
+-- presidio_backend.py        — Optional Presidio-powered PII detection
+-- _ansi.py                   — ANSI color helpers (NO_COLOR aware)
+-- lint/                      — AST-based instrumentation linter (AO000–AO005)
+-- export/                    — JSONL, OTLP, Webhook, Datadog, Grafana Loki, Cloud, Redis, Splunk HEC, Syslog/CEF
+-- integrations/              — OpenAI, Anthropic, Gemini, Bedrock, LangChain, LlamaIndex, CrewAI, Ollama, Groq, Together
+-- namespaces/                — Typed payload dataclasses
+-- gate.py                    — GateRunner YAML pipeline engine, 6 gate executors, artifact store (Phase 8)
+-- sdk/                       — Service SDK clients (sf-identity, sf-pii, sf-secrets, sf-audit, sf-cec, sf-observe, sf-alert, sf-gate, sf-trust, sf-enterprise, sf-security)
│   +-- explain.py             —   SFExplainClient – ExplainModelType enum (LLM/RAG/MULTI_AGENT/CLASSIFIER/EMBEDDING), signed explanations, retry+timeout emit (Phase 1B)
│   +-- scope.py               —   SFScopeClient – ACTION_CATEGORIES (5 categories), circuit-breaker fail-secure, resolve_action_category() (Phase 1B)
│   +-- rbac.py                —   SFRBACClient – STANDARD_ROLE_MATRIX (10 actor types), register_actor_from_yaml(), register_actor_from_jwt() (Phase 1C)
│   +-- identity.py            —   SFIdentityClient – keys, JWT, TOTP, MFA, magic-link
│   +-- pii.py                 —   SFPIIClient – scan, redact, anonymize
│   +-- secrets.py             —   SFSecretsClient – 20-pattern secret scanning, SARIF output
│   +-- audit.py               —   SFAuditClient – HMAC-chained records, T.R.U.S.T. scorecard, Article 30, BYOS
│   +-- cec.py                 —   SFCECClient – signed CEC ZIP bundles, clause mapping, DPA generation (Phase 5)
│   +-- observe.py             —   SFObserveClient – span export, OTel GenAI attrs, W3C TraceContext, sampling (Phase 6)
│   +-- alert.py               —   SFAlertClient – topic-based routing, dedup, escalation policy, 6 sink integrations (Phase 7)
│   +-- gate.py                —   SFGateClient – YAML pipeline runner, evaluate(), evaluate_prri(), trust-gate, artifact management (Phase 8)
│   +-- config.py              —   .halluccheck.toml parser, SFConfigBlock, SFServiceToggles, SFLocalFallbackConfig, validate_config() (Phase 9)
│   +-- registry.py            —   ServiceRegistry singleton, health checks, background checker, status_response() (Phase 9)
│   +-- fallback.py            —   8 local fallback implementations: pii, secrets, audit, observe, alert, identity, gate, cec (Phase 9)
│   +-- trust.py               —   SFTrustClient – T.R.U.S.T. five-pillar scorecard, SVG badge, history time-series, configurable weights (Phase 10)
│   +-- pipelines.py           —   5 HallucCheck pipeline integrations: score, bias, monitor, risk, benchmark (Phase 10)
│   +-- enterprise.py          —   SFEnterpriseClient – multi-tenancy, encryption, air-gap, health probes (Phase 11)
│   +-- security.py            —   SFSecurityClient – OWASP audit, STRIDE threat model, dependency/static scanning, secrets-in-logs (Phase 11)
│   +-- testing_mocks.py       —   11 mock service clients, _MockBase, mock_all_services() context manager (Phase 12)
│   +-- _base.py               —   SFClientConfig, SFServiceClient, circuit breaker, sandbox mode (Phase 12)
│   +-- _types.py              —   SecretStr, APIKeyBundle, JWTClaims, BundleResult, ClauseMapEntry, ExportResult, Annotation, AlertSeverity, …
│   +-- _exceptions.py         —   SFError hierarchy (incl. SFConfigError, SFConfigValidationError, SFStartupError, SFServiceUnavailableError, SFTrustComputeError, SFPipelineError)
│   +-- __init__.py            —   sf_identity / sf_pii / sf_secrets / sf_audit / sf_cec / sf_observe / sf_alert / sf_gate / sf_trust / sf_enterprise / sf_security / sf_rag / sf_feedback singletons + configure()
+-- migrate.py                 — Schema migration (v1 → v2), LangSmith migration
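The tree lists the internal dispatch order for `_stream.py` as sample, redact, sign, export. The sketch below walks one event through those four stages using only the standard library. It is an illustration of the ordering under stated assumptions, not the code in `_stream.py`: the email regex, the `signature` field name, and the list used as an exporter are all invented for the example.

```python
import hashlib
import hmac
import json
import random
import re
from typing import Any, Dict, List, Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # toy PII pattern (assumption)


def sample(rate: float, rng: random.Random) -> bool:
    """Keep roughly `rate` of all events."""
    return rng.random() < rate


def redact(event: Dict[str, Any]) -> Dict[str, Any]:
    """Replace email addresses in string fields."""
    return {k: EMAIL.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in event.items()}


def sign(event: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Attach an HMAC-SHA256 over a canonical JSON encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return {**event, "signature": hmac.new(key, body, hashlib.sha256).hexdigest()}


def dispatch(event: Dict[str, Any], *, key: bytes, sink: List[Dict[str, Any]],
             rate: float = 1.0,
             rng: Optional[random.Random] = None) -> Optional[Dict[str, Any]]:
    rng = rng or random.Random()
    if not sample(rate, rng):
        return None                      # dropped by sampling
    signed = sign(redact(event), key)    # redact first, then sign
    sink.append(signed)                  # "export" to an in-memory list
    return signed


sink: List[Dict[str, Any]] = []
out = dispatch(
    {"event_type": "llm.trace.span.completed", "prompt": "mail bob@example.com"},
    key=b"k" * 32,
    sink=sink,
)
print(out["prompt"])  # prints mail [REDACTED]
```

Redacting before signing means the signature covers exactly what is exported, so a verifier never needs the raw, unredacted payload.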

What is inside the box

Module What it does For whom
Compliance & Governance
spanforge.compliance ComplianceMappingEngine maps telemetry to regulatory frameworks (EU AI Act, ISO 42001, NIST AI RMF, GDPR, SOC 2, HIPAA). Generates evidence packages with HMAC-signed attestations. Consent, HITL, model registry, and explainability events integrated into clause mappings. Attestations include model owner, risk tier, status, warnings, and explanation_coverage_pct. Also: programmatic v2.0 compatibility checks — no pytest required. Compliance / legal / platform teams
spanforge.signing HMAC-SHA256 event signing, tamper-evident audit chains, key strength validation, key expiry checks, environment-isolated key derivation, multi-tenant KeyResolver protocol, and AsyncAuditStream Security / compliance teams
spanforge.redact PII detection, sensitivity levels, redaction policies, deep scan_payload() with Luhn / Verhoeff / SSN-range / date-calendar validation, built-in date_of_birth and address patterns, and contains_pii() / assert_redacted() with raw string scanning Data privacy / GDPR teams
spanforge.secrets SecretsScanner — 20-pattern registry (7 spec-defined + 13 industry-standard), Shannon entropy scoring, three-tier confidence model, zero-tolerance auto-block for 10 high-risk types, SecretsScanResult with to_dict() and SARIF 2.1.0 output, span deduplication, configurable allowlist Security / DevSecOps teams
spanforge.governance Policy-based event gating — block prohibited types, warn on deprecated usage, enforce custom rules Platform / compliance teams
Instrumentation & Tracing
spanforge.event The core Event envelope — the one structure all tools share Everyone
spanforge.types All built-in event types — compliance events (consent.*, hitl.*, model_registry.*, explanation.*) and telemetry events (llm.trace.*, llm.guard.*, etc.) Everyone
spanforge._span Span, AgentRun, AgentStep context managers. contextvars-based async/thread-safe propagation. async with, span.add_event(), span.set_timeout_deadline() App developers
spanforge._trace Trace + start_trace() — high-level tracing entry point; accumulates child spans App developers
spanforge.config configure() and get_config() — signing key, redaction policy, exporters, sample rate Everyone
Export & Integration
spanforge.export Ship events to JSONL, HTTP webhooks, OTLP collectors, Datadog APM, Grafana Loki, Splunk HEC, Syslog/CEF, Redis, or spanforge Cloud Infra / compliance teams
spanforge.export.siem_splunk SplunkHECExporter — thread-safe batched Splunk HTTP Event Collector exporter; env-var config; HEC token never logged; SplunkHECError on delivery failure Security / compliance teams
spanforge.export.siem_syslog SyslogExporter — RFC 5424 and ArcSight CEF exporter over UDP or TCP; severity derived from event type; CEF extension values properly escaped; SyslogExporterError on socket failure Security / compliance teams
spanforge.export.siem SIEMExporter — lightweight, network-free CEF v0 and IBM LEEF 2.0 string formatter; flattens envelope + payload fields into extension KV pairs; wired to spanforge export siem CLI Security / compliance teams
spanforge.stream Fan-out router — one drain() call reaches multiple backends; Kafka source Platform engineers
spanforge.integrations Auto-instrumentation for OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Groq, Ollama, Together; SpanForgeLangGraphCallback — 5-hook LangChain-compatible callback for LangGraph workflows App developers
spanforge.auto setup() auto-patches all installed LLM integrations; teardown() cleanly unpatches App developers
Developer Tools
spanforge.cost CostTracker, BudgetMonitor, @budget_alert — track and alert on token spend App developers / FinOps
spanforge.cache SemanticCache + @cached — deduplicate LLM calls via cosine similarity; InMemoryBackend, SQLiteBackend, RedisBackend App developers / FinOps
spanforge.retry @retry, FallbackChain, CircuitBreaker, CostAwareRouter — resilient LLM routing with compliance events App developers / SREs
spanforge.toolsmith @tool + ToolRegistry — register functions as typed tools; render JSON schemas for function-calling APIs App developers
spanforge.lint AST-based instrumentation linter; AO001–AO005 codes; flake8 plugin; CLI All teams / CI
Utilities (v2.0.2+)
spanforge.http chat_completion() — zero-dependency, synchronous OpenAI-compatible HTTP client with retry and back-off App developers
spanforge.io read_jsonl(), write_jsonl(), append_jsonl(), write_events(), read_events() — JSONL I/O utilities Everyone
spanforge.schema Lightweight zero-dependency JSON Schema validator — validate(), validate_strict() Tool authors / CI
spanforge.regression RegressionDetector — per-case pass/fail regression detection between baseline and current eval runs ML / eval teams
spanforge.stats percentile(), latency_summary() — statistical utilities for eval and performance analysis Analytics engineers
spanforge.plugins discover(group) — entry-point plugin discovery across Python 3.9–3.12+ Plugin authors
spanforge.presidio_backend Optional Presidio-powered PII detection backend — presidio_scan_payload() with standard PIIScanResult Data privacy teams
spanforge.eval Built-in scorers: FaithfulnessScorer, RefusalDetectionScorer, PIILeakageScorer, BehaviourScorer base class ML / eval teams
spanforge.debug print_tree(), summary(), visualize() — terminal tree, stats dict, HTML Gantt timeline App developers
spanforge.metrics aggregate() — success rates, latency percentiles, token totals, cost breakdowns Analytics engineers
spanforge.testing MockExporter, capture_events(), assert_event_schema_valid(), trace_store() Test authors
spanforge.testing_mocks 11 drop-in mock service clients (MockSFIdentity, MockSFPII, MockSFSecrets, MockSFAudit, MockSFObserve, MockSFGate, MockSFCEC, MockSFAlert, MockSFTrust, MockSFEnterprise, MockSFSecurity). mock_all_services() context manager patches all 11 singletons. _MockBase with .calls recording and .configure_response(). 100% test coverage. (Phase 12, v2.0.11+) Test authors / all teams
spanforge.validate JSON Schema validation against the published v2.0 schema All teams
spanforge.namespaces Typed payload dataclasses for all built-in event namespaces Tool authors
spanforge.models Optional Pydantic v2 models for validated schemas API / backend teams
spanforge.consumer Declare schema-namespace dependencies; fail fast at startup if version requirements are not met Platform teams
spanforge.deprecations Per-event-type deprecation notices at runtime Library maintainers
spanforge._hooks Lifecycle hooks: @hooks.on_llm_call, @hooks.on_tool_call, @hooks.on_agent_start (sync + async) App developers / platform
spanforge._store TraceStore ring buffer — get_trace(), list_tool_calls(), list_llm_calls() Platform / tooling engineers
spanforge._cli CLI sub-commands including eval, compliance status, migrate-langsmith, cost run, and more DevOps / CI teams
Service SDK (v2.0.3+)
spanforge.sdk.identity SFIdentityClient — API key lifecycle (issue_api_key, rotate_api_key, revoke_api_key), session JWT (HS256 stdlib / RS256 remote), magic-link issuance + single-use exchange, TOTP enrolment + verification (RFC 6238, 6-digit, 30 s), backup codes, per-key IP allowlist, sliding-window rate limiting, brute-force lockout. Fully local-mode capable — no external service required. Security / platform teams
spanforge.sdk.pii SFPIIClient: scan_text(), anonymise(), scan_batch(), apply_pipeline_action(), get_status(), erase_subject() (GDPR Art. 17), export_subject_data() (CCPA DSAR), safe_harbor_deidentify() (HIPAA 18-PHI), audit_training_data() (EU AI Act Art. 10), get_pii_stats(). PIPL patterns for Chinese national ID / mobile / bank card. Pipeline action routing (flag / redact / block) with confidence threshold gate. Scan results never include raw PII — only type labels, field paths, and SHA-256 hashes. Runs locally or delegates to a remote sf-pii service. Data privacy / GDPR teams
spanforge.sdk.secrets SFSecretsClient: scan(text) → SecretsScanResult, scan_batch(texts) with asyncio parallel execution. 20-pattern registry covering all spec-required types plus 13 industry-standard additions. Three-tier confidence model (0.75 / 0.90 / 0.97). Zero-tolerance auto-block for 10 high-risk secret types. SARIF 2.1.0 output. Runs fully locally — no external service required. Security / DevSecOps teams
spanforge.sdk.audit SFAuditClient: append(record, schema_key) with HMAC-SHA256 chaining, query() SQLite index with full-text and date-range filters, verify_chain() tamper detection, get_trust_scorecard() T.R.U.S.T. dimensions (hallucination · PII hygiene · secrets hygiene · gate pass-rate · compliance posture), generate_article30_record() GDPR Article 30 RoPA, export() JSONL/CSV/compressed, sign(), get_status(). BYOS routing via SPANFORGE_AUDIT_BYOS_PROVIDER (S3 / Azure / GCS / R2). Strict-schema mode, configurable retention years, optional SQLite persistence. 123 tests, 85 % coverage, mypy strict clean. Compliance / security / audit teams
spanforge.sdk.observe SFObserveClient: emit_span(name, attributes) builds OTel-compliant spans with W3C traceparent / baggage injection and OTel GenAI semantic attributes; export_spans(spans, receiver_config=...) routes to local / otlp / datadog / grafana / splunk / elastic; add_annotation(event_type, payload) / get_annotations(event_type, from_dt, to_dt) annotation store; get_status(), healthy, last_export_at health probes. Sampling via SPANFORGE_OBSERVE_SAMPLER (always_on / always_off / parent_based / trace_id_ratio). 139 tests, 97% coverage, mypy strict + bandit clean. (Phase 6, v2.0.5+) Platform / MLOps / observability teams
spanforge.sdk.alert SFAlertClient: publish(topic, payload, *, severity, project_id) → PublishResult routes to all configured sinks with deduplication, rate-limiting, alert grouping, and maintenance-window suppression; acknowledge(alert_id) cancels CRITICAL escalation; register_topic() custom topic registry; set_maintenance_window() / remove_maintenance_windows(); get_alert_history() with filtering; get_status() / healthy health probes. Built-in sinks: WebhookAlerter (HMAC), OpsGenieAlerter, VictorOpsAlerter, IncidentIOAlerter, SMSAlerter (Twilio), TeamsAdaptiveCardAlerter. Auto-discovery from SPANFORGE_ALERT_* env vars. Per-sink circuit breakers. 95 tests, mypy strict + bandit clean. (Phase 7, v2.0.6+) Platform / SRE / on-call teams
spanforge.sdk.gate SFGateClient: evaluate(gate_id, payload) → GateEvaluationResult, evaluate_prri(project_id, *, prri_score) → PRRIResult, run_pipeline(gate_config_path) → GateRunResult, get_artifact(gate_id), list_artifacts(), purge_artifacts(older_than_days), get_status() → GateStatusInfo, configure(config). Six built-in gate executors: schema_validation, dependency_security, secrets_scan, performance_regression, halluccheck_prri, halluccheck_trust. PRRI three-tier verdict (GREEN/AMBER/RED), GateArtifact store with configurable retention, composite trust gate (HRI rate + PII window + secrets window), five exception types. 174 tests, mypy strict + bandit clean. (Phase 8, v2.0.7+) DevOps / CI / platform teams
spanforge.sdk.config load_config_file(path?) — auto-discovers .halluccheck.toml or falls back to env-var defaults. validate_config(block) / validate_config_strict(block) schema validation. SFConfigBlock, SFServiceToggles, SFLocalFallbackConfig, SFPIIConfig, SFSecretsConfig typed dataclasses. Env-var overrides: SPANFORGE_ENDPOINT, SPANFORGE_API_KEY, SPANFORGE_PROJECT_ID, SPANFORGE_PII_THRESHOLD, SPANFORGE_SECRETS_AUTO_BLOCK, SPANFORGE_LOCAL_TOKEN, SPANFORGE_FALLBACK_TIMEOUT_MS. (Phase 9, v2.0.8+) All teams / platform engineers
spanforge.sdk.registry ServiceRegistry.get_instance() — thread-safe singleton holding all 11 service clients. run_startup_check() pings all enabled services (status: up / degraded / down). status_response() returns per-service {status, latency_ms, last_checked_at}. start_background_checker() launches a daemon thread re-checking every 60 s. ServiceHealth, ServiceStatus typed enums. (Phase 9, v2.0.8+) Platform / SRE teams
spanforge.sdk.fallback 8 local-mode fallback implementations: pii_fallback() (regex scan), secrets_fallback() (regex scan), audit_fallback() (HMAC-chained JSONL), observe_fallback() (OTLP JSON to stdout), alert_fallback() (log to stderr), identity_fallback() (trust local token), gate_fallback() (local gate engine), cec_fallback() (local JSONL). All emit WARNING when active. (Phase 9, v2.0.8+) All teams (automatic)
spanforge.sdk.trust SFTrustClient: get_scorecard(project_id, *, from_dt, to_dt, weights) → TrustScorecardResponse aggregates five T.R.U.S.T. dimensions (Transparency · Reliability · UserTrust · Security · Traceability) with configurable weights. get_badge(project_id) → TrustBadgeResult generates an SVG badge with colour-band (green ≥ 80, amber ≥ 60, red < 60). get_history(project_id, *, buckets) → list[TrustHistoryEntry] returns time-series snapshots. get_status() health probe. Reads from sf-audit trust records. 28 tests, mypy strict + bandit clean. (Phase 10, v2.0.9+) Compliance / platform / ML teams
spanforge.sdk.pipelines 5 HallucCheck ↔ SpanForge pipeline integrations: score_pipeline(text) (PII → secrets → observe → audit), bias_pipeline(bias_report) (PII → audit → alert → anonymise), monitor_pipeline(event) (annotate → alert → OTel export), risk_pipeline(prri_record) (audit → alert if RED → optional gate → optional CEC), benchmark_pipeline(run_result) (audit → F1 regression alert → anonymise). Each returns PipelineResult with audit trail. (Phase 10, v2.0.9+) ML / eval / platform teams
spanforge.sdk.cec SFCECClient: build_bundle(project_id, date_range, frameworks) assembles a signed ZIP with manifest.json, clause_map.json, chain_proof.json, attestation.json, rfc3161_timestamp.tsr, and 6 NDJSON evidence directories. HMAC-SHA256 manifest signing, BYOS detection. verify_bundle(zip_path) re-verifies HMAC + chain + timestamp. generate_dpa(project_id, controller_details, processor_details) produces a GDPR Article 28 Data Processing Agreement. get_status() returns bundle count, BYOS provider, and last bundle timestamp. Supports all 5 frameworks: eu_ai_act, iso_42001, nist_ai_rmf, iso27001, soc2. 148 tests, 87% coverage, mypy strict + bandit clean. (Phase 5, v2.0.4+) Compliance / legal / audit teams
spanforge.sdk Pre-built sf_identity, sf_pii, sf_secrets, sf_audit, sf_cec, sf_observe, sf_alert, sf_gate, sf_trust, sf_enterprise, sf_security, sf_rag, and sf_feedback singletons loaded from env vars on first import. SFClientConfig, SecretStr, full exception hierarchy (SFAuthError, SFBruteForceLockedError, SFPIINotRedactedError, SFPIIBlockedError, SFPIIDPDPConsentMissingError, SFSecretsBlockedError, SFAuditSchemaError, SFAuditAppendError, SFAuditQueryError, SFCECError, SFCECBuildError, SFCECVerifyError, SFCECExportError, SFObserveError, SFObserveExportError, SFObserveEmitError, SFObserveAnnotationError, SFAlertError, SFAlertPublishError, SFAlertRateLimitedError, SFAlertQueueFullError, SFGateError, SFGateEvaluationError, SFGatePipelineError, SFGateTrustFailedError, SFGateSchemaError, SFConfigError, SFConfigValidationError, SFStartupError, SFServiceUnavailableError, SFTrustComputeError, SFPipelineError, SFEnterpriseError, SFIsolationError, SFDataResidencyError, SFEncryptionError, SFFIPSError, SFAirGapError, SFSecurityScanError, SFSecretsInLogsError, …), and all value-object types exported from the top-level package. load_config_file(), validate_config(), validate_config_strict(), ServiceRegistry, and 8 fallback functions re-exported for convenience. All teams
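The spanforge.sdk.trust row describes a five-dimension scorecard with configurable weights and a badge colour band (green ≥ 80, amber ≥ 60, red < 60). The sketch below shows one way such a score and band could be computed. The weighted-mean formula, the 0–100 scale per dimension, and the function names are assumptions made for illustration; only the dimension names and the band thresholds come from the table.

```python
from typing import Dict, Optional

# Dimension names taken from the table row; snake_case spelling is an assumption.
DIMENSIONS = ("transparency", "reliability", "user_trust", "security", "traceability")


def overall_score(scores: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of the five dimension scores (assumed aggregation rule)."""
    w = {d: 1.0 for d in DIMENSIONS}     # equal weights unless overridden
    w.update(weights or {})
    total = sum(w[d] for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * w[d] for d in DIMENSIONS) / total


def badge_colour(score: float) -> str:
    """Colour band as stated in the table: green >= 80, amber >= 60, else red."""
    if score >= 80:
        return "green"
    if score >= 60:
        return "amber"
    return "red"


scores = {"transparency": 90, "reliability": 90, "user_trust": 90,
          "security": 40, "traceability": 90}

equal = overall_score(scores)                           # 400 / 5 = 80.0
security_heavy = overall_score(scores, {"security": 3.0})  # 480 / 7, about 68.6

print(badge_colour(equal), badge_colour(security_heavy))  # prints green amber
```

Raising the weight on the weakest dimension moves the same project from green to amber, which is the practical effect of making weights configurable.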

Quality

  • 7 049 tests passing (7 skipped) — unit, integration, property-based (Hypothesis), performance benchmarks
  • ≥ 91% line and branch coverage — 90% minimum enforced in CI
  • Zero required dependencies — entire core runs on Python stdlib
  • Typed — full py.typed marker; mypy + pyright clean
  • Frozen v2 trace schema: llm.trace.* payload fields never break between minor releases
  • Async-safe: contextvars-based context propagation across asyncio, threads, and executors
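For readers unfamiliar with contextvars, the sketch below shows the standard-library mechanism the last bullet refers to: each asyncio task gets its own copy of the context, and `asyncio.to_thread` carries that context into a worker thread. The variable name `current_span` and the `traced` helper are hypothetical; this is not SpanForge's internal code.

```python
import asyncio
import contextvars
from typing import Optional, Tuple

# Hypothetical context variable holding the active span's name.
current_span: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_span", default=None
)


async def traced(name: str, delay: float) -> Tuple[Optional[str], Optional[str]]:
    token = current_span.set(name)
    try:
        await asyncio.sleep(delay)                    # let the other task run
        in_task = current_span.get()                  # value seen in this task
        in_thread = await asyncio.to_thread(current_span.get)  # value in a thread
        return in_task, in_thread
    finally:
        current_span.reset(token)


async def main():
    # gather() wraps each coroutine in its own task with its own context copy.
    return await asyncio.gather(traced("a", 0.02), traced("b", 0.01))


results = asyncio.run(main())
print(results)  # prints [('a', 'a'), ('b', 'b')]
```

Although the two tasks interleave, neither observes the other's span name, and the value set in each task is also visible inside its worker thread.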

Development

git clone https://github.com/veerarag1973/spanforge.git
cd spanforge
python -m venv .venv && .venv\Scripts\activate   # Windows; on macOS/Linux use: source .venv/bin/activate
pip install -e ".[dev]"
pytest                      # 7 049 tests

# Code quality
ruff check . && ruff format .
mypy spanforge
pytest --cov                # >=90% required

# Build docs
pip install -e ".[docs]"
cd docs && sphinx-build -b html . _build/html

Versioning

spanforge implements RFC-0001 (AI Compliance Standard for Agentic AI Systems). Current schema version: 2.0.

This project follows Semantic Versioning. The llm.trace.* namespace is additionally frozen at v2 — even major releases won't remove fields from SpanPayload, AgentRunPayload, or AgentStepPayload.

See docs/changelog.md for the full version history.


Contributing

Contributions welcome — see the Contributing Guide. All new code must maintain ≥ 90% coverage. Run ruff and mypy before submitting.


Community

Topics: ai-compliance ai-governance eu-ai-act gdpr soc2 audit-trail pii-redaction hmac-signing llm-governance python


License

MIT License

  • ✅ Free for any use — personal, research, education, open-source, and commercial.
  • ✅ No restrictions on distribution, modification, or sublicensing.

Enterprise features (SSO, air-gapped deployment, dedicated support, SLAs) are available in SpanForge Enterprise — contact sriram@getspanforge.com | getspanforge.com/pricing.


Built for teams that take AI governance seriously.
Docs · Runtime Governance · Quickstart · API Reference · Discussions · Report a bug
