Skip to main content

PQC-native C2PA-compatible content provenance for AI-generated outputs. Sign every LLM/image/audio output with ML-DSA so its origin is verifiable for decades.

Project description

PQC Signed AI Content Provenance

PQC Native ML-DSA C2PA-Compatible License Version

C2PA for AI outputs, signed with ML-DSA. Every piece of AI-generated content (text, image, audio) gets a signed provenance manifest that cryptographically proves which model produced it, when, from what prompt, and under what licensing terms. Unlike classical C2PA, signatures use ML-DSA (FIPS 204) so they survive the quantum transition: audit trails signed today remain verifiable 20+ years from now, even against a future quantum adversary.

The Problem

Classical C2PA manifests rely on ECDSA / RSA signatures. A sufficiently large quantum computer running Shor's algorithm breaks both. That means every AI-generated article, diagnostic, or trading recommendation you sign today becomes retroactively forgeable once CRQCs (cryptographically-relevant quantum computers) arrive. Industries with long audit horizons (healthcare: 10-30 years, finance: 7+ years, legal discovery: indefinite) cannot rely on a classical signature for provenance.

The Solution

Every AI output is wrapped in a signed ContentManifest:

  • SHA3-256 content hash binds the manifest to the exact bytes produced.
  • ModelAttribution names the model, version, and Shield Registry manifest hash.
  • GenerationContext records prompt hash, parameters, and timestamp.
  • Assertions — pluggable C2PA-style claims (AI-generated, training summary, usage license).
  • ML-DSA signature over the canonical digest, by the model's AgentIdentity DID.
  • Provenance chain links derivations (AI draft -> human edit -> final) so every change has an auditable signer.

At any future date, a verifier recomputes the content hash, re-runs ML-DSA verify on the canonical manifest bytes, and walks the chain. Tampering at any layer is detected.

Installation

pip install pqc-content-provenance

Development:

pip install -e ".[dev]"

Quick Start

Sign an AI output

from quantumshield import AgentIdentity
from pqc_content_provenance import (
    AIGeneratedAssertion,
    ContentManifest,
    GenerationContext,
    ManifestSigner,
    ModelAttribution,
    UsageAssertion,
    embed_manifest,
)

identity = AgentIdentity.create("llama-3-signer")
signer = ManifestSigner(identity)

content = b"AI-generated press release about tool #4."

manifest = ContentManifest.create(
    content=content,
    content_type="text/plain",
    model_attribution=ModelAttribution(
        model_did=identity.did,
        model_name="Llama-3-8B-Instruct",
        model_version="1.0",
        registry_url="https://quantamrkt.com/models/meta-llama-Llama-3-8B-Instruct",
    ),
    generation_context=GenerationContext(
        prompt_hash="ab" * 32,
        parameters={"temperature": 0.7},
        generated_at="2026-04-20T12:00:00Z",
    ),
    assertions=[
        AIGeneratedAssertion(model_name="Llama-3-8B-Instruct", model_version="1.0"),
        UsageAssertion(license="cc-by-4.0", commercial_use=True, attribution_required=True),
    ],
)

signed = signer.sign(manifest)
envelope = embed_manifest(content, signed, mode="sidecar")

# Persist envelope alongside the content -- e.g. output.txt + output.txt.c2pa.json

Verify an AI output

from pqc_content_provenance import extract_manifest, ManifestSigner

manifest, content = extract_manifest(envelope, mode="sidecar")
result = ManifestSigner.verify(manifest, content)

if not result.valid:
    raise RuntimeError(f"provenance check failed: {result.error}")

print(f"valid output from {result.signer_did}")

Architecture

  AI Model                Publisher                Consumer / Auditor
  --------                ---------                ------------------
     |                        |                            |
     | 1. generate output     |                            |
     |                        |                            |
     | 2. ContentManifest.create:                          |
     |    - SHA3-256 content hash                          |
     |    - model attribution (from Shield Registry)       |
     |    - generation context (prompt, params, time)      |
     |    - assertions (AI-generated, usage, training)     |
     |                        |                            |
     | 3. ManifestSigner.sign:                             |
     |    - canonical JSON  -> SHA3-256                    |
     |    - ML-DSA signature with AgentIdentity            |
     |                        |                            |
     | 4. embed_manifest  --->| 5. store content + sidecar |
     |    (sidecar or inline) |    in CMS / DB / S3        |
     |                        |                            |
                              | 6. deliver envelope ------>|
                              |                            |
                                                           | 7. extract_manifest
                                                           | 8. ManifestSigner.verify:
                                                           |    - recompute content hash
                                                           |    - ML-DSA verify canonical
                                                           |    - walk ProvenanceChain
                                                           |
                                                           | 9. reject on any mismatch

Threat Model

Threat Mitigation
Forged attribution (claim output came from model X when it didn't) Manifest ML-DSA signature only verifies against model X's AgentIdentity public key.
Content tampering (text/image modified after signing) Recomputed SHA3-256 no longer matches manifest.content_hash.
Manifest tampering (edit claimed model/prompt/license) ML-DSA signature over canonical bytes breaks as soon as any field changes.
Lost chain of custody (edits with no signer record) ProvenanceChain enforces previous_manifest_id links; each link has its own signer.
Re-used signature across outputs Signature is over the canonical bytes of this specific manifest, which includes content_hash and manifest_id.
Unknown / unregistered assertion ASSERTION_REGISTRY rejects unknown labels with UnknownAssertionError.
Quantum adversary (Shor's algorithm) ML-DSA (FIPS 204) is not broken by known quantum attacks.
Long audit horizon (10-30 year retention) Post-quantum signatures remain verifiable past classical crypto's expiry.

Assertions

Pluggable facts attached to a manifest. Each is a dataclass with a label that matches a C2PA-style namespace.

AIGeneratedAssertionc2pa.ai_generated

Field Description
model_name, model_version, model_did Which model produced the content
generator_type text / image / audio / video / multimodal
human_edited Was it post-edited by a human?
generation_params Temperature, top_p, seed, etc.

TrainingAssertionc2pa.training

Field Description
dataset_name, dataset_root_hash Source training set + Merkle root
fine_tune_dataset, fine_tune_root_hash Optional fine-tune set
pii_filtered, copyright_cleared Compliance flags
licenses SPDX identifiers, e.g. ["cc-by-4.0", "apache-2.0"]

UsageAssertionc2pa.usage

Field Description
license SPDX identifier or custom string
commercial_use, attribution_required Rights flags
attribution_text Required credit text
jurisdictions Country codes where valid
expiry ISO-8601 expiry or empty

Register your own assertion subclass by adding it to ASSERTION_REGISTRY with its label.

Chain of Custody

Every derivation (AI draft -> human edit -> legal review) produces a new manifest that references the previous via previous_manifest_id. The ProvenanceChain verifies:

  1. Each manifest's ML-DSA signature.
  2. Each manifest's previous_manifest_id matches the prior link's manifest_id.
  3. The whole chain round-trips through to_dicts() / from_dicts() without loss.
chain = ProvenanceChain()
chain.add(ai_draft_signed)          # signed by model identity
chain.add(human_edit_signed)         # signed by editor identity, prev = ai_draft.manifest_id
chain.add(legal_review_signed)       # signed by legal identity, prev = human_edit.manifest_id

ok, errors = chain.verify_chain()

API Reference

ContentManifest

Method Description
ContentManifest.create(content, content_type, attribution, context, assertions=..., previous_manifest_id=...) Build an unsigned manifest
ContentManifest.compute_content_hash(bytes) Static SHA3-256 helper
canonical_bytes() Deterministic bytes used for signing
to_dict() / to_json() / from_dict() / from_json() JSON-safe round-trip

ModelAttribution / GenerationContext

Plain dataclasses holding model identity + generation context. Fully JSON-round-trippable.

ManifestSigner

Method Description
ManifestSigner(identity) Bind a signer to an AgentIdentity
sign(manifest) In-place sign; returns manifest
sign_and_raise_on_mismatch(manifest, content) Defensive: re-check content hash before signing
ManifestSigner.verify(manifest, content=None) Static — returns VerificationResult

VerificationResult

Frozen dataclass. Fields: valid, manifest_id, signer_did, algorithm, content_hash_match, signature_match, error.

ProvenanceChain / ProvenanceLink

Method Description
add(manifest) Append link; raises ChainBrokenError on bad previous_manifest_id
verify_chain() Returns (ok, errors) — verifies every signature and every link
to_dicts() / from_dicts(items) JSON-safe round-trip

embed_manifest / extract_manifest

Mode Description
sidecar JSON envelope containing manifest + base64 content. Save to .c2pa.json.
text-header Inline marker block prepended to text content.

Exceptions

Exception When
ProvenanceError Base class
InvalidManifestError Malformed manifest / missing fields / bad JSON
SignatureVerificationError Base for signature check failures
ContentHashMismatchError Content bytes don't match manifest's claimed hash
ChainBrokenError Provenance chain link mismatch
UnknownAssertionError Assertion label not in ASSERTION_REGISTRY

Examples

See the examples/ directory:

  • sign_llm_output.py — end-to-end: agent signs AI text, embeds into sidecar, extracts, verifies.
  • detect_tampered_output.py — shows that modifying the content bytes after signing is detected.
  • provenance_chain.py — AI draft -> human-edited derivation; each link signed by a different identity.

Run them:

python examples/sign_llm_output.py
python examples/detect_tampered_output.py
python examples/provenance_chain.py

Why PQC Matters for Provenance

Provenance is fundamentally an audit-trail technology: its whole value is being verifiable later. "Later" for healthcare is decades; for financial audits, years; for legal discovery, possibly forever. Classical signatures are vulnerable to Harvest-Now-Decrypt-Later (HNDL) style retroactive forgery — an adversary who records today's signed outputs can, once quantum-capable, produce indistinguishable fake manifests that appear to have been signed in the past. ML-DSA (FIPS 204) is believed to resist this attack. Signing AI outputs with PQC today is how we guarantee that tomorrow's auditors can still trust yesterday's provenance.

Development

pip install -e ".[dev]"
pytest
ruff check src/ tests/ examples/

Related

Part of the QuantaMrkt post-quantum tooling registry. See also:

  • QuantumShield — the PQC toolkit (AgentIdentity, SignatureAlgorithm, sign/verify).
  • PQC RAG Signing — sister tool for signing RAG pipeline chunks with ML-DSA.
  • PQC MCP Transport — sister tool for PQC-secured Model Context Protocol transports.

License

Apache License 2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pqc_content_provenance-0.1.0.tar.gz (20.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pqc_content_provenance-0.1.0-py3-none-any.whl (19.9 kB view details)

Uploaded Python 3

File details

Details for the file pqc_content_provenance-0.1.0.tar.gz.

File metadata

  • Download URL: pqc_content_provenance-0.1.0.tar.gz
  • Upload date:
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pqc_content_provenance-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fd3cec2b90dc234ff6b9b93b143bc87f10bdd4506d7e4a02702d918102be3427
MD5 6b47189b8bfdceb23532abf8628ea96a
BLAKE2b-256 a714853c2bef7d9224d6a82cfcb700bf95d6484c77338ee0754aa549e9587475

See more details on using hashes here.

File details

Details for the file pqc_content_provenance-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pqc_content_provenance-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8b8ff69d58e465c991565d73f63e0c0466e576895fdd3d15c0432cf3e2f96869
MD5 8712124dec454892b4b59b3b0a39e2b2
BLAKE2b-256 3f5b0499acb93f4216d157349c0a912744ce98d5bed14a40a8548c20bf2fc78b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page