CroviaTrust – Open-core evidence & payout engine for AI datasets

Crovia Spider

“If it’s already in the open training datasets of 2025–2026, it already has a Crovia receipt.”

Crovia Spider turns existing open training corpora (e.g. LAION, C4, The Stack) into standardized spider_receipt.v1 NDJSON logs.

A spider_receipt.v1 is the minimal unit of Crovia's awareness of a content item in the training data ecosystem.

This repository contains:

  • the formal spec: docs/CROVIA_SPIDER_RECEIPT_v1.md
  • a reference implementation to generate receipts from LAION-style metadata
  • a CLI: crovia-spider from-laion ...

Quick start

Clone and install:

git clone https://github.com/croviatrust/crovia-spider.git
cd crovia-spider

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

pip install -U pip
pip install .

crovia-spider --help

Example (LAION-style Parquet → spider_receipt NDJSON):

crovia-spider from-laion \
  --metadata-path /data/laion/laion2B-en-meta.parquet \
  --out data/receipts_laion_sample.ndjson \
  --period 2025-12 \
  --sample 100000

This produces an NDJSON file in which each line is one valid spider_receipt.v1 record.
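As a quick sanity check on the generated file, the sketch below reads an NDJSON file line by line and counts records per schema tag. It is an illustration only: the helper names `iter_receipts` and `count_by_schema` are hypothetical, and the top-level `schema` field is an assumption, not something taken from docs/CROVIA_SPIDER_RECEIPT_v1.md.

```python
import json
from pathlib import Path


def iter_receipts(path):
    """Yield one dict per non-empty NDJSON line, failing loudly on bad JSON."""
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_no}: expected a JSON object")
            yield record


def count_by_schema(path):
    """Count records per value of the (assumed) top-level "schema" field."""
    counts = {}
    for record in iter_receipts(path):
        key = record.get("schema", "<missing>")
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling `count_by_schema("data/receipts_laion_sample.ndjson")` on the file from the example above would return a dict of counts per schema tag; any key other than the expected one suggests mixed or malformed input.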

See docs/CROVIA_SPIDER_RECEIPT_v1.md for the full specification.


CROVIA Spider – Real evidence runs

cd /opt/crovia
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2.2 Full period run

python run_period.py \
  --period 2025-11 \
  --eur-total 1000000 \
  --receipts data/royalty_from_faiss.ndjson \
  --min-appear 1

This will:

  • Run QA
  • Compute trust metrics
  • (Optional) run schema validation (crovia_validate.py)
  • (Optional) run AI-Act helpers (compliance_ai_act.py)
  • Compute payouts
  • Generate charts
  • Build & verify a hash-chain
  • Compute Crovian Floors
  • (Optional) build a Trust Bundle

Artifacts generated:

data/trust_providers.csv
docs/trust_summary.md
data/payouts_2025-11.csv
data/payouts_2025-11.ndjson
data/floors_2025-11.json
charts/payout_top10_2025-11.png
charts/payout_cumulative_2025-11.png
proofs/hashchain_*.txt
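To make the "Compute payouts" step more concrete, here is a minimal sketch of a pro-rata split of a fixed budget, loosely mirroring the `--eur-total` and `--min-appear` options. It is not the logic of run_period.py: the pro-rata rule, the function name `compute_payouts`, and the receipt fields `top_k`, `provider_id` and `share` are all assumptions made for illustration.

```python
from collections import defaultdict


def compute_payouts(receipts, eur_total, min_appear=1):
    """Split eur_total pro rata by summed share, per provider.

    Assumes each receipt looks like
    {"top_k": [{"provider_id": "...", "share": 0.6}, ...]}  (hypothetical fields).
    """
    share_sum = defaultdict(float)
    appearances = defaultdict(int)
    for receipt in receipts:
        for entry in receipt.get("top_k", []):
            share_sum[entry["provider_id"]] += float(entry["share"])
            appearances[entry["provider_id"]] += 1

    # Drop providers seen fewer than min_appear times, then renormalise
    # over the remaining providers so the full budget is still distributed.
    eligible = {p: s for p, s in share_sum.items() if appearances[p] >= min_appear}
    total = sum(eligible.values())
    if total <= 0:
        return {}
    return {p: round(eur_total * s / total, 2) for p, s in eligible.items()}
```

Rounding each amount to cents independently can leave the sum a cent or two off the budget; a production engine would need an explicit remainder rule.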

For a detailed operator view, see:

docs/CROVIA_OPEN_CORE_FAISS_2025-11_OVERVIEW.md

2.3 Running the entire demo via C-LINE (recommended)

C-LINE is the unified command-line interface for the CROVIA Core Engine. It wraps all internal scripts (validation, trust, payouts, floors, hash-chain, AI Act helpers, ZIP evidence builder) into a single, user-friendly CLI.

Run the full 2025-11 demo with one command:

python tools/c_line.py demo
# or, once installed as a CLI (see 2.4):
#   c-line demo

This will automatically:

  • validate the receipts NDJSON
  • run trust aggregation
  • compute payouts (payouts.v1)
  • generate payout charts
  • compute Crovian Floors v1.1
  • run AI Act Annex-IV documentation helpers
  • write a SHA-256 hash-chain and verify it
  • collect all artifacts into a ZIP evidence pack
  • generate a QR code pointing to the pack
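The hash-chain step in the list above can be pictured with the following sketch, in which each digest commits to the previous digest plus the current line. The chaining rule, the all-zero `GENESIS` value, and the function names are assumptions for illustration; the format actually written to proofs/ may differ.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the chain


def build_chain(lines):
    """Return one hex digest per line: h_i = SHA256(h_{i-1} || line_i)."""
    digests = []
    prev = GENESIS
    for line in lines:
        h = hashlib.sha256()
        h.update(bytes.fromhex(prev))    # previous digest as raw bytes
        h.update(line.encode("utf-8"))   # current record, exactly as stored
        prev = h.hexdigest()
        digests.append(prev)
    return digests


def verify_chain(lines, digests):
    """Recompute the chain and compare it against the recorded digests."""
    return build_chain(lines) == list(digests)
```

Because every digest depends on all earlier lines, editing, inserting or removing a single record changes every later digest, which is what makes the final value usable as a compact fingerprint of the whole file.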

Artifacts produced:

evidence/CROVIA_evidence_2025-11.zip
proofs/QR_evidence_2025-11.png
docs/VALIDATE_report_2025-11.md
docs/AI_ACT_summary_2025-11.md
data/payouts_2025-11.csv
data/floors_2025-11.json
# plus ~30 additional files: charts, logs, packs, proofs

C-LINE v1.0 turns the CROVIA demo into a single-shot reproducible evidence pipeline.

2.4 Install as a CLI (C-LINE)

You can also install the CROVIA Core Engine as a local CLI inside a virtualenv:

pip install -e .
c-line demo

pip install -e . installs the c-line entrypoint in your environment; c-line demo then runs the full CROVIA demo pipeline (validation → trust → payouts → floors → hash-chain → AI Act helpers → ZIP + QR evidence pack) with a single command.

3. DPI demo – Trust Bundle example

The repository includes a small DPI demo showing:

  • DPI-based royalty_receipt.v1 logs
  • payouts, floors, trust CSVs
  • AI Act-style documentation
  • SHA-256 evidence digests consolidated in trust_bundle.v1

Validate the bundle:

python trust_bundle_validator.py \
  --bundle demo_dpi_2025-11/output/trust_bundle_2025-11.json \
  --base-dir /opt/crovia

Expected output (abridged):

[*] Loading bundle: .../trust_bundle_2025-11.json
schema=crovia_trust_bundle.v1  period=2025-11

=== Artifact verification ===
[RESULT] Bundle OK: all declared artifacts match size and sha256.

A Trust Bundle acts as a hash-addressable evidence pack for auditors, regulators and partners.
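The check reported above ("all declared artifacts match size and sha256") can be sketched as follows. This is not the code of trust_bundle_validator.py: the bundle keys `artifacts`, `path`, `bytes` and `sha256` are assumed for illustration, and the real layout is defined in docs/CROVIA_TRUST_BUNDLE_v1.md.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_bundle(bundle_path, base_dir):
    """Return a list of human-readable problems; empty means everything matched."""
    bundle = json.loads(Path(bundle_path).read_text(encoding="utf-8"))
    problems = []
    for art in bundle.get("artifacts", []):  # "artifacts" is an assumed key
        target = Path(base_dir) / art["path"]
        if not target.is_file():
            problems.append(f"missing: {art['path']}")
            continue
        # Compare the cheap size first, then the content hash.
        if target.stat().st_size != art["bytes"]:
            problems.append(f"size mismatch: {art['path']}")
        elif sha256_file(target) != art["sha256"]:
            problems.append(f"sha256 mismatch: {art['path']}")
    return problems
```

Checking the size before hashing lets a truncated or padded artifact fail fast without reading the whole file.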

Full profile:

docs/CROVIA_TRUST_BUNDLE_v1.md

4. Validation, AI Act & Evidence Tools

The open-core engine includes transparent validation and compliance modules for auditors, researchers and model-card workflows.

4.1 Schema & QA Validation — crovia_validate.py

Validates royalty_receipt.v1 NDJSON files:

  • schema correctness
  • share ≈ 1.0 checks
  • rank ordering
  • malformed / suspicious rows

Produces:

  • validation report (Markdown)
  • sample failing rows

Example:

python crovia_validate.py data/royalty.ndjson \
  --out-md docs/VALIDATE_report.md \
  --out-bad data/royalty_bad_sample.ndjson

Outputs:

docs/VALIDATE_*.md
data/*_bad_sample.ndjson
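For readers who want a feel for the share and rank checks listed above, here is a minimal per-receipt sketch. It is not the logic of crovia_validate.py: the field names `top_k`, `rank` and `share`, the tolerance, the issue codes, and the reading of "share ≈ 1.0" as "shares within one receipt sum to about 1.0" are all assumptions.

```python
import math


def check_receipt(receipt, tol=1e-6):
    """Return a list of issue codes for one receipt; an empty list means clean."""
    entries = receipt.get("top_k")  # hypothetical field name
    if not isinstance(entries, list) or not entries:
        return ["missing_top_k"]
    if not all(isinstance(e, dict) for e in entries):
        return ["malformed_entry"]

    shares = [e.get("share") for e in entries]
    for s in shares:
        is_number = isinstance(s, (int, float)) and not isinstance(s, bool)
        if not is_number or not math.isfinite(s) or s < 0:
            return ["bad_share"]

    issues = []
    # Shares within one receipt are assumed to sum to 1.0 within a tolerance.
    if not math.isclose(sum(shares), 1.0, abs_tol=tol):
        issues.append("share_sum_not_1")
    # Ranks are assumed to run 1, 2, 3, ... in list order.
    if [e.get("rank") for e in entries] != list(range(1, len(entries) + 1)):
        issues.append("rank_not_sequential")
    # A better rank should never carry a smaller share than the next one.
    if any(a < b for a, b in zip(shares, shares[1:])):
        issues.append("share_not_descending")
    return issues
```

Rows for which this returns a non-empty list are the kind of record one would expect to see in a bad-sample file.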

4.2 AI Act Annex IV Helpers — compliance_ai_act.py

Generates lightweight Annex-IV-style documentation:

  • provider & shard distribution
  • provenance hints
  • concentration & risk signals
  • gaps file (*_gaps.ndjson)
  • JSON compliance pack

Run:

python compliance_ai_act.py data/royalty.ndjson \
  --out-summary docs/AI_ACT_summary.md \
  --out-gaps data/AI_ACT_gaps.ndjson \
  --out-pack data/AI_ACT_pack.json
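One common way to express a "concentration signal" is the Herfindahl-Hirschman index (HHI) over provider shares. The sketch below uses it purely as an illustration; the source does not say which measure compliance_ai_act.py computes, and the receipt fields `top_k`, `provider_id` and `share` are assumed.

```python
from collections import Counter


def provider_concentration(receipts):
    """Summarise how concentrated attribution is across providers.

    Returns the provider count, the largest provider's fraction of all
    attributed share, and the HHI (sum of squared fractions, 1/n to 1.0).
    """
    totals = Counter()
    for receipt in receipts:
        for entry in receipt.get("top_k", []):  # hypothetical field names
            totals[entry["provider_id"]] += float(entry["share"])

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return {"providers": 0, "top1_share": 0.0, "hhi": 0.0}

    fractions = sorted((v / grand_total for v in totals.values()), reverse=True)
    return {
        "providers": len(fractions),
        "top1_share": fractions[0],
        "hhi": sum(f * f for f in fractions),
    }
```

An HHI close to 1.0 means a single provider dominates; a value near 1/n means attribution is spread evenly over n providers.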

4.3 CCL Validation — tools/ccl_validate.py

Validates CCL v1.1 JSON descriptors for:

  • AI models
  • datasets
  • RAG indices
  • APIs / tools

Run:

python tools/ccl_validate.py my_model.ccl.json

Full CCL spec:

docs/CROVIA_CCL_v1.1.md

4.4 CEP Evidence Protocol v1 — crovia_generate_cep.py

CROVIA CEP.v1 is a compact, verifiable evidence block for:

  • Hugging Face model cards
  • research papers
  • audit packs
  • trust bundle metadata

Generated via:

python tools/crovia_generate_cep.py \
  --trust-bundle trust_bundle.json \
  --period 2025-11 \
  --receipts data/royalty.ndjson \
  --payouts data/payouts.csv \
  --hashchain proofs/hashchain_*.txt \
  --engine-version demo-2025 \
  --output-format yaml

The result includes:

  • SHA-256 of receipts / payouts / bundle
  • hash-chain root
  • trust metrics (avg_top1_share, DP epsilon range, CI indicators)
  • generation metadata
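As an illustration of one of the trust metrics named above, the sketch below reads `avg_top1_share` as the mean, over receipts, of the largest single share in each receipt. That interpretation and the receipt fields `top_k` and `share` are assumptions; docs/CROVIA_CEP_v1.md is the authority on the actual definition.

```python
def avg_top1_share(receipts):
    """Mean of the largest share in each receipt, or None if there are none.

    Assumes each receipt carries a non-empty "top_k" list of
    {"share": float} entries (hypothetical field names).
    """
    top1 = [
        max(float(entry["share"]) for entry in receipt["top_k"])
        for receipt in receipts
        if receipt.get("top_k")  # skip receipts without attribution entries
    ]
    if not top1:
        return None
    return sum(top1) / len(top1)
```

Under this reading, a value near 1.0 means most outputs are attributed almost entirely to one source, while lower values indicate attribution spread over several sources.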

Full spec:

docs/CROVIA_CEP_v1.md

5. Status & scope

This repository is:

  • Open-core — attribution → trust → payouts → floors → proofs
  • Demo-grade — synthetic data
  • Evidence-first — built for transparency, auditability, reproducibility

Business logic, contracts, billing, CCT-attested tokens and settlement overrides live in the private PRO engine, not here.

6. Licensing

Apache License 2.0

Permitted:

  • commercial or academic usage
  • modification and redistribution
  • closed or open derivatives
  • integration into external pipelines

See the LICENSE file.

7. Copyright

© 2025 — Tarik En Nakhai Crovia Core Engine

This repository includes a NOTICE file (Apache-2.0 requirement).

8. Contact

info@croviatrust.com

https://croviatrust.com

License: Apache-2.0

Download files

Source Distribution

crovia-0.1.0.tar.gz (64.1 kB)

Uploaded Source

Built Distribution

crovia-0.1.0-py3-none-any.whl (75.4 kB)

Uploaded Python 3

File details

Details for the file crovia-0.1.0.tar.gz.

File metadata

  • Download URL: crovia-0.1.0.tar.gz
  • Upload date:
  • Size: 64.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crovia-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ee3749c47b05bd78aa46db4837d312460245b5082f3b62a9c2ec8637490ca396
MD5 fc97208e7494189ee49bae175fe19353
BLAKE2b-256 3a21d336769e7b091892c5a6864c9083a53a999e27897d4254fd43e3ea43ce34

File details

Details for the file crovia-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: crovia-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 75.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for crovia-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 95562293b63515fb4e40f1cfbd7146ba573dc319c0072f5d117100ff58991b58
MD5 683d0a64d34fb32fa817fc51f806fb2d
BLAKE2b-256 a8e19fc6fe0aab04ff9713ba23dcf2d6cc4323929a45ae1f7487160aa0b54b22
