CroviaTrust – Open-core evidence & payout engine for AI datasets
Crovia Spider
“If it’s already in the open training datasets of 2025–2026, it already has a Crovia receipt.”
Crovia Spider turns existing open training corpora (e.g. LAION, C4, The Stack) into
standardized spider_receipt.v1 NDJSON logs.
A spider_receipt.v1 is the minimal unit of Crovia's awareness of a content item
in the training data ecosystem.
This repository contains:
- the formal spec: docs/CROVIA_SPIDER_RECEIPT_v1.md
- a reference implementation to generate receipts from LAION-style metadata
- a CLI: crovia-spider from-laion ...
Quick start
Clone and install:
git clone https://github.com/croviatrust/crovia-spider.git
cd crovia-spider
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -U pip
pip install .
crovia-spider --help
Example (LAION-style Parquet → spider_receipt NDJSON):
crovia-spider from-laion \
--metadata-path /data/laion/laion2B-en-meta.parquet \
--out data/receipts_laion_sample.ndjson \
--period 2025-12 \
--sample 100000
This will produce an NDJSON file where each line is a valid spider_receipt.v1.
See docs/CROVIA_SPIDER_RECEIPT_v1.md for the full specification.
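To inspect the generated file, the NDJSON can be read one line at a time. The sketch below assumes only that each line holds one JSON object; it does not rely on any field names from the spec, and `iter_receipts` is a hypothetical helper, not part of the crovia-spider package.

```python
import io
import json
from typing import Iterator, TextIO


def iter_receipts(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty NDJSON line; raise on malformed lines."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        obj = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        yield obj


# In-memory demo; the key "a" is a placeholder, not a spec field.
sample = io.StringIO('{"a": 1}\n\n{"a": 2}\n')
receipts = list(iter_receipts(sample))
print(len(receipts))
```

For a real run, pass a file opened with `open("data/receipts_laion_sample.ndjson", encoding="utf-8")` instead of the in-memory stream.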
CROVIA Spider – Real evidence runs
- GSM8K (OpenAI math word problems, HF mirror oieieio/gsm8k)
  - Period: 2025-12
  - Receipts: 7,473 spider_receipt.v1 entries
  - Docs: docs/README_SPIDER_GSM8K_2025-12.md
cd /opt/crovia
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
2.2 Full period run
python run_period.py \
--period 2025-11 \
--eur-total 1000000 \
--receipts data/royalty_from_faiss.ndjson \
--min-appear 1
This will:
- Run QA
- Compute trust metrics
- (Optional) run schema validation (crovia_validate.py)
- (Optional) run AI-Act helpers (compliance_ai_act.py)
- Compute payouts
- Generate charts
- Build & verify a hash-chain
- Compute Crovian Floors
- (Optional) build a Trust Bundle
Artifacts generated:
data/trust_providers.csv
docs/trust_summary.md
data/payouts_2025-11.csv
data/payouts_2025-11.ndjson
data/floors_2025-11.json
charts/payout_top10_2025-11.png
charts/payout_cumulative_2025-11.png
proofs/hashchain_*.txt
For a detailed operator view, see:
docs/CROVIA_OPEN_CORE_FAISS_2025-11_OVERVIEW.md
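To make the roles of --eur-total and --min-appear concrete, here is a minimal sketch of a pro-rata split. It is not taken from run_period.py: the input shape (pairs of provider id and share), the function name `pro_rata_payouts`, and the interpretation of --min-appear as a minimum number of appearances are all assumptions made for illustration.

```python
from collections import defaultdict


def pro_rata_payouts(attributions, eur_total, min_appear=1):
    """Split eur_total across providers in proportion to their summed shares.

    attributions: iterable of (provider_id, share) pairs (hypothetical shape).
    Providers seen fewer than min_appear times are excluded from the split.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for provider, share in attributions:
        totals[provider] += share
        counts[provider] += 1
    eligible = {p: s for p, s in totals.items() if counts[p] >= min_appear}
    weight = sum(eligible.values())
    if weight <= 0:
        return {}  # nothing to distribute
    return {p: eur_total * s / weight for p, s in eligible.items()}


# Made-up attributions for two providers across two receipts.
demo = [("prov_a", 0.6), ("prov_b", 0.4), ("prov_a", 0.2), ("prov_b", 0.8)]
payouts = pro_rata_payouts(demo, eur_total=1000.0, min_appear=1)
for provider, eur in sorted(payouts.items()):
    print(provider, round(eur, 2))
```

The payouts always sum to the budget (up to floating-point rounding), which is the property an auditor would typically check first.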
2.3 Running the entire demo via C-LINE (recommended)
C-LINE is the unified command-line interface for the CROVIA Core Engine. It wraps all internal scripts (validation, trust, payouts, floors, hash-chain, AI Act helpers, ZIP evidence builder) into a single, user-friendly CLI.
Run the full 2025-11 demo with one command:
python tools/c_line.py demo
# future installation:
# c-line demo
This will automatically:
- validate the receipts NDJSON
- run trust aggregation
- compute payouts (payouts.v1)
- generate payout charts
- compute Crovian Floors v1.1
- run AI Act Annex-IV documentation helpers
- write a SHA-256 hash-chain and verify it
- collect all artifacts into a ZIP evidence pack
- generate a QR code pointing to the pack
Artifacts produced:
evidence/CROVIA_evidence_2025-11.zip
proofs/QR_evidence_2025-11.png
docs/VALIDATE_report_2025-11.md
docs/AI_ACT_summary_2025-11.md
data/payouts_2025-11.csv
data/floors_2025-11.json
# plus ~30 additional files: charts, logs, packs, proofs
C-LINE v1.0 turns the CROVIA demo into a single-shot reproducible evidence pipeline.
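The hash-chain step can be pictured with a short sketch. It assumes a simple construction in which each digest is the SHA-256 of the previous digest concatenated with the next record line, seeded with an all-zero digest. The actual layout of proofs/hashchain_*.txt is not described here, so treat `chain_digests`, `verify_chain`, and the seed as hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting digest; the real chain may be seeded differently


def chain_digests(lines, start=GENESIS):
    """Return running SHA-256 digests, each covering the previous digest plus one line."""
    digests = []
    prev = start
    for line in lines:
        prev = hashlib.sha256((prev + line).encode("utf-8")).hexdigest()
        digests.append(prev)
    return digests


def verify_chain(lines, digests, start=GENESIS):
    """Recompute the chain and compare it with the recorded digests."""
    return chain_digests(lines, start) == list(digests)


records = ['{"id": 1}', '{"id": 2}', '{"id": 3}']
proof = chain_digests(records)
print(verify_chain(records, proof))   # True

# Changing one record invalidates its digest and every digest after it.
tampered = [records[0], '{"id": 99}', records[2]]
print(verify_chain(tampered, proof))  # False
```

Because each digest depends on all earlier records, publishing only the final digest is enough to commit to the whole file.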
2.4 Install as a CLI (C-LINE)
You can also install the CROVIA Core Engine as a local CLI inside a virtualenv:
pip install -e .
c-line demo
This will install the c-line entrypoint in your environment and run the full CROVIA demo pipeline (validation → trust → payouts → floors → hash-chain → AI Act helpers → ZIP + QR evidence pack) with a single command.
3. DPI demo – Trust Bundle example
The repository includes a small DPI demo showing:
- DPI-based royalty_receipt.v1 logs
- payouts, floors, trust CSVs
- AI Act-style documentation
- SHA-256 evidence digests consolidated in trust_bundle.v1
Validate the bundle:
python trust_bundle_validator.py \
--bundle demo_dpi_2025-11/output/trust_bundle_2025-11.json \
--base-dir /opt/crovia
Expected output (abridged):
[*] Loading bundle: .../trust_bundle_2025-11.json
schema=crovia_trust_bundle.v1 period=2025-11
=== Artifact verification ===
[RESULT] Bundle OK: all declared artifacts match size and sha256.
A Trust Bundle acts as a hash-addressable evidence pack for auditors, regulators and partners.
Full profile:
docs/CROVIA_TRUST_BUNDLE_v1.md
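The check reported above ("all declared artifacts match size and sha256") can be sketched as follows. The bundle layout used here, an `artifacts` list with `path`, `bytes`, and `sha256` keys, is an assumption for illustration; the real schema is defined in docs/CROVIA_TRUST_BUNDLE_v1.md, and `verify_artifacts` is not the code in trust_bundle_validator.py.

```python
import hashlib
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifacts(bundle, base_dir):
    """Return (path, problem) tuples; an empty list means every artifact matched."""
    problems = []
    for art in bundle["artifacts"]:  # hypothetical key names
        full = os.path.join(base_dir, art["path"])
        if not os.path.isfile(full):
            problems.append((art["path"], "missing"))
        elif os.path.getsize(full) != art["bytes"]:
            problems.append((art["path"], "size mismatch"))
        elif sha256_file(full) != art["sha256"]:
            problems.append((art["path"], "sha256 mismatch"))
    return problems


# Self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "payouts.csv")
    with open(target, "wb") as f:
        f.write(b"provider,eur\n")  # 13 bytes
    good = {"artifacts": [{"path": "payouts.csv", "bytes": 13,
                           "sha256": sha256_file(target)}]}
    bad = {"artifacts": [{"path": "payouts.csv", "bytes": 13,
                          "sha256": "0" * 64}]}
    ok_problems = verify_artifacts(good, base)
    bad_problems = verify_artifacts(bad, base)

print(ok_problems)
print(bad_problems)
```

Checking the size before hashing is a cheap early exit: a truncated file is reported without reading it in full.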
4. Validation, AI Act & Evidence Tools
The open-core engine includes transparent validation and compliance modules for auditors, researchers and model-card workflows.
4.1 Schema & QA Validation — crovia_validate.py
Validates royalty_receipt.v1 NDJSON files:
- schema correctness
- share ≈ 1.0 checks
- rank ordering
- malformed / suspicious rows
Produces:
- validation report (Markdown)
- sample failing rows
Example:
python crovia_validate.py data/royalty.ndjson \
--out-md docs/VALIDATE_report.md \
--out-bad data/royalty_bad_sample.ndjson
Outputs:
docs/VALIDATE_*.md
data/*_bad_sample.ndjson
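Two of the checks listed above, share ≈ 1.0 and rank ordering, are easy to illustrate. The sketch assumes a receipt carries a `top_k` list whose entries have `rank` and `share` fields. Those names are hypothetical and may not match the royalty_receipt.v1 schema, and `check_receipt` is not the logic in crovia_validate.py.

```python
import math


def check_receipt(receipt, tol=1e-6):
    """Return a list of problem strings for one receipt (empty if it passes)."""
    entries = receipt.get("top_k", [])  # hypothetical field name
    if not entries:
        return ["no attribution entries"]
    problems = []
    total = sum(e["share"] for e in entries)
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"shares sum to {total:.6f}, expected ~1.0")
    ranks = [e["rank"] for e in entries]
    if ranks != sorted(ranks):
        problems.append("ranks are not in ascending order")
    return problems


good = {"top_k": [{"rank": 1, "share": 0.7}, {"rank": 2, "share": 0.3}]}
bad = {"top_k": [{"rank": 2, "share": 0.5}, {"rank": 1, "share": 0.4}]}

print(check_receipt(good))
print(check_receipt(bad))
```

Using a tolerance rather than exact equality matters because shares are floating-point values and rarely sum to exactly 1.0.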
4.2 AI Act Annex IV Helpers — compliance_ai_act.py
Generates lightweight Annex-IV-style documentation:
- provider & shard distribution
- provenance hints
- concentration & risk signals
- gaps file (*_gaps.ndjson)
- JSON compliance pack
Run:
python compliance_ai_act.py data/royalty.ndjson \
--out-summary docs/AI_ACT_summary.md \
--out-gaps data/AI_ACT_gaps.ndjson \
--out-pack data/AI_ACT_pack.json
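As an illustration of what a provider distribution and a concentration signal might look like, the sketch below computes per-provider shares, the top-1 share, and the Herfindahl-Hirschman index (HHI). The source does not say which measures compliance_ai_act.py uses, so HHI is a stand-in chosen for this example, and `concentration` is a hypothetical helper.

```python
from collections import Counter


def concentration(provider_ids):
    """Return (distribution, top1_share, hhi) for a sequence of provider ids."""
    counts = Counter(provider_ids)
    n = sum(counts.values())
    if n == 0:
        return {}, 0.0, 0.0
    dist = {p: c / n for p, c in counts.items()}
    top1 = max(dist.values())
    # HHI is the sum of squared shares: 1/k for k equal providers, 1.0 for a single one.
    hhi = sum(s * s for s in dist.values())
    return dist, top1, hhi


# Made-up provider ids, one per receipt.
dist, top1, hhi = concentration(["a", "a", "a", "b"])
print(dist, top1, hhi)
```

A high top-1 share or an HHI close to 1.0 indicates that a single provider dominates the data, which is the kind of signal a reviewer would want flagged.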
4.3 CCL Validation — tools/ccl_validate.py
Validates CCL v1.1 JSON descriptors for:
- AI models
- datasets
- RAG indices
- APIs / tools
Run:
python tools/ccl_validate.py my_model.ccl.json
Full CCL spec:
docs/CROVIA_CCL_v1.1.md
4.4 CEP Evidence Protocol v1 — crovia_generate_cep.py
CROVIA CEP.v1 is a compact, verifiable evidence block for:
- Hugging Face model cards
- research papers
- audit packs
- trust bundle metadata
Generated via:
python tools/crovia_generate_cep.py \
--trust-bundle trust_bundle.json \
--period 2025-11 \
--receipts data/royalty.ndjson \
--payouts data/payouts.csv \
--hashchain proofs/hashchain_*.txt \
--engine-version demo-2025 \
--output-format yaml
The result includes:
- SHA-256 of receipts / payouts / bundle
- hash-chain root
- trust metrics (avg_top1_share, DP epsilon range, CI indicators)
- generation metadata
Full spec:
docs/CROVIA_CEP_v1.md
5. Status & scope
This repository is:
- Open-core — attribution → trust → payouts → floors → proofs
- Demo-grade — synthetic data
- Evidence-first — built for transparency, auditability, reproducibility
Business logic, contracts, billing, CCT-attested tokens and settlement overrides live in the private PRO engine, not here.
6. Licensing
Apache License 2.0
Permitted:
- commercial or academic usage
- modification and redistribution
- closed or open derivatives
- integration into external pipelines
See the LICENSE file.
7. Copyright
© 2025 — Tarik En Nakhai Crovia Core Engine
This repository includes a NOTICE file (Apache-2.0 requirement).
8. Contact
File details
Details for the file crovia-0.1.0.tar.gz.
File metadata
- Download URL: crovia-0.1.0.tar.gz
- Upload date:
- Size: 64.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee3749c47b05bd78aa46db4837d312460245b5082f3b62a9c2ec8637490ca396 |
| MD5 | fc97208e7494189ee49bae175fe19353 |
| BLAKE2b-256 | 3a21d336769e7b091892c5a6864c9083a53a999e27897d4254fd43e3ea43ce34 |
File details
Details for the file crovia-0.1.0-py3-none-any.whl.
File metadata
- Download URL: crovia-0.1.0-py3-none-any.whl
- Upload date:
- Size: 75.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95562293b63515fb4e40f1cfbd7146ba573dc319c0072f5d117100ff58991b58 |
| MD5 | 683d0a64d34fb32fa817fc51f806fb2d |
| BLAKE2b-256 | a8e19fc6fe0aab04ff9713ba23dcf2d6cc4323929a45ae1f7487160aa0b54b22 |