Governance-as-code data cards for multi-party AI dataset sharing
numinal
numinal generates structured, machine-readable metadata for AI datasets — covering provenance, bias analysis, access policies, and EU AI Act Article 10 compliance. Built on Croissant (MLCommons) with a governance extension layer.
Why
If you're publishing a shared AI dataset, especially under a UK Sovereign AI grant, a UKRI programme, or any cross-organisational data-sharing initiative, you need to document:
- What the dataset contains (structure, provenance, collection methodology)
- What biases exist and how they've been addressed
- Who can access it, for what purposes, under what constraints
- How it maps to EU AI Act Article 10 requirements
No existing tool produces this documentation in a machine-readable format. numinal does.
Install
```bash
pip install numinal
```
Quick start
```bash
# Generate a data card from your dataset directory
numinal init ./my-dataset/ --tier 2

# Validate against compliance tiers
numinal validate ./my-dataset/numinal.yaml

# Check EU AI Act Article 10 compliance
numinal compliance ./my-dataset/numinal.yaml --regulation eu-ai-act-art-10

# Render as markdown for documentation
numinal render ./my-dataset/numinal.yaml -o datacard.md
```
What it looks like
Validation
```text
$ numinal validate ./numinal.yaml
Tier 1 (discovery): ✓ PASS — 8/8 required fields
Tier 2 (regulatory): ✗ FAIL — 9/14 required fields
Missing:
- rai:measurementAssumptions (Art. 10(2)(d))
- rai:suitabilityAssessment (Art. 10(2)(e))
- rai:complianceGaps (Art. 10(2)(h))
- rai:dataQualityAssessment (Art. 10(3))
- rai:geographicContext (Art. 10(4))
Tier 3 (governed sharing): ✗ FAIL — 14/23 required fields
Completeness: 67% (35/52 total fields)
```
Article 10 compliance
```text
$ numinal compliance ./numinal.yaml --regulation eu-ai-act-art-10
EU AI Act Article 10 — 13 sub-requirements checked:
✓ 10(2)(a) Design choices
✓ 10(2)(b) Collection processes and origin
✓ 10(2)(b)+ Original collection purpose disclosure
✓ 10(2)(c) Data preparation operations
✗ 10(2)(d) Measurement assumptions — rai.measurementAssumptions missing
✓ 10(2)(e) Suitability assessment
✓ 10(2)(f) Bias examination
✓ 10(2)(g) Bias mitigation measures
✗ 10(2)(h) Compliance gaps identified — rai.complianceGaps missing
✓ 10(3) Data quality criteria
✓ 10(4) Geographic and contextual characteristics
— 10(5) Special category data safeguards (skipped: not applicable)
✓ 10(6) Dataset role distinction
Score: 10/11 requirements met
```
Compliance tiers
| Tier | Name | Purpose | Who needs it |
|---|---|---|---|
| T1 | Discovery | Dataset is findable, understandable, usable | Any dataset publisher |
| T2 | Regulatory | Supports EU AI Act Article 10 compliance | Publishers whose data may be used in high-risk AI |
| T3 | Governed sharing | Full multi-party access control with audit trail | Cross-organisational dataset sharing |
T3 ⊇ T2 ⊇ T1. Start at T1, add fields as you need them.
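One way to picture the nesting is as cumulative sets of required fields, where each tier adds to the one below it. The sketch below is illustrative only and is not numinal's implementation: the `TIER_FIELDS` table, the helper names, and the handful of field names are assumptions chosen for the example, not the real schema.

```python
# Hypothetical fields added by each tier (illustrative names only,
# not numinal's actual required-field lists).
TIER_FIELDS = {
    1: {"name", "description", "license"},
    2: {"rai:dataCollection", "rai:dataBiases"},
    3: {"gov:accessPolicy", "gov:dataUseAgreement"},
}


def required_fields(tier: int) -> set[str]:
    """Return every field required at `tier`, including all lower tiers."""
    fields: set[str] = set()
    for level in range(1, tier + 1):
        fields |= TIER_FIELDS[level]
    return fields


def check_tier(card: dict, tier: int) -> tuple[bool, list[str]]:
    """Report whether `card` satisfies `tier` and which fields are missing."""
    missing = sorted(f for f in required_fields(tier) if card.get(f) in (None, ""))
    return (not missing, missing)


card = {
    "name": "demo",
    "description": "example",
    "license": "Apache-2.0",
    "rai:dataCollection": "survey",
}
for tier in (1, 2, 3):
    ok, missing = check_tier(card, tier)
    print(f"Tier {tier}: {'PASS' if ok else 'FAIL'} missing={missing}")
```

Because the sets are cumulative, a card that fails T2 also fails T3, which is why filling in fields tier by tier works.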
How it works
numinal init scans your dataset directory and auto-detects:
- File types, sizes, SHA-256 checksums
- Column names, data types, null rates, cardinality (CSV/TSV)
- Existing README, LICENSE, Croissant metadata
It generates a numinal.yaml — the human-authored source of truth — with TODO markers for fields you need to fill in manually. Run numinal validate to see what's missing at each tier.
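The kind of scan described above can be approximated with the Python standard library. This is a rough sketch under stated assumptions, not the code numinal runs: the function names (`sha256_of`, `profile_csv`, `scan`) and the output layout are invented for illustration, and data type inference and README/LICENSE detection are left out.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_csv(path: Path, delimiter: str = ",") -> dict[str, dict]:
    """Compute the null rate and cardinality of every column in a CSV/TSV file."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = reader.fieldnames or []
        nulls = {c: 0 for c in columns}
        values = {c: set() for c in columns}
        rows = 0
        for row in reader:
            rows += 1
            for c in columns:
                cell = row.get(c)
                if cell is None or cell == "":
                    nulls[c] += 1  # empty or absent cell counts as null
                else:
                    values[c].add(cell)
    return {
        c: {"null_rate": nulls[c] / rows if rows else 0.0, "cardinality": len(values[c])}
        for c in columns
    }


def scan(directory: Path) -> list[dict]:
    """Collect suffix, size, and checksum for each file under `directory`."""
    entries = []
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        entry = {
            "path": str(path.relative_to(directory)),
            "suffix": path.suffix.lower(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
        if entry["suffix"] in {".csv", ".tsv"}:
            sep = "\t" if entry["suffix"] == ".tsv" else ","
            entry["columns"] = profile_csv(path, sep)
        entries.append(entry)
    return entries
```

Calling `scan(Path("./my-dataset"))` returns one dictionary per file. Anything that cannot be measured this way, such as collection methodology or bias analysis, is what the TODO markers are for.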
Schema
The numinal data card extends Croissant with two additional layers:
| Layer | Standard | What it covers |
|---|---|---|
| Dataset structure | Croissant 1.0 (MLCommons) | Files, schemas, splits, ML semantics |
| Responsible AI | Croissant-RAI 1.0 (MLCommons) | Bias, fairness, collection methodology |
| Governance | Croissant-GOV 0.1 (numinal) | Access policies, DUAs, compliance, metering |
Every numinal data card is simultaneously valid Croissant metadata. The governance fields live in their own namespace — tools that don't understand gov: fields simply ignore them.
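To illustrate why namespacing keeps a card readable by Croissant-only tools, the sketch below shows a consumer that drops every `gov:`-prefixed key and works with what remains. The `strip_namespace` helper and the sample card, including the `gov:accessPolicy` and `gov:tier` fields, are hypothetical and used only to demonstrate the idea.

```python
def strip_namespace(node, prefix: str = "gov:"):
    """Recursively drop every dictionary key that starts with `prefix`."""
    if isinstance(node, dict):
        return {
            key: strip_namespace(value, prefix)
            for key, value in node.items()
            if not key.startswith(prefix)
        }
    if isinstance(node, list):
        return [strip_namespace(item, prefix) for item in node]
    return node


card = {
    "@type": "sc:Dataset",
    "name": "demo-corpus",
    "rai:dataBiases": "Under-represents rural clinics",
    "gov:accessPolicy": {"gov:tier": "restricted"},  # hypothetical governance field
}

# A tool that only understands Croissant and Croissant-RAI sees this view.
croissant_view = strip_namespace(card)
print(croissant_view)
```

The structural and Responsible AI fields survive unchanged, so dropping the governance layer loses no information that a Croissant consumer relies on.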
Controlled vocabularies are sourced from:
- Organisation types: UK Cabinet Office Public Bodies Handbook, Companies House
- Data use purposes: GA4GH Data Use Ontology (DUO), extended with AI-specific terms
- High-risk domains: EU AI Act Annex III
- Security classifications: UK Government Security Classifications Policy (GSCP)
- Policy expression: W3C ODRL 2.2 via DPV-ODRL profile
See the specification for full details.
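As a rough idea of what a purpose-limited policy looks like in ODRL 2.2 terms, the sketch below builds a policy as a plain dictionary and checks one request against it. It is an assumption-laden illustration: the `uid`, `target`, and `ex:` purpose identifiers are placeholders, the `is_permitted` helper is hypothetical, and neither the DPV-ODRL profile nor the way numinal embeds policies in a data card is modelled here.

```python
# Illustrative ODRL 2.2 policy. The uid, target, and purpose identifiers
# are placeholders, not values taken from numinal or from DUO.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "https://example.org/policy/demo",
    "permission": [
        {
            "target": "https://example.org/dataset/demo",
            "action": "use",
            "constraint": [
                {
                    "leftOperand": "purpose",
                    "operator": "isAnyOf",
                    "rightOperand": ["ex:research", "ex:model-evaluation"],
                }
            ],
        }
    ],
}


def is_permitted(policy: dict, action: str, purpose: str) -> bool:
    """Return True if any permission allows `action` for `purpose`.

    Only `purpose` constraints with the `isAnyOf` operator are understood.
    Any other constraint counts as unsatisfied, so the check fails closed.
    """
    for rule in policy.get("permission", []):
        if rule.get("action") != action:
            continue
        constraints = rule.get("constraint", [])
        if all(
            c.get("leftOperand") == "purpose"
            and c.get("operator") == "isAnyOf"
            and purpose in c.get("rightOperand", [])
            for c in constraints
        ):
            return True
    return False


print(is_permitted(policy, "use", "ex:research"))
print(is_permitted(policy, "use", "ex:marketing"))
```

A real evaluator would also need to handle prohibitions, duties, and the other ODRL operators. This sketch covers only the single purpose check.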
Example
See examples/uk-health-ai-corpus.yaml for a complete T3 data card demonstrating all three schema layers.
Commands
| Command | Status | Purpose |
|---|---|---|
| numinal init | ✓ | Generate a data card from a dataset directory |
| numinal validate | ✓ | Validate against compliance tiers |
| numinal compliance | ✓ | Check against EU AI Act Article 10 |
| numinal render | ✓ | Render as markdown |
| numinal diff | planned | Compare two data card versions |
License
Apache-2.0