Governance-as-code data cards for multi-party AI dataset sharing

numinal

numinal generates structured, machine-readable metadata for AI datasets — covering provenance, bias analysis, access policies, and EU AI Act Article 10 compliance. Built on Croissant (MLCommons) with a governance extension layer.

Why

If you're publishing a shared AI dataset, especially under a UK Sovereign AI grant, a UKRI programme, or any cross-organisational data-sharing initiative, you need to document:

  • What the dataset contains (structure, provenance, collection methodology)
  • What biases exist and how they've been addressed
  • Who can access it, for what purposes, under what constraints
  • How it maps to EU AI Act Article 10 requirements

No existing tool produces this documentation in a machine-readable format. numinal does.

Install

pip install numinal

Quick start

# Generate a data card from your dataset directory
numinal init ./my-dataset/ --tier 2

# Validate against compliance tiers
numinal validate ./my-dataset/numinal.yaml

# Check EU AI Act Article 10 compliance
numinal compliance ./my-dataset/numinal.yaml --regulation eu-ai-act-art-10

# Render as markdown for documentation
numinal render ./my-dataset/numinal.yaml -o datacard.md

What it looks like

Validation

$ numinal validate ./numinal.yaml

Tier 1 (discovery):           ✓ PASS — 8/8 required fields
Tier 2 (regulatory):          ✗ FAIL — 9/14 required fields
  Missing:
  - rai:measurementAssumptions (Art. 10(2)(d))
  - rai:suitabilityAssessment (Art. 10(2)(e))
  - rai:complianceGaps (Art. 10(2)(h))
  - rai:dataQualityAssessment (Art. 10(3))
  - rai:geographicContext (Art. 10(4))
Tier 3 (governed sharing):    ✗ FAIL — 14/23 required fields

Completeness: 67% (35/52 total fields)

Article 10 compliance

$ numinal compliance ./numinal.yaml --regulation eu-ai-act-art-10

EU AI Act Article 10 — 13 sub-requirements checked:

  ✓ 10(2)(a) Design choices
  ✓ 10(2)(b) Collection processes and origin
  ✓ 10(2)(b)+ Original collection purpose disclosure
  ✓ 10(2)(c) Data preparation operations
  ✗ 10(2)(d) Measurement assumptions — rai.measurementAssumptions missing
  ✓ 10(2)(e) Suitability assessment
  ✓ 10(2)(f) Bias examination
  ✓ 10(2)(g) Bias mitigation measures
  ✗ 10(2)(h) Compliance gaps identified — rai.complianceGaps missing
  ✓ 10(3) Data quality criteria
  ✓ 10(4) Geographic and contextual characteristics
  — 10(5) Special category data safeguards (skipped: not applicable)
  ✓ 10(6) Dataset role distinction

Score: 10/12 applicable requirements met (1 skipped)

Compliance tiers

| Tier | Name | Purpose | Who needs it |
| --- | --- | --- | --- |
| T1 | Discovery | Dataset is findable, understandable, usable | Any dataset publisher |
| T2 | Regulatory | Supports EU AI Act Article 10 compliance | Publishers whose data may be used in high-risk AI |
| T3 | Governed sharing | Full multi-party access control with audit trail | Cross-organisational dataset sharing |

T3 ⊇ T2 ⊇ T1. Start at T1, add fields as you need them.
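
The nesting of tiers can be illustrated with a minimal sketch. This is not numinal's implementation: the tier-to-field mapping below is hypothetical (only the two rai: field names appear in the sample output above, and gov:accessPolicy is an invented placeholder), and treating "TODO" as an unfilled value is an assumption.

```python
# Hypothetical tier definitions: each tier lists only the fields it adds.
# The real required-field lists are defined by the numinal specification.
TIER_FIELDS = {
    "T1": ["name", "description", "license"],
    "T2": ["rai:measurementAssumptions", "rai:complianceGaps"],
    "T3": ["gov:accessPolicy"],  # placeholder name, not from the spec
}
TIER_ORDER = ["T1", "T2", "T3"]


def required_fields(tier):
    """Return the cumulative required fields for a tier (T3 ⊇ T2 ⊇ T1)."""
    fields = []
    for t in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        fields.extend(TIER_FIELDS[t])
    return fields


def validate(card):
    """Report, per tier, whether the card passes and which fields are missing."""
    report = {}
    for tier in TIER_ORDER:
        required = required_fields(tier)
        # Assumption: absent, empty, or "TODO" values count as missing.
        missing = [f for f in required if card.get(f) in (None, "", "TODO")]
        report[tier] = {
            "passed": not missing,
            "present": len(required) - len(missing),
            "required": len(required),
            "missing": missing,
        }
    return report


card = {
    "name": "example-corpus",
    "description": "Illustrative dataset",
    "license": "Apache-2.0",
    "rai:measurementAssumptions": "TODO",
}
for tier, result in validate(card).items():
    status = "PASS" if result["passed"] else "FAIL"
    print(f"{tier}: {status} {result['present']}/{result['required']}", result["missing"])
```

Because each tier's requirements include those of the tiers below it, a field missing at T2 is also reported as missing at T3.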

How it works

numinal init scans your dataset directory and auto-detects:

  • File types, sizes, SHA-256 checksums
  • Column names, data types, null rates, cardinality (CSV/TSV)
  • Existing README, LICENSE, Croissant metadata
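
As a rough illustration of the kind of scan described above, the following sketch computes a SHA-256 checksum and per-column null rates and cardinality for a CSV file using only the Python standard library. It is an assumption about how such detection could work, not numinal's actual code; the function names sha256_of and profile_csv are hypothetical, empty strings are assumed to count as nulls, and data type inference is left out.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_csv(path, delimiter=","):
    """Return per-column null rate and cardinality for a delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = reader.fieldnames or []
        rows = 0
        nulls = {c: 0 for c in columns}
        distinct = {c: set() for c in columns}
        for row in reader:
            rows += 1
            for c in columns:
                value = row.get(c)
                # Assumption: missing or empty cells are treated as nulls.
                if value is None or value == "":
                    nulls[c] += 1
                else:
                    distinct[c].add(value)
    return {
        c: {
            "null_rate": nulls[c] / rows if rows else 0.0,
            "cardinality": len(distinct[c]),
        }
        for c in columns
    }


# Demo on a throwaway file; a real dataset would be scanned in place.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("id,region\n1,north\n2,\n3,north\n", encoding="utf-8")
    checksum = sha256_of(sample)
    profile = profile_csv(sample)

print(checksum)
print(profile)
```

Passing delimiter="\t" to profile_csv would cover TSV files in the same way.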

It generates a numinal.yaml — the human-authored source of truth — with TODO markers for fields you need to fill in manually. Run numinal validate to see what's missing at each tier.

Schema

The numinal data card extends Croissant with two additional layers:

| Layer | Standard | What it covers |
| --- | --- | --- |
| Dataset structure | Croissant 1.0 (MLCommons) | Files, schemas, splits, ML semantics |
| Responsible AI | Croissant-RAI 1.0 (MLCommons) | Bias, fairness, collection methodology |
| Governance | Croissant-GOV 0.1 (numinal) | Access policies, DUAs, compliance, metering |

Every numinal data card is simultaneously valid Croissant metadata. The governance fields live in their own namespace — tools that don't understand gov: fields simply ignore them.
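
To make the namespace separation concrete, here is a small sketch of how a consumer that only understands Croissant might view a card. It assumes the card has already been loaded into nested dicts and lists; the helper strip_namespace and the gov: field names shown are hypothetical and chosen for illustration only.

```python
def strip_namespace(card, prefix="gov:"):
    """Recursively drop keys in the given namespace, leaving the rest intact."""
    if isinstance(card, dict):
        return {
            key: strip_namespace(value, prefix)
            for key, value in card.items()
            if not key.startswith(prefix)
        }
    if isinstance(card, list):
        return [strip_namespace(item, prefix) for item in card]
    return card


card = {
    "@type": "sc:Dataset",
    "name": "example-corpus",
    "rai:dataBiases": "Under-represents rural regions",
    "gov:accessPolicy": {"gov:tier": "restricted"},  # hypothetical field names
}

# What a Croissant-only tool would effectively see.
croissant_view = strip_namespace(card)
print(croissant_view)
```

The original card is left unmodified, so governance-aware tools can still read the gov: fields from the same file.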

Controlled vocabularies are sourced from:

  • Organisation types: UK Cabinet Office Public Bodies Handbook, Companies House
  • Data use purposes: GA4GH Data Use Ontology (DUO), extended with AI-specific terms
  • High-risk domains: EU AI Act Annex III
  • Security classifications: UK Government Security Classifications Policy (GSCP)
  • Policy expression: W3C ODRL 2.2 via DPV-ODRL profile

See the specification for full details.

Example

See examples/uk-health-ai-corpus.yaml for a complete T3 data card demonstrating all three schema layers.

Commands

| Command | Status | Purpose |
| --- | --- | --- |
| numinal init | | Generate a data card from a dataset directory |
| numinal validate | | Validate against compliance tiers |
| numinal compliance | | Check against EU AI Act Article 10 |
| numinal render | | Render as markdown |
| numinal diff | planned | Compare two data card versions |

License

Apache-2.0
