Governance-as-code data cards for multi-party AI dataset sharing

numinal

numinal generates structured, machine-readable metadata for AI datasets — covering provenance, bias analysis, access policies, and EU AI Act Article 10 compliance. Built on Croissant (MLCommons) with a governance extension layer.

Why

If you're publishing a shared AI dataset, especially under a UK Sovereign AI grant, a UKRI programme, or any cross-organisational data sharing initiative, you need to document:

  • What the dataset contains (structure, provenance, collection methodology)
  • What biases exist and how they've been addressed
  • Who can access it, for what purposes, under what constraints
  • How it maps to EU AI Act Article 10 requirements

No existing tool produces this documentation in a machine-readable format. numinal does.

Install

pip install numinal

Quick start

# Generate a data card from your dataset directory
numinal init ./my-dataset/ --tier 2

# Or bootstrap from an existing Croissant JSON-LD document
numinal init ./my-dataset/ --from-croissant ./croissant.json --tier 2
numinal init ./my-dataset/ --from-croissant https://example.org/dataset/croissant.json

# Validate against compliance tiers
numinal validate ./my-dataset/numinal.yaml

# Check EU AI Act Article 10 compliance
numinal compliance ./my-dataset/numinal.yaml --regulation eu-ai-act-art-10

# Render as markdown for documentation
numinal render ./my-dataset/numinal.yaml -o datacard.md

What it looks like

Validation

$ numinal validate ./numinal.yaml

Tier 1 (discovery):           ✓ PASS — 8/8 required fields
Tier 2 (regulatory):          ✗ FAIL — 9/14 required fields
  Missing:
  - rai:measurementAssumptions (Art. 10(2)(d))
  - rai:suitabilityAssessment (Art. 10(2)(e))
  - rai:complianceGaps (Art. 10(2)(h))
  - rai:dataQualityAssessment (Art. 10(3))
  - rai:geographicContext (Art. 10(4))
Tier 3 (governed sharing):    ✗ FAIL — 14/23 required fields

Completeness: 67% (35/52 total fields)
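
For CI use, a report like the one above can be turned into structured data. The following is a minimal sketch that parses text in the format of the sample output; it assumes that format is stable, which is not guaranteed here, and `parse_validate_report` is a hypothetical helper rather than part of numinal.

```python
import re

# Matches lines such as "Tier 2 (regulatory):   ✗ FAIL — 9/14 required fields".
# The pattern is inferred from the sample report only; real output may differ.
TIER_LINE = re.compile(
    r"^Tier (?P<tier>\d+) \((?P<name>[^)]+)\):\s+\S+ (?P<status>PASS|FAIL)\D+"
    r"(?P<have>\d+)/(?P<need>\d+) required fields"
)


def parse_validate_report(text: str) -> dict[int, dict]:
    """Return {tier number: {name, passed, have, need}} for each tier line."""
    tiers = {}
    for line in text.splitlines():
        match = TIER_LINE.match(line.strip())
        if match:
            tiers[int(match["tier"])] = {
                "name": match["name"],
                "passed": match["status"] == "PASS",
                "have": int(match["have"]),
                "need": int(match["need"]),
            }
    return tiers


SAMPLE = """\
Tier 1 (discovery):           ✓ PASS — 8/8 required fields
Tier 2 (regulatory):          ✗ FAIL — 9/14 required fields
Tier 3 (governed sharing):    ✗ FAIL — 14/23 required fields
"""

report = parse_validate_report(SAMPLE)
# Highest tier that passed, or 0 if none did.
highest_passing = max((t for t, r in report.items() if r["passed"]), default=0)
print(highest_passing)
```

A pipeline could compare `highest_passing` with the tier it is targeting and fail the build when the card falls short.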

Article 10 compliance

$ numinal compliance ./numinal.yaml --regulation eu-ai-act-art-10

EU AI Act Article 10 — 13 sub-requirements checked:

  ✓ 10(2)(a) Design choices
  ✓ 10(2)(b) Collection processes and origin
  ✓ 10(2)(b)+ Original collection purpose disclosure
  ✓ 10(2)(c) Data preparation operations
  ✗ 10(2)(d) Measurement assumptions — rai.measurementAssumptions missing
  ✓ 10(2)(e) Suitability assessment
  ✓ 10(2)(f) Bias examination
  ✓ 10(2)(g) Bias mitigation measures
  ✗ 10(2)(h) Compliance gaps identified — rai.complianceGaps missing
  ✓ 10(3) Data quality criteria
  ✓ 10(4) Geographic and contextual characteristics
  — 10(5) Special category data safeguards (skipped: not applicable)
  ✓ 10(6) Dataset role distinction

Score: 10/12 requirements met

Compliance tiers

| Tier | Name | Purpose | Who needs it |
|------|------|---------|--------------|
| T1 | Discovery | Dataset is findable, understandable, usable | Any dataset publisher |
| T2 | Regulatory | Supports EU AI Act Article 10 compliance | Publishers whose data may be used in high-risk AI |
| T3 | Governed sharing | Full multi-party access control with audit trail | Cross-organisational dataset sharing |

T3 ⊇ T2 ⊇ T1. Start at T1, add fields as you need them.
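
The nesting can be pictured as cumulative sets of required fields. The sketch below is illustrative only: the Tier 2 names are taken from the sample validation output, while the Tier 1 and Tier 3 names, the flat key layout, and the treatment of `"TODO"` as an unfilled value are assumptions, not numinal's actual schema or logic.

```python
# Fields required at each tier *in addition to* the tier below.
# Tier 2 names come from the sample validation output; the Tier 1 and
# Tier 3 names are hypothetical placeholders, not numinal's real schema.
TIER_ADDITIONS = {
    1: {"name", "license"},
    2: {"rai:measurementAssumptions", "rai:dataQualityAssessment"},
    3: {"gov:accessPolicy"},
}


def required_fields(tier: int) -> set[str]:
    """Union of all additions up to `tier`, so T3 ⊇ T2 ⊇ T1 by construction."""
    fields: set[str] = set()
    for level in range(1, tier + 1):
        fields |= TIER_ADDITIONS[level]
    return fields


def highest_tier(card: dict) -> int:
    """Highest tier whose cumulative required fields are all filled (0 if none)."""
    # Assumption: empty values and the literal "TODO" count as unfilled.
    present = {key for key, value in card.items() if value not in (None, "", "TODO")}
    passed = 0
    for tier in sorted(TIER_ADDITIONS):
        if required_fields(tier) <= present:
            passed = tier
        else:
            break
    return passed


card = {"name": "demo", "license": "Apache-2.0", "rai:measurementAssumptions": "TODO"}
print(highest_tier(card))
```

Because each tier's set is built from the one below, filling in more fields can only raise the tier a card reaches.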

How it works

numinal init scans your dataset directory and auto-detects:

  • File types, sizes, SHA-256 checksums
  • Existing README, LICENSE, Croissant metadata, HuggingFace dataset cards

It generates a numinal.yaml — the human-authored source of truth — with TODO markers for fields you need to fill in manually. Run numinal validate to see what's missing at each tier.
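
To make the scanning step concrete, here is a minimal sketch of a directory walk that records file type, size, and SHA-256 checksum. It is not numinal's implementation; `scan_dataset`, `sha256_of`, and the record keys are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_dataset(root: Path) -> list[dict]:
    """List every regular file under `root` with its suffix, size and checksum."""
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records.append({
                "path": path.relative_to(root).as_posix(),
                "suffix": path.suffix.lower(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return records
```

Calling `scan_dataset(Path("./my-dataset"))` would return one record per file; checksums of this kind let a consumer confirm that the files they received match the ones the card describes.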

numinal does not profile your data. Schema details (field names, types, null rates, cardinality) are publisher-supplied: either filled in by hand at T2+, imported from a profiling tool, or bootstrapped from existing Croissant metadata via numinal init --from-croissant <path-or-url>.
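
Publishers who have no profiling tool to hand can compute the basic statistics themselves. The sketch below derives per-column null rate and cardinality from CSV text using only the standard library. It treats empty strings as nulls and does not infer types; both are simplifying assumptions, and `profile_csv` is a hypothetical helper, not part of numinal.

```python
import csv
import io


def profile_csv(text: str) -> dict[str, dict]:
    """Per-column null rate and cardinality for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {}
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows]
        # Assumption: empty strings (and missing trailing cells) count as null.
        non_null = [v for v in values if v not in (None, "")]
        profile[column] = {
            "null_rate": 1 - len(non_null) / len(values),
            "cardinality": len(set(non_null)),
        }
    return profile


sample = "id,region\n1,north\n2,\n3,north\n"
print(profile_csv(sample))
```

The resulting numbers would still need to be copied into the data card by hand or by a script, since the layout of the schema section is not shown here.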

Schema

The numinal data card extends Croissant with two additional layers:

| Layer | Standard | What it covers |
|-------|----------|----------------|
| Dataset structure | Croissant 1.0 (MLCommons) | Files, schemas, splits, ML semantics |
| Responsible AI | Croissant-RAI 1.0 (MLCommons) | Bias, fairness, collection methodology |
| Governance | Croissant-GOV 0.1 (numinal) | Access policies, DUAs, compliance, metering |

Every numinal data card is simultaneously valid Croissant metadata. The governance fields live in their own namespace — tools that don't understand gov: fields simply ignore them.
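
One way to picture the namespace separation is to look at what a governance-unaware consumer effectively sees. The sketch below drops every key with the `gov:` prefix from a card-like dictionary. The field names are hypothetical, and a real JSON-LD processor resolves terms through the document's context rather than by string prefix, so treat this as a simplification.

```python
import json


def strip_governance(card: dict, prefix: str = "gov:") -> dict:
    """Recursively drop keys in the governance namespace, keeping everything else."""
    def clean(node):
        if isinstance(node, dict):
            return {k: clean(v) for k, v in node.items() if not k.startswith(prefix)}
        if isinstance(node, list):
            return [clean(item) for item in node]
        return node
    return clean(card)


card = {
    "@type": "sc:Dataset",
    "name": "demo",
    "gov:accessPolicy": {"gov:tier": 3},           # hypothetical field names
    "distribution": [{"name": "data.csv", "gov:metered": True}],
}
print(json.dumps(strip_governance(card), indent=2))
```

What remains is the structural metadata alone, which is the part a plain Croissant consumer would act on.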

Controlled vocabularies are sourced from:

  • Organisation types: UK Cabinet Office Public Bodies Handbook, Companies House
  • Data use purposes: GA4GH Data Use Ontology (DUO), extended with AI-specific terms
  • High-risk domains: EU AI Act Annex III
  • Security classifications: UK Government Security Classifications Policy (GSCP)
  • Policy expression: W3C ODRL 2.2 via DPV-ODRL profile

See the specification for full details.
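
For readers unfamiliar with ODRL, the sketch below builds a small ODRL 2.2 policy in JSON-LD that permits use of a dataset for a single purpose. It uses core ODRL terms only. How numinal or the DPV-ODRL profile embeds such a policy in a data card is not shown here, and the IRIs are placeholders.

```python
import json


def odrl_permission(dataset_iri: str, purpose_iri: str) -> dict:
    """Build an ODRL 2.2 Set policy permitting `use` for a single purpose."""
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "uid": f"{dataset_iri}/policy/1",          # hypothetical policy IRI
        "permission": [{
            "target": dataset_iri,
            "action": "use",
            "constraint": [{
                "leftOperand": "purpose",
                "operator": "eq",
                "rightOperand": {"@id": purpose_iri},
            }],
        }],
    }


policy = odrl_permission(
    "https://example.org/dataset/demo",            # placeholder dataset IRI
    "https://example.org/purpose/research",        # placeholder purpose IRI
)
print(json.dumps(policy, indent=2))
```

In practice the purpose IRI would come from a controlled vocabulary such as the data use purposes listed above, rather than from an example.org placeholder.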

Example

See examples/uk-health-ai-corpus.yaml for a complete T3 data card demonstrating all three schema layers.

Commands

| Command | Status | Purpose |
|---------|--------|---------|
| numinal init | | Generate a data card from a dataset directory |
| numinal validate | | Validate against compliance tiers |
| numinal compliance | | Check against EU AI Act Article 10 |
| numinal render | | Render as markdown |
| numinal diff | planned | Compare two data card versions |

License

Apache-2.0
