
Annotate data artifacts with provenance and descriptions


data-annotations

data-annotations is a Python package for attaching provenance and structured descriptions to the files and directories your workflows produce.

It writes plain JSON annotation sidecars that are easy to inspect, archive, and publish with research outputs:

  • files get a sibling sidecar named artifact.ext.annotation.json
  • directories get a data-annotations.json sidecar at their root

Optional Markdown README sidecars can be generated for human-readable summaries.
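The naming rules above can be captured in a small helper, for example to check whether an artifact already has an annotation before writing one. This is a minimal sketch, not part of the data-annotations API: `sidecar_path` and `has_annotation` are hypothetical names, and the helper only reproduces the two naming rules listed above.

```python
from pathlib import Path

# Hypothetical helpers (not part of data-annotations) that mirror the
# sidecar naming rules described above.
DIRECTORY_SIDECAR = "data-annotations.json"
FILE_SIDECAR_SUFFIX = ".annotation.json"


def sidecar_path(artifact: Path) -> Path:
    """Return where the JSON annotation sidecar for `artifact` is expected."""
    if artifact.is_dir():
        # Directories keep their annotation at their root.
        return artifact / DIRECTORY_SIDECAR
    # Files get a sibling named after the full file name, extension included.
    return artifact.with_name(artifact.name + FILE_SIDECAR_SUFFIX)


def has_annotation(artifact: Path) -> bool:
    """Return True if the expected sidecar already exists on disk."""
    return sidecar_path(artifact).is_file()
```

For a file such as outputs/participants.csv, `sidecar_path` points at outputs/participants.csv.annotation.json; for a directory it points at the data-annotations.json inside it.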

Documentation

The full documentation is organized as a Diátaxis site: https://ceda-unibas.gitlab.io/tools/data-annotations/

Installation

Install the core library from PyPI:

pip install data-annotations

Or add it to a project with uv:

uv add data-annotations

Install the optional cli extra if you want the data-annotations command:

pip install "data-annotations[cli]"

Or, with uv:

uv add "data-annotations[cli]"

Quick start

Decorate a function that writes an artifact. When the function runs, data-annotations records provenance and writes the JSON sidecar.

from pathlib import Path

from data_annotations.annotations import record_file_annotation
from data_annotations.description import FieldDefinition


@record_file_annotation(
    title="Participant Cohort",
    summary="Participant-level cohort assignments.",
    fields=[
        FieldDefinition(
            name="participant_id",
            data_type="string",
            summary="Stable participant identifier.",
            required=True,
            nullable=False,
        ),
    ],
    primary_key=["participant_id"],
    artifact_kind="dataset",
    write_readme=True,
)
def write_participants(artifact_path: Path, input_path: Path) -> Path:
    # Read the raw file, skip its header row, and keep non-empty lines
    # (assumes one participant ID per line).
    participant_ids = [
        line.strip()
        for line in input_path.read_text(encoding="utf-8").splitlines()[1:]
        if line.strip()
    ]
    # Write the cleaned cohort file; the decorator writes the sidecar.
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_text(
        "participant_id\n" + "\n".join(participant_ids) + "\n",
        encoding="utf-8",
    )
    return artifact_path


artifact_path = Path("outputs") / "participants.csv"
write_participants(
    artifact_path=artifact_path,
    input_path=Path("data/raw/participants.csv"),
)

Running the function writes the artifact, its JSON sidecar, and (because write_readme=True) a Markdown README:

outputs/participants.csv
outputs/participants.csv.annotation.json
outputs/participants.csv.README.md
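To make the quick-start behaviour more concrete, here is a deliberately simplified, stdlib-only sketch of the general decorator pattern: run the wrapped function, then write a JSON sidecar next to the path it returns. It is not how data-annotations is implemented. `toy_record_file_annotation` is a hypothetical name, and the recorded keys (`title`, `function`, `inputs`, `created_at`) are illustrative assumptions, not the package's annotation schema.

```python
import functools
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable


def toy_record_file_annotation(title: str) -> Callable:
    """Hypothetical stand-in for an annotation decorator (illustration only)."""

    def decorator(func: Callable[..., Path]) -> Callable[..., Path]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Path:
            # Run the real work first; the function returns the artifact path.
            artifact = Path(func(*args, **kwargs))
            record = {
                "title": title,
                "function": f"{func.__module__}.{func.__qualname__}",
                # This toy version records only keyword arguments as inputs.
                "inputs": {key: str(value) for key, value in kwargs.items()},
                "created_at": datetime.now(timezone.utc).isoformat(),
            }
            # Sidecar sits next to the artifact: <name>.annotation.json
            sidecar = artifact.with_name(artifact.name + ".annotation.json")
            sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
            return artifact

        return wrapper

    return decorator
```

Decorating a function that returns the path it wrote, then calling it with keyword arguments, produces a sibling .annotation.json file containing those four keys.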

CLI

The CLI supports retrospective annotation, provenance inspection, source recovery, and sanitized publish bundles.

data-annotations annotate file path/to/participants.csv --write-readme
data-annotations annotate directory path/to/run-001 --recursive
data-annotations provenance match path/to/participants.csv
data-annotations provenance chain path/to/participants.csv
data-annotations provenance checkout path/to/participants.csv
data-annotations publish path/to/run-001 path/to/publish-bundle
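Because these are ordinary shell commands, they can be scripted. The sketch below is an assumption-laden example rather than a documented workflow: it finds files under a run directory that have no .annotation.json sidecar yet and builds one `data-annotations annotate file PATH --write-readme` call per file, using only the flags shown above. `find_unannotated` and `annotate_missing` are hypothetical helper names, and the commands only execute when `dry_run=False` (which requires the cli extra).

```python
import subprocess
from pathlib import Path

SIDECAR_SUFFIX = ".annotation.json"
DIRECTORY_SIDECAR = "data-annotations.json"
# Assumption: treat annotation JSON and README sidecars as generated files.
GENERATED_SUFFIXES = (SIDECAR_SUFFIX, ".README.md")


def find_unannotated(run_dir: Path) -> list[Path]:
    """Return files under run_dir that have no sibling annotation sidecar."""
    missing = []
    for path in sorted(run_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.name == DIRECTORY_SIDECAR or path.name.endswith(GENERATED_SUFFIXES):
            continue  # skip files the tool itself writes
        if not path.with_name(path.name + SIDECAR_SUFFIX).exists():
            missing.append(path)
    return missing


def annotate_missing(run_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one `annotate file` command per unannotated file."""
    commands = [
        ["data-annotations", "annotate", "file", str(path), "--write-readme"]
        for path in find_unannotated(run_dir)
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

Keeping `dry_run=True` as the default lets you review the command list before anything touches disk; for annotating a whole directory tree in one call, `annotate directory --recursive` is the simpler option.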

Development

From a source checkout (assuming you have Task installed):

task install
task lint
task type-check
task test

Build or preview the documentation site:

task docs-build
task docs-serve

