Annotate data artifacts with provenance and descriptions
data-annotations
data-annotations is a Python package for attaching provenance and structured
descriptions to the files and directories your workflows produce.
It writes plain JSON annotation sidecars that are easy to inspect, archive, and publish with research outputs:
- files use artifact.ext.annotation.json
- directories use data-annotations.json at their root
Optional Markdown README sidecars can be generated for human-readable summaries.
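The naming convention above can be sketched with pathlib alone. This is an illustrative helper, not part of the data-annotations API: the function name annotation_sidecar_path and its is_directory flag are assumptions made for this example.

```python
from pathlib import Path


def annotation_sidecar_path(artifact: Path, is_directory: bool = False) -> Path:
    """Return where the JSON annotation sidecar is expected for an artifact.

    Hypothetical helper for illustration; it only mirrors the naming rules
    described above and does not call data-annotations.
    """
    if is_directory:
        # Directories keep a single annotation file at their root.
        return artifact / "data-annotations.json"
    # Files get a sibling named after the full file name, extension included.
    return artifact.with_name(artifact.name + ".annotation.json")


if __name__ == "__main__":
    print(annotation_sidecar_path(Path("outputs/participants.csv")))
    print(annotation_sidecar_path(Path("outputs/run-001"), is_directory=True))
```

Keeping the original extension in the sidecar name means two artifacts that differ only by extension never share an annotation file.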
Documentation
The full documentation is organized as a Diátaxis site: https://ceda-unibas.gitlab.io/tools/data-annotations/
- Source: https://gitlab.com/ceda-unibas/tools/data-annotations
- Changelog: https://gitlab.com/ceda-unibas/tools/data-annotations/-/blob/main/CHANGELOG.md
- Work items: https://gitlab.com/ceda-unibas/tools/data-annotations/-/work_items
Installation
Install the core library from PyPI:
pip install data-annotations
Or add it to a project with uv:
uv add data-annotations
Install CLI support when you want the data-annotations command:
pip install "data-annotations[cli]"
uv add "data-annotations[cli]"
Quick start
Decorate a function that writes an artifact. When the function runs,
data-annotations records provenance and writes the JSON sidecar.
from pathlib import Path

from data_annotations.annotations import record_file_annotation
from data_annotations.description import FieldDefinition


@record_file_annotation(
    title="Participant Cohort",
    summary="Participant-level cohort assignments.",
    fields=[
        FieldDefinition(
            name="participant_id",
            data_type="string",
            summary="Stable participant identifier.",
            required=True,
            nullable=False,
        ),
    ],
    primary_key=["participant_id"],
    artifact_kind="dataset",
    write_readme=True,
)
def write_participants(artifact_path: Path, input_path: Path) -> Path:
    # Read the input, skipping the header row and any blank lines.
    participant_ids = [
        line.strip()
        for line in input_path.read_text(encoding="utf-8").splitlines()[1:]
        if line.strip()
    ]
    # Make sure the output directory exists before writing the artifact.
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_text(
        "participant_id\n" + "\n".join(participant_ids) + "\n",
        encoding="utf-8",
    )
    return artifact_path


artifact_path = Path("outputs") / "participants.csv"
write_participants(
    artifact_path=artifact_path,
    input_path=Path("data/raw/participants.csv"),
)
This writes:
outputs/participants.csv
outputs/participants.csv.annotation.json
outputs/participants.csv.README.md
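Because the sidecar is plain JSON, it can be inspected with the standard library. The following is a minimal sketch: load_annotation is a hypothetical helper, not a data-annotations function, and it assumes the sidecar holds a JSON object without assuming any particular keys.

```python
import json
from pathlib import Path


def load_annotation(artifact_path: Path) -> dict:
    """Load the JSON annotation sidecar that sits next to a file artifact.

    Hypothetical helper for illustration; assumes the sidecar is a JSON object.
    """
    sidecar = artifact_path.with_name(artifact_path.name + ".annotation.json")
    with sidecar.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    artifact = Path("outputs") / "participants.csv"
    sidecar = artifact.with_name(artifact.name + ".annotation.json")
    if sidecar.exists():
        annotation = load_annotation(artifact)
        # No schema is assumed here; just list whatever top-level keys exist.
        print(sorted(annotation))
    else:
        print(f"No sidecar found at {sidecar}")
```

Listing the top-level keys is a quick way to see what was recorded before consulting the documentation for the full annotation schema.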
CLI
The CLI supports retrospective annotation, provenance inspection, source recovery, and sanitized publish bundles.
data-annotations annotate file path/to/participants.csv --write-readme
data-annotations annotate directory path/to/run-001 --recursive
data-annotations provenance match path/to/participants.csv
data-annotations provenance chain path/to/participants.csv
data-annotations provenance checkout path/to/participants.csv
data-annotations publish path/to/run-001 path/to/publish-bundle
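The same commands can be driven from a Python script, for example at the end of a pipeline run. This sketch uses only the standard library and the subcommands listed above; build_command and run_cli are hypothetical wrapper names, and the meaning of the exit code is not specified here.

```python
from __future__ import annotations

import shutil
import subprocess


def build_command(*args: str) -> list[str]:
    """Build an argument list for the data-annotations CLI."""
    return ["data-annotations", *args]


def run_cli(*args: str) -> int | None:
    """Run the CLI and return its exit code, or None if it is not on PATH."""
    if shutil.which("data-annotations") is None:
        return None
    # check=False leaves the handling of non-zero exit codes to the caller.
    return subprocess.run(build_command(*args), check=False).returncode


if __name__ == "__main__":
    # Same invocation as the first CLI example above.
    code = run_cli("annotate", "file", "path/to/participants.csv", "--write-readme")
    print("CLI not installed" if code is None else f"exit code: {code}")
```

Passing arguments as a list avoids shell quoting problems when artifact paths contain spaces.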
Development
From a source checkout (assuming you have Task installed):
task install
task lint
task type-check
task test
Build or preview the documentation site:
task docs-build
task docs-serve
File details
Details for the file data_annotations-2.8.1.tar.gz.
File metadata
- Download URL: data_annotations-2.8.1.tar.gz
- Upload date:
- Size: 62.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3569c5e7a45cb4ed803378367106eacd965b4885290ba9f6c4ce7d7f2a10ed8e |
| MD5 | 3a8a50ec23e4c8b184e468a6f86dd203 |
| BLAKE2b-256 | 9a6bade0cfd86c7f576291b09beebf0ec3921f5a84fd5819934b3201763c4f58 |
File details
Details for the file data_annotations-2.8.1-py3-none-any.whl.
File metadata
- Download URL: data_annotations-2.8.1-py3-none-any.whl
- Upload date:
- Size: 80.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7a2cffdb5548ac2911362264a000064a556390a67f53e747b3341c8a2b529da |
| MD5 | 48c7a8ab741895d2a4f305ba28bf4110 |
| BLAKE2b-256 | 0799e1f64a4bce873a1c47153f13c58d6b2b41757b2b4d0806c79044e8ae74ae |