Annotate data artifacts with provenance and descriptions
data-annotations
A Python package for attaching provenance and structured descriptions to the files and directories your workflows produce.
It is designed for lightweight research and reproducibility pipelines where you want generated datasets, tables, plots, or reports to carry enough context to explain where they came from and what they contain.
The package captures common provenance automatically and writes plain JSON and Markdown artifacts that are easy to inspect or archive. The canonical on-disk format uses one JSON annotation document per artifact:
- Files use artifact.ext.annotation.json
- Directories carry data-annotations.json at their root
Each annotation document stores four top-level sections:
- annotation_version
- subject
- provenance
- description
Here's the mental model: files get a visible sibling annotation, and directories carry one visible annotation at their root. Treat the annotation as part of the research output bundle.
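The naming convention above can be written down as a small helper. This is a minimal sketch using only the standard library; annotation_path_for is a hypothetical name chosen for illustration and is not part of the package's API.

```python
from pathlib import Path


def annotation_path_for(target: Path) -> Path:
    """Return where the annotation document for `target` would live.

    Hypothetical helper that only mirrors the naming convention
    described above; it does not call data-annotations.
    """
    if target.is_dir():
        # Directories carry one annotation at their root.
        return target / "data-annotations.json"
    # Files get a visible sibling: artifact.ext -> artifact.ext.annotation.json
    return target.with_name(target.name + ".annotation.json")


print(annotation_path_for(Path("outputs/participants.csv")))
```

A path that does not exist yet is treated as a file here, which matches the common case of computing the sidecar name before the artifact is written.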
See the changelog for release history and upgrade-oriented notes.
Installation
Install the core library from PyPI with pip:
pip install data-annotations
Or add it to a project with uv:
uv add data-annotations
The command-line interface uses optional dependencies. Install the package with
CLI support when you want to run data-annotations commands:
pip install "data-annotations[cli]"
uv add "data-annotations[cli]"
For development or unreleased source installs, install directly from GitLab:
uv add "data-annotations @ git+https://gitlab.com/ceda-unibas/tools/data-annotations.git"
pip install "data-annotations @ git+https://gitlab.com/ceda-unibas/tools/data-annotations.git"
Pin a source install to a particular release tag x.y.z with:
uv add "data-annotations @ git+https://gitlab.com/ceda-unibas/tools/data-annotations.git@x.y.z"
What gets captured automatically
Every annotation document includes provenance with:
- A UTC creation timestamp
- Hostname and username
- The script path and command-line arguments
- The script path relative to the Git repo root when it can be determined
- Git commit, branch, dirty state, canonical repository remote, exact tags, and git describe output when available
- A source-code reference for recovery, derived from Git metadata when possible or supplied explicitly for archives, individual files, and DOI/URI records
- The current SLURM_JOB_ID when available
- Structured snapshots for recorded local inputs, including file checksums, directory content digests, and upstream annotation sidecar references when present
Local file hashing defaults to checksum policy auto: existing files are hashed
only when they are at or below 10 * 1024**3 bytes (10 GiB). Larger files are
still recorded, but their sha256 or directory content_digest is left unset
unless you provide a precomputed checksum yourself.
You can also attach your own parameters, input file paths, and function names.
Local filesystem paths in provenance are stored as absolute paths. URI-style inputs
such as s3://... or https://... are preserved as provided.
Git tags and git_describe are human-friendly hints only. For Git sources,
git_sha and source_code.revision identify the recoverable code state.
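To make the list of automatically captured fields more concrete, here is a rough sketch of how the non-Git part of that context could be gathered with the standard library. The function name capture_runtime_context and the dictionary keys are assumptions made for this example; they are not the package's schema, and the package's own capture logic may differ.

```python
import getpass
import os
import socket
import sys
from datetime import datetime, timezone


def capture_runtime_context() -> dict:
    """Collect a few runtime facts similar to the ones listed above.

    The key names are illustrative assumptions, not the package's schema.
    """
    try:
        username = getpass.getuser()
    except Exception:
        # Some containers have no passwd entry for the current user.
        username = None
    script = sys.argv[0] if sys.argv else ""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "username": username,
        # Local paths are stored as absolute paths.
        "script": os.path.abspath(script) if script else None,
        "argv": list(sys.argv[1:]),
        # Only present when running under SLURM.
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),
    }


print(sorted(capture_runtime_context()))
```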
Quick Start
The recommended way to annotate your data artifacts is to decorate pipeline
functions that consume some inputs and parameters, then write those artifacts.
This keeps the artifact-writing logic explicit while letting data-annotations capture
provenance and emit sidecars automatically.
For example, here is a complete file-level annotation workflow using the
record_file_annotation(...) decorator. Once write_participants is called, it
automatically generates sidecars participants.csv.annotation.json and participants.csv.README.md.
The JSON sidecar will contain provenance and description metadata, and the Markdown sidecar
will have a human-friendly rendering of the description provided in the decorator.
from pathlib import Path

from data_annotations.annotations import record_file_annotation
from data_annotations.description import AllowedValue, FieldDefinition


@record_file_annotation(
    title="Participant Cohort",
    summary="Participant-level cohort assignments for the validation split.",
    fields=[
        FieldDefinition(
            name="participant_id",
            data_type="string",
            summary="Stable participant identifier.",
            required=True,
            nullable=False,
        ),
        FieldDefinition(
            name="group",
            data_type="string",
            summary="Assigned study group.",
            allowed_values=[
                AllowedValue(value="control"),
                AllowedValue(value="treatment"),
            ],
        ),
    ],
    primary_key=["participant_id"],
    artifact_kind="dataset",
    acquisition_context={"source": "Study A registry export"},
    generation_context={"pipeline": "baseline-v1"},
)
def write_participants(
    artifact_path: Path,
    input_path: Path,
    split: str,
) -> Path:
    # Skip the header row and ignore blank lines.
    participant_ids = [
        line.strip()
        for line in input_path.read_text(encoding="utf-8").splitlines()[1:]
        if line.strip()
    ]
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_text(
        "\n".join(
            [
                "participant_id,group,split",
                *[
                    f"{participant_id},control,{split}"
                    for participant_id in participant_ids
                ],
            ]
        )
        + "\n",
        encoding="utf-8",
    )
    return artifact_path


# Annotation sidecars are written automatically
# when the decorated function is called:
artifact_path = Path("outputs") / "participants.csv"
write_participants(
    artifact_path=artifact_path,
    input_path=Path("data/raw/participants.csv"),
    split="validation",
)
print(f"{artifact_path}.annotation.json")
print(f"{artifact_path}.README.md")
Decorator Contract
You write a normal Python function and the decorator returns that function's original return value unchanged.
For provenance-bearing decorators, recorded inputs are inferred from named
function arguments such as input_path and input_paths. Those arguments
should correspond to real data dependencies used inside the wrapped function.
For file decorators:
- record_file_manifest(...)
- record_file_annotation(...)
- record_file_description(...)
Your function should:
- accept one argument pointing at the output file path. By default this argument is named artifact_path, but you can change the expected name with artifact_path_arg=....
- use any other normal Python arguments you need for the pipeline step.
- for provenance-bearing decorators, use argument names listed in input_args for real upstream dependencies you want recorded as provenance inputs. By default those names are ("input_path", "input_paths").
Your function may return any value. File decorators do not inspect that return
value. Returning the generated artifact_path is recommended because it is
convenient for callers, but it is not required.
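The contract above (one output-path argument, named input arguments, everything else treated as parameters, return value passed through) can be illustrated with a toy decorator. This is a sketch built on inspect.signature, not the package's implementation: record_call and RECORDED are hypothetical names, and the in-memory list stands in for the sidecar a real decorator would write.

```python
import functools
import inspect

RECORDED = []  # stands in for the sidecar a real decorator would write


def record_call(input_args=("input_path", "input_paths"),
                artifact_path_arg="artifact_path"):
    """Hypothetical decorator that mimics the argument contract above."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind the call to parameter names, filling in defaults.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            result = func(*args, **kwargs)
            arguments = dict(bound.arguments)
            artifact = arguments.pop(artifact_path_arg)
            inputs = {
                name: arguments.pop(name)
                for name in input_args
                if name in arguments
            }
            # Whatever is left is treated as a parameter.
            RECORDED.append(
                {"artifact": artifact, "inputs": inputs, "params": arguments}
            )
            return result  # the wrapped function's value, unchanged

        return wrapper

    return decorator


@record_call()
def write_report(artifact_path, input_path, threshold=0.5):
    return artifact_path


write_report("outputs/summary.txt", input_path="data/raw.csv", threshold=0.75)
print(RECORDED[-1])
```

Binding by name is what makes the argument names matter: an upstream dependency passed under a name that is not in input_args would end up among the parameters in this sketch.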
For directory decorators:
- record_directory_manifest(...)
- record_directory_annotation(...)
- record_directory_description(...)
Your function should:
- accept one argument pointing at the output directory. By default this argument is named output_dir, but you can change the expected name with output_dir_arg=....
- return a materialized iterable, usually a list, describing the files that were produced in that directory.
- prefer returning a list or tuple rather than a generator, since the decorator needs to iterate over the outputs to write sidecars.
Accepted directory return items are:
- DocumentedArtifact when you want per-artifact title, summary, fields, keys, or missing-value metadata.
- DocumentedArtifactGroup for record_directory_annotation(...) and record_directory_description(...) when many files share one title, summary, kind, and optional schema metadata.
- ProducedFile when you only need path, kind, and optional precomputed hash.
- ChildBundle when an annotated child directory should be referenced as its own independently shareable bundle.
- (path, kind) tuples when path and artifact kind are enough.
- plain path-like values when the artifact kind can default to "other".
For provenance-bearing directory decorators, input_args works the same way as
for file decorators: matching argument names are recorded as inputs, and the
remaining bound arguments become provenance params.
Here is another decorator pattern example with record_directory_annotation(...):
from pathlib import Path

from data_annotations.annotations import record_directory_annotation
from data_annotations.description import (
    DocumentedArtifact,
    DocumentedArtifactGroup,
    FieldDefinition,
)
from data_annotations.provenance import ProducedFile


@record_directory_annotation(
    title="Validation Outputs",
    summary="Directory-level documentation for the validation run outputs.",
    acquisition_context={"source": "Study A registry export"},
    generation_context={"pipeline": "baseline-v1"},
)
def build_outputs(
    output_dir: Path,
    input_path: Path,
    split: str,
):
    # Skip the header row and ignore blank lines.
    participant_ids = [
        line.strip()
        for line in input_path.read_text(encoding="utf-8").splitlines()[1:]
        if line.strip()
    ]
    output_dir.mkdir(parents=True, exist_ok=True)

    table_path = output_dir / "scores.csv"
    table_path.write_text(
        "\n".join(
            [
                "participant_id,score,split",
                *[
                    f"{participant_id},0.94,{split}"
                    for participant_id in participant_ids
                ],
            ]
        )
        + "\n",
        encoding="utf-8",
    )

    report_path = output_dir / "summary.txt"
    report_path.write_text(
        (
            f"Validated {len(participant_ids)} participants from "
            f"{input_path.name} for the {split} split.\n"
        ),
        encoding="utf-8",
    )

    plot_paths = []
    for day in ["2024-01-01", "2024-01-02", "2024-01-03"]:
        plot_path = output_dir / f"sma_{day}.png"
        plot_path.write_bytes(
            (
                f"plot placeholder for the SMA variable on {day}, "
                f"derived from {input_path.name}\n"
            ).encode("utf-8")
        )
        plot_paths.append(plot_path)

    # Return a materialized list describing every produced file.
    return [
        DocumentedArtifact(
            path=str(table_path),
            kind="dataset",
            title="Scores Table",
            # Field definitions mirror the CSV header written above.
            fields=[
                FieldDefinition(
                    name="participant_id",
                    data_type="string",
                    summary="Stable participant identifier.",
                ),
                FieldDefinition(
                    name="score",
                    data_type="float",
                    summary="Validation score.",
                ),
                FieldDefinition(
                    name="split",
                    data_type="string",
                    summary="Data split the score belongs to.",
                ),
            ],
        ),
        ProducedFile(path=str(report_path), kind="report"),
        DocumentedArtifactGroup(
            title="Daily SMA plots",
            summary="Plots of the same variable on different days.",
            kind="plot",
            paths=[str(path) for path in plot_paths],
            selector="sma_*.png",
        ),
    ]


output_dir = Path("outputs") / "run-001"
build_outputs(
    output_dir=output_dir,
    input_path=Path("data/raw/participants.csv"),
    split="validation",
)
print(output_dir / "data-annotations.json")
print(output_dir / "README.md")
The decorator and direct APIs write the same canonical document shape. If you need
metadata to vary per call instead of staying fixed at decoration time, use
annotate_file(...), annotate_directory(...), write_file_annotation(...), or
write_directory_annotation(...) directly instead. See the example gallery in
examples/ for runnable examples of all approaches.
When To Use Decorators Vs Direct Functions
If a function is only a final serializer for already-prepared data, prefer the
direct annotation and writer APIs. They let you attach inputs=[...] explicitly.
Canonical Document Shape
File annotations store:
- subject.path
- subject.kind
- subject.sha256
- provenance.*
- provenance.input_artifacts[]
- description.title
- description.summary
- description.fields
- description.primary_key
- description.missing_value_codes
- description.acquisition_context
- description.generation_context
- description.description_updated_at
Directory annotations store:
- subject.path
- subject.produced_files[]
- subject.child_bundles[]
- subject.content_digest
- provenance.*
- provenance.input_artifacts[]
- description.title
- description.summary
- description.artifact_groups[]
- description.artifacts[]
- description.acquisition_context
- description.generation_context
- description.description_updated_at
Use description.artifact_groups[] when many files have the same meaning, and
use description.artifacts[] only for file-specific notes, overrides, or schema.
Groups are descriptive only. Integrity still lives in subject.produced_files[],
which tracks every concrete file by path, kind, and checksum.
The description section intentionally excludes provenance linkage fields.
Directory produced_files[].path values are stored relative to subject.path,
which keeps verification stable when a complete output directory is copied or
archived elsewhere. subject.content_digest is computed from sorted tracked file
paths, file checksums, and referenced child bundle digests.
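As an illustration of why relative paths keep the digest stable, here is a sketch of a digest over sorted relative paths, file checksums, and child bundle digests. The function name sketch_content_digest and the JSON serialization are assumptions made for this example; the package's actual byte layout is not documented here and will likely differ, so values from this sketch are not comparable with real subject.content_digest values.

```python
import hashlib
import json
from typing import Mapping, Optional


def sketch_content_digest(
    file_checksums: Mapping[str, Optional[str]],
    child_digests: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Digest over sorted relative paths, checksums, and child digests.

    Illustrative only: the package's real serialization may differ.
    """
    if any(checksum is None for checksum in file_checksums.values()):
        # Documented behaviour: no digest when a member checksum is missing.
        return None
    payload = {
        "files": sorted(file_checksums.items()),
        "children": sorted((child_digests or {}).items()),
    }
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


digest = sketch_content_digest(
    {"scores.csv": "aa11", "summary.txt": "bb22"},
    child_digests={"model": "cc33"},
)
print(digest)
```

Because only relative paths and checksums feed the hash, copying the whole directory elsewhere leaves the result unchanged, while editing any tracked file or child bundle changes it.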
Artifact Groups
Artifact groups are for homogeneous sets of files that researchers naturally
understand as one output family: for example, 100 PNG plots of the same variable,
one per acquisition day. A group stores the shared title, summary, kind, optional
schema fields, and the concrete member paths. It can also store an informational
selector, such as plots/*.png, to show how the group was chosen.
Rules of thumb:
- Use artifact groups when many files have the same meaning.
- Use individual artifacts for file-specific notes, exceptions, or overrides.
- It is OK for an individual artifact to also appear in a group.
- Do not rely on groups for integrity. subject.produced_files[] remains the complete checksum inventory.
Nested Directory Policy
Annotate the smallest thing you would share as a unit. If a directory is one
research output, give that directory one data-annotations.json, even when its
tracked files live in nested subdirectories.
Use recursive directory annotations for one bundle with nested files:
data-annotations annotate directory path/to/run-001 --recursive
data-annotations annotate directory path/to/run-001 --max-depth 2
Use child bundle annotations when a subdirectory is independently meaningful,
shareable, or reusable. In that case, annotate the child directory first, then
annotate the parent. The parent records a compact child_bundles[] reference
with the child path, child annotation path, and child content digest; it does not
copy the child file inventory into the parent JSON.
Post-hoc directory discovery follows the same rule. --recursive discovers
nested files, but it stops at annotated child directories containing
data-annotations.json and records them as child bundles.
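The stopping rule for recursive discovery can be sketched with os.walk and in-place pruning. This is an illustration under stated assumptions, not the CLI's code: discover is a hypothetical function, and skipping the root's own data-annotations.json is a choice made for this example.

```python
import os
from pathlib import Path

ANNOTATION_NAME = "data-annotations.json"


def discover(root: Path) -> tuple[list[str], list[str]]:
    """Return (tracked_files, child_bundles) as POSIX paths relative to root."""
    tracked, children = [], []
    for current, dirnames, filenames in os.walk(root):
        current_path = Path(current)
        kept = []
        for name in sorted(dirnames):
            child = current_path / name
            if (child / ANNOTATION_NAME).is_file():
                # Annotated child directory: record it, do not descend.
                children.append(child.relative_to(root).as_posix())
            else:
                kept.append(name)
        dirnames[:] = kept  # prune in place so os.walk skips child bundles
        for name in sorted(filenames):
            if current_path == root and name == ANNOTATION_NAME:
                continue  # assumption: the root's own annotation is not an output
            tracked.append((current_path / name).relative_to(root).as_posix())
    return sorted(tracked), sorted(children)
```

Calling discover(Path("path/to/run-001")) on a tree with an annotated model/ subdirectory would list model as a child bundle and leave its files out of the parent inventory.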
Provenance Decorators And Writers
The data_annotations.provenance namespace provides provenance-only entry points.
Prefer the decorators when you already have a small function that writes artifacts:
from pathlib import Path

from data_annotations.provenance import record_file_manifest


@record_file_manifest(artifact_kind="report")
def write_report(
    artifact_path: Path,
    input_path: Path,
    threshold: float = 0.5,
):
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_text(
        f"threshold applied: {threshold}\nsource={input_path.name}\n",
        encoding="utf-8",
    )


write_report(
    artifact_path=Path("outputs/summary.txt"),
    input_path=Path("data/raw/participants.csv"),
    threshold=0.75,
)
Use record_directory_manifest(...) for directory outputs. Directory decorators
accept DocumentedArtifact, ProducedFile, (path, kind), and plain path-like
return values. Provenance-only APIs do not accept description groups; use
unified annotation or description APIs when groups should appear in the JSON or
README.
If you want the direct writer approach instead, use write_file_manifest(...) and
write_directory_manifest(...) (see examples/).
Checksum Policy
All provenance and annotation entry points that hash local files support the same policy controls:
- checksum_policy="auto": hash existing local files only when they are at or below max_checksum_bytes. This is the default, and max_checksum_bytes defaults to 10 * 1024**3 bytes (10 GiB).
- checksum_policy="always": hash existing local files regardless of size.
- checksum_policy="never": never hash local files automatically. Checksums are recorded only when you supply them explicitly.
When a checksum is skipped, JSON sidecars keep the same schema and simply store
sha256: null. Directory content_digest is also left unset when any tracked
member file lacks a checksum.
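The three policies reduce to a short decision procedure. The sketch below reproduces that logic with hashlib so the behaviour is easy to reason about; sketch_checksum and its precomputed parameter are hypothetical names for this example and do not correspond to functions in the package.

```python
import hashlib
from pathlib import Path
from typing import Optional

DEFAULT_MAX_CHECKSUM_BYTES = 10 * 1024**3  # 10 GiB, the documented default


def sketch_checksum(
    path: Path,
    checksum_policy: str = "auto",
    max_checksum_bytes: int = DEFAULT_MAX_CHECKSUM_BYTES,
    precomputed: Optional[str] = None,
) -> Optional[str]:
    """Return a SHA-256 hex digest, or None when hashing is skipped."""
    if checksum_policy not in {"auto", "always", "never"}:
        raise ValueError(f"unknown checksum policy: {checksum_policy!r}")
    if precomputed is not None:
        # An explicitly supplied checksum is recorded under every policy.
        return precomputed
    if checksum_policy == "never" or not path.is_file():
        return None
    if checksum_policy == "auto" and path.stat().st_size > max_checksum_bytes:
        return None  # too large for automatic hashing
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A None result corresponds to the sha256: null entry described above.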
You can change the policy from Python:
from data_annotations.annotations import annotate_file
from data_annotations.provenance import write_file_manifest

# Hash the file regardless of its size.
write_file_manifest(
    "outputs/summary.txt",
    checksum_policy="always",
)

# Skip automatic hashing and record a precomputed checksum instead.
annotate_file(
    "outputs/summary.txt",
    title="Run Summary",
    summary="Post-hoc summary.",
    artifact_sha256="precomputed-sha256",
    checksum_policy="never",
)
You can also inject precomputed checksums directly:
- File APIs: pass artifact_sha256=....
- File or directory APIs: pass checksum_overrides={path: sha256}. For directory outputs, keys can be relative to the output directory or absolute paths.
- Decorators such as record_file_manifest(...), record_directory_manifest(...), record_file_annotation(...), and record_directory_annotation(...) accept the same checksum-policy arguments.
From the CLI, use --checksum-policy, --max-checksum-bytes, --sha256, and
repeatable --checksum PATH=SHA256:
data-annotations annotate file path/to/summary.txt \
--title "Run Summary" \
--summary "Post-hoc summary." \
--kind report \
--checksum-policy never \
--sha256 0123456789abcdef...
data-annotations annotate directory path/to/run-001 \
--title "Processing outputs" \
--summary "Directory-level outputs." \
--checksum-policy never \
--checksum processed.csv=0123456789abcdef...
data-annotations provenance chain path/to/run-001 \
--checksum-policy always
For a complete runnable workflow, see examples/checksum_policy.py.
Description Layer
The data_annotations.description sub-package provides the structured description
models used by annotation writers and the Markdown sidecar renderers.
Within those models, the primary human-written narrative field is named summary.
Key public description models:
- AllowedValue
- FieldDefinition
- DocumentedArtifact
- DocumentedArtifactGroup
- ArtifactDescription
- ArtifactGroupDescription
- FileDescription
- DirectoryDescription
Description decorators and helpers:
- record_file_description(...)
- record_directory_description(...)
- write_file_description(...)
- write_directory_description(...)
- render_file_readme(...)
- render_directory_readme(...)
Alias helpers write_file_readme(...) and write_directory_readme(...) are supported.
Use the decorator forms when the description metadata is stable for a function, and use the direct helpers when you want to assemble descriptions per call.
Recovery Helpers
Use artifact_matches_manifest(...) to verify whether a detached artifact still
matches an annotation document. Use analyze_provenance_chain(...) when you also
want to verify recorded inputs and recursively follow upstream annotation
sidecars. Use recover_manifest_source(...) to recover the recorded source code
from Git metadata, a recorded source archive, or a recorded source file.
checkout_manifest_source(...) remains available as a compatibility alias.
from pathlib import Path

from data_annotations.provenance import (
    analyze_provenance_chain,
    artifact_matches_manifest,
    recover_manifest_source,
)

annotation_path = Path("outputs/participants.csv.annotation.json")
artifact_path = Path("downloads/participants.csv")

if artifact_matches_manifest(artifact_path, annotation_path):
    chain = analyze_provenance_chain(artifact_path)
    print(chain.status)
    recovered = recover_manifest_source(annotation_path)
    print(recovered.checkout_path)
    print(recovered.script_path)
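Conceptually, checking a detached file against its annotation comes down to comparing a fresh checksum with the recorded subject.sha256. The following standard-library sketch shows that idea for file annotations only; sketch_matches and sha256_of are hypothetical helpers, and the real artifact_matches_manifest(...) may perform additional checks.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sketch_matches(artifact_path: Path, annotation_path: Path) -> bool:
    """Compare the artifact's checksum with subject.sha256 in the annotation."""
    document = json.loads(annotation_path.read_text(encoding="utf-8"))
    recorded = document.get("subject", {}).get("sha256")
    if recorded is None:
        return False  # checksum was skipped, so nothing can be verified
    return sha256_of(artifact_path) == recorded
```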
Post-Hoc Annotation
The strongest workflow is to create provenance and description at the same time as the artifact itself. When annotations are written during generation, the package can capture runtime context directly and the resulting records are typically more complete, precise, and trustworthy.
For existing artifacts, the CLI provides a post-hoc annotation path so you can still attach provenance and description after the fact.
Post-hoc descriptions can still be very useful, but the quality of post-hoc
provenance depends on how exact the supplied answers are. In particular, fields
such as the generating script, command, function, source-code URI, Git commit,
repository path, Git tags, git describe output, inputs, and parameters are
only as reliable as the information entered during annotation.
CLI Workflow
This package provides a command-line interface (CLI) for retrospective annotation and provenance inspection.
For post-hoc annotation:
data-annotations annotate file path/to/participants.csv
data-annotations annotate directory path/to/run-001
data-annotations annotate directory path/to/run-001 --recursive
data-annotations annotate directory path/to/run-001 --max-depth 2
data-annotations annotate directory path/to/run-001 \
--recursive \
--group-selector "plots/*.png" \
--group-title "Daily SMA plots" \
--group-summary "Plots of the same variable on different days." \
--group-kind plot
These commands prompt for missing details, write *.annotation.json or data-annotations.json,
and optionally derive README sidecars. Post-hoc records are marked with
capture_mode="post_hoc".
For shell workflows, you can move the prompt answers into a YAML file and run the command non-interactively:
data-annotations annotate file path/to/participants.csv --answers participants.yaml
data-annotations annotate directory path/to/run-001 --answers run-001.yaml
data-annotations annotate answers check participants.yaml
When --answers is provided, --no-interactive is the default. Use
--interactive if you want the YAML file to provide defaults and still prompt
for missing required values. If the YAML file includes target, the positional
target may be omitted; when both are provided, they must resolve to the same
path. Environment variables such as $DATA_ROOT and ${DATA_ROOT} are expanded
inside string values, and validation fails if a referenced variable is not set.
The answers check helper requires target so it can infer whether the answers
describe a file or a directory.
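The expansion rule for $DATA_ROOT and ${DATA_ROOT} can be sketched with a regular expression. This is an approximation of the described behaviour rather than the CLI's code: expand_env and expand_answers are hypothetical names, and raising KeyError for an unset variable is an assumption about how the validation failure surfaces.

```python
import os
import re

# Matches ${NAME} or $NAME, where NAME is a shell-style identifier.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def expand_env(value: str, env=None) -> str:
    """Expand environment references in one string, failing on unset names."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1) or match.group(2)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(replace, value)


def expand_answers(node, env=None):
    """Apply expand_env to every string value in nested dicts and lists."""
    if isinstance(node, str):
        return expand_env(node, env)
    if isinstance(node, list):
        return [expand_answers(item, env) for item in node]
    if isinstance(node, dict):
        return {key: expand_answers(item, env) for key, item in node.items()}
    return node


answers = {"inputs": ["${DATA_ROOT}/raw/participants.csv"], "params": {"n": 3}}
print(expand_answers(answers, env={"DATA_ROOT": "/data"}))
```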
File answers can use top-level prompt-style keys:
target: path/to/participants.csv
title: Participant Cohort
summary: Participant-level cohort assignments.
kind: dataset
sha256: 0123456789abcdef...
inputs:
  - ${DATA_ROOT}/raw/participants.csv
params:
  split: validation
provenance:
  command: bash scripts/build_participants.sh
  script: scripts/build_participants.sh
  git_sha: deadbeef
  source_code:
    kind: archive
    uri: https://doi.org/10.5281/zenodo.12345
    download_uri: https://zenodo.org/records/12345/files/source.zip
    path: scripts/build_participants.sh
    sha256: 0000000000000000000000000000000000000000000000000000000000000000
fields:
  - name: participant_id
    summary: Stable participant identifier.
    data_type: string
    required: true
    nullable: false
primary_key:
  - participant_id
Directory answers use an explicit inventory. Paths in artifacts,
artifact_groups.paths, and child_bundles are relative to the annotated
directory unless absolute:
target: path/to/run-001
title: Processing outputs
summary: Files produced by the shell processing workflow.
provenance:
  command: bash process_from_instrument.sh
  script: process_from_instrument.sh
checksums:
  processed.csv: 0123456789abcdef...
artifacts:
  - path: processed.csv
    kind: dataset
    title: Processed instrument output
    summary: Normalized output from the processing script.
artifact_groups:
  - title: Diagnostic plots
    kind: plot
    selector: plots/*.png
    paths:
      - plots/qc-1.png
      - plots/qc-2.png
child_bundles:
  - path: model
    annotation_path: model/data-annotations.json
Answers files may also use schema-style aliases such as subject.path,
subject.kind, description.title, description.summary,
description.artifacts, description.artifact_groups, provenance.inputs,
and provenance.params.
For source-code recovery, provenance.source_code.kind may be git, archive,
file, or uri. Git sources use uri plus revision; archive and file
sources use uri or download_uri plus an optional sha256; path points to
the generating script inside the recovered source. DOI or landing-page-only
references can be recorded with kind: uri, but they are not directly
recoverable unless a direct archive or file download_uri is also recorded.
When group selectors are provided, the CLI expands them to concrete member paths
at annotation time. Grouped files are tracked in subject.produced_files[] but
are skipped by the per-file prompt flow, so you do not have to answer the same
questions for every matching file.
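Selector expansion can be pictured as a glob match over the directory's relative paths. The sketch below uses fnmatch from the standard library as a stand-in; expand_selector is a hypothetical name, and the CLI's actual matching rules may differ (for instance, fnmatch lets * match across / separators).

```python
from fnmatch import fnmatchcase


def expand_selector(selector: str, relative_paths: list[str]) -> list[str]:
    """Return the sorted relative paths that match a group selector."""
    return sorted(path for path in relative_paths if fnmatchcase(path, selector))


inventory = ["processed.csv", "plots/qc-2.png", "plots/qc-1.png"]
members = expand_selector("plots/*.png", inventory)
# Files outside every group would still go through the per-file prompts.
ungrouped = [path for path in inventory if path not in members]
print(members)
print(ungrouped)
```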
For post-hoc provenance, use --source-kind, --source-uri,
--source-download-uri, --source-path, --source-revision, and
--source-sha256 when the generating code is recoverable from a Git remote,
source archive, source file, or reference URI. Use repeatable --git-tag and
optional --git-describe when you know the original Git state; these values are
stored as human-readable hints.
For provenance inspection and source recovery:
data-annotations provenance match path/to/artifact
data-annotations provenance chain path/to/artifact
data-annotations provenance chain path/to/artifact --full-paths
data-annotations provenance checkout path/to/artifact
The checkout command recovers the recorded source code. For Git sources, it clones
the recorded remote and checks out the recorded revision. For archive and file
sources, it downloads or copies the recorded object, verifies sha256 when
present, and resolves the generating script path when recorded. Reference-only
URI sources are preserved in the annotation but are not directly recoverable.
The command prompts before downloading source code and defaults to No; use
--force when running trusted provenance checkout non-interactively.
The match command auto-discovers *.annotation.json for files and data-annotations.json for
directories, prints a verification summary, and suggests the exact checkout
command to run next when Git recovery metadata is available.
The chain command uses the same sidecar discovery, then verifies the artifact,
recorded input snapshots, and any upstream annotation sidecars reachable from
those inputs. Its default output shows a compact relative-path tree and lists
stale, missing, or unverifiable nodes first; use --full-paths when you need
absolute paths.
For publication workflows, create a sanitized copy of an annotated artifact tree:
data-annotations publish path/to/run-001 path/to/publish-bundle
data-annotations publish path/to/run-001 path/to/publish-bundle \
--prefix /private/raw/study-a='$INPUT_ROOT'
data-annotations publish path/to/run-001 path/to/publish-metadata \
--annotations-only
data-annotations publish path/to/run-001 path/to/publish-bundle --dry-run
The publish command recursively discovers file annotations (*.annotation.json) and
directory annotations (data-annotations.json), writes a mirrored publish bundle,
and regenerates README sidecars from sanitized annotation JSON. Paths under the
source directory are rewritten to $ARTIFACT_ROOT/...; additional --prefix
mappings rewrite other private path roots. Hostname, username, and SLURM job ID
are redacted by default. Git remote URLs are preserved unless
--redact-git-remote is provided. Strict mode is enabled by default and fails if
any local absolute path remains after sanitization; use --no-strict only after
reviewing --dry-run output.
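The sanitization rules can be summarized in a few lines: rewrite known path roots to placeholders, blank out identifying fields, and in strict mode refuse to continue if an absolute local path survives. The sketch below illustrates that flow; sanitize_path, sanitize_record, and the record keys are assumptions made for the example and are not the package's publish API or schema.

```python
from pathlib import PurePosixPath

REDACTED_KEYS = ("hostname", "username", "slurm_job_id")  # assumed key names


def sanitize_path(path: str, mappings: dict[str, str]) -> str:
    """Rewrite a path under a known private root to its placeholder."""
    candidate = PurePosixPath(path)
    # Try longer roots first so the most specific mapping wins.
    for root in sorted(mappings, key=len, reverse=True):
        try:
            relative = candidate.relative_to(root)
        except ValueError:
            continue  # not under this root
        placeholder = mappings[root]
        if str(relative) == ".":
            return placeholder
        return f"{placeholder}/{relative.as_posix()}"
    return path  # URIs and unmapped paths are left as provided


def sanitize_record(record: dict, mappings: dict[str, str], strict: bool = True) -> dict:
    """Redact identifying fields and rewrite input paths in a flat record."""
    clean = dict(record)
    for key in REDACTED_KEYS:
        if key in clean:
            clean[key] = None
    clean["inputs"] = [sanitize_path(p, mappings) for p in record.get("inputs", [])]
    if strict:
        leftover = [p for p in clean["inputs"] if p.startswith("/")]
        if leftover:
            raise ValueError(f"absolute paths remain after sanitization: {leftover}")
    return clean


mappings = {"/work/run-001": "$ARTIFACT_ROOT", "/private/raw/study-a": "$INPUT_ROOT"}
print(sanitize_path("/private/raw/study-a/participants.csv", mappings))
```

Failing loudly in strict mode is the safer default for publication, since a single unmapped root would otherwise leak a private path into the published metadata.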
If data-annotations provenance --help does not list chain, your shell is
resolving an older installed command. From a source checkout, use
uv run data-annotations provenance chain ..., or reinstall the CLI from the
updated source before using the bare data-annotations command.
Both match and chain also accept --checksum-policy and
--max-checksum-bytes. Use --checksum-policy always when you want full
verification of large local files, and leave the default auto when you prefer
to avoid long checksum passes on very large artifacts.
Run With uvx
uvx --from "data-annotations[cli]" data-annotations provenance match path/to/participants.csv
Install And Use With uv tool
uv tool install "data-annotations[cli]"
data-annotations provenance match path/to/participants.csv
Run From Repository Root
From the repository root while developing locally, run task install first.
That task uses uv sync --extra cli, so the CLI commands are available in
the project environment. You can then run:
uv run data-annotations annotate file path/to/participants.csv
uv run data-annotations annotate directory path/to/run-001
uv run data-annotations provenance match path/to/participants.csv
uv run data-annotations provenance chain path/to/participants.csv
uv run data-annotations provenance checkout path/to/participants.csv
uv run data-annotations publish path/to/run-001 path/to/publish-bundle
API Overview
Annotation Models
- FileArtifactSubject
- DirectoryArtifactSubject
- FileAnnotationDocument
- DirectoryAnnotationDocument
- FileAnnotationResult
- DirectoryAnnotationResult
Annotation Decorators
- record_file_annotation(...)
- record_directory_annotation(...)
Annotation Functions
- write_file_annotation(...)
- write_directory_annotation(...)
- annotate_file(...)
- annotate_directory(...)
Description Models
- AllowedValue
- FieldDefinition
- DocumentedArtifact
- DocumentedArtifactGroup
- ArtifactDescription
- ArtifactGroupDescription
- FileDescription
- DirectoryDescription
Description Functions
- record_file_description(...)
- record_directory_description(...)
- write_file_description(...)
- write_directory_description(...)
- write_file_readme(...)
- write_directory_readme(...)
- render_file_readme(...)
- render_directory_readme(...)
Provenance Models
- ProducedFile
- ChildBundle
- InputArtifact
- SourceCodeKind
- SourceCodeReference
- BaseProvenance
- FileManifest
- DirectoryManifest
- ProvenanceChainNode
- ProvenanceChainReport
- RecoveredSource
Provenance Functions
- record_file_manifest(...)
- record_directory_manifest(...)
- write_file_manifest(...)
- write_directory_manifest(...)
- directory_content_digest(...)
- analyze_provenance_chain(...)
- provenance_chain_is_fresh(...)
- artifact_matches_manifest(...)
- recover_manifest_source(...)
- checkout_manifest_source(...)
Publish Functions
- discover_annotation_paths(...)
- sanitize_annotation_document(...)
- sanitize_annotation_path(...)
- publish_directory(...)
Examples
Runnable examples live in examples/ and mirror the README workflows.
Run them from the repository root with:
uv run python examples/record_file_annotation.py
uv run python examples/record_directory_annotation.py
uv run python examples/record_file_manifest.py
uv run python examples/record_directory_manifest.py
uv run python examples/record_file_description.py
uv run python examples/record_directory_description.py
uv run python examples/annotate_file.py
uv run python examples/annotate_directory.py
uv run python examples/checksum_policy.py
uv run python examples/annotate_file_answers_cli.py
uv run python examples/write_file_manifest.py
uv run python examples/write_directory_manifest.py
uv run python examples/write_file_description.py
uv run python examples/write_directory_description.py
uv run python examples/provenance_chain.py
uv run python examples/provenance_chain_cli.py
uv run python examples/recover_provenance.py
uv run python examples/recover_provenance_cli.py
uv run python examples/recover_archive_source.py
uv run python examples/publish_cli.py
Each example writes its outputs to a fresh temporary directory and prints the location so you can inspect the generated annotation documents and README sidecars.