Provenance tracking, intelligent caching, and data virtualization for scientific simulation workflows.

Consist

Python 3.11+ · License: BSD 3-Clause

Consist is a caching and provenance layer for scientific simulation workflows. It records the code, configuration, input data, and output artifacts behind each run so expensive steps can be skipped safely and results remain queryable after the fact.

Consist is useful when a workflow has:

  • long-running model steps that should cache-hit when inputs are unchanged;
  • scenario variants that need explicit lineage and comparison;
  • file-based tools that need stable local paths but still need canonical provenance;
  • post-run questions like "which config produced this output?"

Installation

pip install consist

Optional integrations are installed as extras:

pip install "consist[ingest]"
pip install "consist[docker]"

Note: Consist is pre-1.0. It is ready for real workflows, but minor releases may still include breaking changes while the API settles.

Quick Example

from pathlib import Path

import pandas as pd

import consist
from consist import ExecutionOptions, Tracker

tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")


def clean_data(raw: Path, threshold: float = 0.5) -> dict[str, Path]:
    df = pd.read_parquet(raw)
    out = Path("./cleaned.parquet")
    df[df["value"] > threshold].to_parquet(out)
    return {"cleaned": out}


first = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)

second = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)

print(first.cache_hit, second.cache_hit)  # False True
cleaned = consist.load_df(second.outputs["cleaned"])

In this example, input_binding="paths" tells Consist to pass local Path objects into the callable instead of loading the input files itself. Those same paths are still hashed and recorded for cache identity and lineage. For tools that need inputs copied to specific local filenames, see the Usage Guide.
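The idea that a path can be handed to a tool as-is while its identity is still derived by hashing can be sketched with the standard library alone. This sketch assumes identity comes from file contents. The text above does not say whether Consist hashes contents, metadata, or both, so treat `file_digest` and `count_bytes` as hypothetical helpers, not Consist functions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def count_bytes(raw: Path) -> int:
    """Stand-in for a file-based tool that wants a local path, not loaded data."""
    return raw.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "raw.bin"
    b = Path(tmp) / "staged_copy.bin"
    a.write_bytes(b"same content")
    b.write_bytes(b"same content")

    # Different paths, identical contents: the content digest is the same.
    same_identity = file_digest(a) == file_digest(b)

    # The tool still receives a plain Path and reads the file itself.
    size = count_bytes(a)

    # Changing the contents changes the digest, which would invalidate a cache key.
    b.write_bytes(b"changed content")
    changed_identity = file_digest(a) != file_digest(b)

print(same_identity, size, changed_identity)  # True 12 True
```

Under this assumption, moving or staging a file to a different local filename does not change its identity, while editing it does.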

Documentation

| Start here | Use it for |
| --- | --- |
| Quickstart | First tracked run and cache hit |
| First Workflow | Two-step pipeline with explicit artifact links |
| Usage Guide | Choosing between run, trace, and scenario |
| Caching & Hydration | Cache identity, hit behavior, and output recovery concepts |
| Historical Recovery | Restoring archived outputs and staging inputs |
| CLI Reference | Inspecting runs, artifacts, lineage, and schemas |
| API Reference | Public Python API and generated signatures |

Etymology

In railroad terminology, a consist is the lineup of locomotives and cars that make up a train. In this library, a consist is the immutable record of the code, config, inputs, and outputs coupled together to produce a result.
