Provenance tracking, intelligent caching, and data virtualization for scientific simulation workflows.
Project description
Consist is a caching and provenance layer for scientific simulation workflows. It records the code, configuration, input data, and output artifacts behind each run so expensive steps can be skipped safely and results remain queryable after the fact.
Consist is useful when a workflow has:
- long-running model steps that should cache-hit when inputs are unchanged;
- scenario variants that need explicit lineage and comparison;
- file-based tools that need stable local paths but still need canonical provenance;
- post-run questions like "which config produced this output?"
Installation
pip install consist
Optional integrations are installed as extras:
pip install "consist[ingest]"
pip install "consist[docker]"
[!NOTE] Consist is pre-1.0. It is ready for real workflows, but minor releases may still include breaking changes while the API settles.
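Because minor releases may break compatibility before 1.0, a workflow can fail fast when the installed version drifts from the one it was developed against. The following is a minimal sketch using only the standard library; the helper names `same_minor` and `check_consist` and the pinned value `"0.1"` are illustrative assumptions, not part of Consist.

```python
from importlib.metadata import PackageNotFoundError, version


def same_minor(installed: str, pinned: str) -> bool:
    """Return True when `installed` shares the major.minor prefix of `pinned`."""
    return installed.split(".")[:2] == pinned.split(".")[:2]


def check_consist(pinned: str = "0.1") -> bool:
    """Compare the installed consist distribution against a pinned major.minor."""
    try:
        installed = version("consist")
    except PackageNotFoundError:
        return False  # consist is not installed in this environment
    return same_minor(installed, pinned)
```

A version specifier such as `consist>=0.1.4,<0.2` in a requirements file expresses the same constraint at install time.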
Quick Example
from pathlib import Path

import pandas as pd

import consist
from consist import ExecutionOptions, Tracker

tracker = Tracker(run_dir="./runs", db_path="./provenance.duckdb")

def clean_data(raw: Path, threshold: float = 0.5) -> dict[str, Path]:
    df = pd.read_parquet(raw)
    out = Path("./cleaned.parquet")
    df[df["value"] > threshold].to_parquet(out)
    return {"cleaned": out}

first = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)
second = tracker.run(
    fn=clean_data,
    inputs={"raw": Path("raw.parquet")},
    config={"threshold": 0.5},
    outputs=["cleaned"],
    execution_options=ExecutionOptions(input_binding="paths"),
)
print(first.cache_hit, second.cache_hit)  # False, True
cleaned = consist.load_df(second.outputs["cleaned"])
In this example, input_binding="paths" tells Consist to pass local Path objects into the callable instead of loading the input files. Those same paths are still hashed and recorded for cache identity and lineage. For tools that need inputs copied to specific local filenames, see the Usage Guide.
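To make the idea of cache identity concrete, the sketch below shows one way a key could be derived from a code identifier, a config dictionary, and the content hashes of input files. It is a conceptual illustration only, not Consist's actual algorithm; `file_digest`, `cache_key`, and the `code_id` parameter are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_key(code_id: str, config: dict, inputs: dict[str, Path]) -> str:
    """Combine code identity, config, and input content hashes into one key."""
    payload = {
        "code": code_id,
        "config": config,
        # Hash file contents, so an edited file changes the key even if its path does not
        "inputs": {name: file_digest(p) for name, p in sorted(inputs.items())},
    }
    # sort_keys makes the serialization independent of dict insertion order
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Under this scheme, two calls with the same code identifier, config, and file contents produce the same key, which is the condition under which the second `tracker.run` call above reports a cache hit.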
Documentation
| Start here | Use it for |
|---|---|
| Quickstart | First tracked run and cache hit |
| First Workflow | Two-step pipeline with explicit artifact links |
| Usage Guide | Choosing between run, trace, and scenario |
| Caching & Hydration | Cache identity, hit behavior, and output recovery concepts |
| Historical Recovery | Restoring archived outputs and staging inputs |
| CLI Reference | Inspecting runs, artifacts, lineage, and schemas |
| API Reference | Public Python API and generated signatures |
Etymology
In railroad terminology, a consist is the lineup of locomotives and cars that make up a train. In this library, a consist is the immutable record of the code, config, inputs, and outputs coupled together to produce a result.
File details
Details for the file consist-0.1.4.tar.gz.
File metadata
- Download URL: consist-0.1.4.tar.gz
- Upload date:
- Size: 403.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d688092f8373e5a665a165036cd848ec63832a417ebfc7c3c71b22da47f9b97b |
| MD5 | e178bc6e8d1f787b09ce67bb98d2b39c |
| BLAKE2b-256 | e56ee84a46222541d6c0fa90647f4a7aa566e15a902c7f955667f1bd7bcc803c |
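A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using the standard library; the function names `sha256_of` and `verify` are illustrative, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for consist-0.1.4.tar.gz
EXPECTED_SHA256 = "d688092f8373e5a665a165036cd848ec63832a417ebfc7c3c71b22da47f9b97b"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("consist-0.1.4.tar.gz"))` would return `True` only if the local file's bytes hash to the published digest.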
File details
Details for the file consist-0.1.4-py3-none-any.whl.
File metadata
- Download URL: consist-0.1.4-py3-none-any.whl
- Upload date:
- Size: 451.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eadd1757c504ef7c645accef52ebc646fcac83af3979fd378eae5d4383090367 |
| MD5 | 48a524dcfb8a0bc649d9233cb4fdbaae |
| BLAKE2b-256 | 142f3ed494366a02a8a7b4cf296d1625d68003f2b1664442f19be07b719563bd |