SDK and CLI for capturing data-science lineage and persisting DAG snapshots to Walacor.
Walacor Data Tracking
A schema-first framework to track, version, and store the full lineage of data transformations — from raw ingestion to final model output — using Walacor as a backend snapshot store.
✨ Why this exists
- Reproducibility – Every transformation, parameter, and artifact is captured in a graph you can replay.
- Auditability – Snapshots are immutable, version-controlled, and timestamped.
- Collaboration – Team members see the same lineage and can compare or branch workflows.
- Extensibility – Strict JSON-schemas keep today’s pipelines clean while allowing tomorrow’s to evolve safely.
🏗️ Core Concepts
| Concept | Stored as | Purpose |
|---|---|---|
| Transform Node | `transform_node` | One operation (e.g., “fit model”, “clean text”). |
| Transform Edge | `transform_edge` | Dependency between two nodes. |
| Project Metadata | `project_metadata` | Run-level info (owner, description, timestamps). |
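To make the three record types concrete, here is a minimal sketch of a two-step pipeline expressed as nodes, edges, and run metadata. The class and field names (`node_id`, `operation`, `params`, `source`, `target`, `owner`, `created_at`) are assumptions chosen for illustration; the schemas shipped with the package may look different.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TransformNode:
    """One operation in the pipeline (hypothetical fields)."""
    node_id: str
    operation: str
    params: dict = field(default_factory=dict)


@dataclass(frozen=True)
class TransformEdge:
    """Dependency: `target` consumes the output of `source`."""
    source: str
    target: str


@dataclass(frozen=True)
class ProjectMetadata:
    """Run-level information (hypothetical fields)."""
    owner: str
    description: str
    created_at: str


nodes = [
    TransformNode("n1", "clean text", {"lowercase": True}),
    TransformNode("n2", "fit model", {"algorithm": "logreg"}),
]
edges = [TransformEdge(source="n1", target="n2")]
meta = ProjectMetadata(
    owner="alice",
    description="demo run",
    created_at=datetime.now(timezone.utc).isoformat(),
)

# Group the records under the "Stored as" names from the table above.
snapshot = {
    "project_metadata": asdict(meta),
    "transform_node": [asdict(n) for n in nodes],
    "transform_edge": [asdict(e) for e in edges],
}
```

Keeping edges separate from nodes means the graph can be rebuilt from flat records, which is one plausible reason for storing them as distinct types.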
Immutable Snapshots
Once a DAG is written to Walacor, it cannot mutate—only a new snapshot (with a higher SV or run ID) can supersede it.
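The supersede-by-new-snapshot idea can be illustrated with a small in-memory stand-in. This is only a sketch of append-only versioning under assumed names (`SnapshotStore`, `write`, `read`); it does not reflect how Walacor itself stores or numbers snapshots.

```python
import copy


class SnapshotStore:
    """Append-only store: existing versions are never modified."""

    def __init__(self):
        self._versions = []

    def write(self, dag: dict) -> int:
        # Store a deep copy so later changes by the caller cannot
        # alter what was already written.
        self._versions.append(copy.deepcopy(dag))
        return len(self._versions)  # 1-based version number

    def read(self, version=None) -> dict:
        if not self._versions:
            raise LookupError("no snapshots written yet")
        idx = len(self._versions) if version is None else version
        if not 1 <= idx <= len(self._versions):
            raise LookupError(f"unknown snapshot version: {idx}")
        # Return a copy so readers cannot mutate stored state either.
        return copy.deepcopy(self._versions[idx - 1])


store = SnapshotStore()
dag = {"nodes": ["load"]}
v1 = store.write(dag)

dag["nodes"].append("clean")  # change the working DAG
v2 = store.write(dag)         # the change becomes a new snapshot

first = store.read(v1)        # still the original DAG
latest = store.read()         # the superseding snapshot
```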
🚀 Getting Started
1. Install the SDKs
pip install walatrack
Make sure you're using Python 3.10+ and have internet access to reach the Walacor API.
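If you are unsure which interpreter your environment uses, a quick check like the one below can confirm the Python 3.10+ requirement before installing. The helper name `python_is_supported` is made up for this sketch and is not part of the package.

```python
import sys

MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if python_is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.10 or newer is required; found", sys.version.split()[0])
```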
2. Initialize the Tracking Components
To begin capturing your data lineage:
- Start the Tracker – This manages the session and records operations.
- Attach an Adapter – For example, use `PandasAdapter` to automatically track DataFrame transformations.
- Add Writers – Choose where to send the output:
  - Console output for quick inspection
  - `WalacorWriter` to persist snapshots to the Walacor backend
Once set up, your transformation history will be automatically recorded and can be exported or persisted.
🧪 Example Use Cases
- Track changes in a machine learning pipeline
- Visualize column-level transformations in pandas
- Record versions of a dataset as it’s cleaned and merged
- Keep an auditable log of automated workflows
🧪 Minimal Example
Here's how simple it is to start tracking transformations:
import pandas as pd
from walatrack import Tracker, PandasAdapter
from walatrack.writers import ConsoleWriter
from walatrack.writers.walacor import WalacorWriter
# 1. Start the tracker and adapter
tracker = Tracker().start()
adapter = PandasAdapter().start(tracker)
# 2. Define writers (console, or send to Walacor backend)
console_writer = ConsoleWriter()
walacor_writer = WalacorWriter(
    base_url="http://your-walacor-url/api",
    username="your-username",
    password="your-password",
    project_name="MyProject",
    description="Optional description"
)
# 3. Apply transformations as usual
df = pd.DataFrame({"id": [1, 2], "value": [100, 200]})
df2 = df.assign(new_val=df.value * 2)
df3 = df2.rename(columns={"value": "v"})
# 4. Stop and export the lineage
tracker.stop()
💡 The `PandasAdapter` automatically tracks operations like `.assign()`, `.rename()`, `.merge()`, etc., so you can work with pandas as usual, but with versioned lineage behind the scenes.
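The placeholder credentials in the example should not be committed to source control. One common approach, sketched here, is to read them from environment variables. The variable names (`WALACOR_BASE_URL`, `WALACOR_USERNAME`, `WALACOR_PASSWORD`, `WALACOR_PROJECT`) and the helper `walacor_settings_from_env` are assumptions for this sketch; only the keyword names `base_url`, `username`, `password`, and `project_name` come from the example above.

```python
import os


def walacor_settings_from_env(env=os.environ) -> dict:
    """Collect WalacorWriter keyword arguments from environment variables."""
    required = {
        "base_url": "WALACOR_BASE_URL",   # hypothetical variable names
        "username": "WALACOR_USERNAME",
        "password": "WALACOR_PASSWORD",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    settings = {key: env[var] for key, var in required.items()}
    # The project name is optional here and falls back to a default.
    settings["project_name"] = env.get("WALACOR_PROJECT", "MyProject")
    return settings
```

The resulting dictionary could then be unpacked into the writer, for example `WalacorWriter(**walacor_settings_from_env(), description="...")`, assuming the constructor accepts the keywords shown in the example.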
🤝 Contributing
- Fork → feature branch → PR.
- Run `pre-commit run --all-files`.
- Add/Update unit tests and schema definitions.
- Keep the README & docs in sync.
📄 License
Apache 2.0 © 2025 Walacor & Contributors.