
SDK and CLI for capturing data-science lineage and persisting DAG snapshots to Walacor.


Walacor Data Tracking



A schema-first framework to track, version, and store the full lineage of data transformations — from raw ingestion to final model output — using Walacor as a backend snapshot store.


✨ Why this exists

  • Reproducibility – Every transformation, parameter, and artifact is captured in a graph you can replay.
  • Auditability – Snapshots are immutable, version-controlled, and timestamped.
  • Collaboration – Team members see the same lineage and can compare or branch workflows.
  • Extensibility – Strict JSON Schemas keep today’s pipelines clean while allowing tomorrow’s to evolve safely.

🏗️ Core Concepts

| Concept | Stored as | Purpose |
| --- | --- | --- |
| Transform Node | transform_node | One operation (e.g., “fit model”, “clean text”). |
| Transform Edge | transform_edge | Dependency between two nodes. |
| Project Metadata | project_metadata | Run-level info (owner, description, timestamps). |

Immutable Snapshots
Once a DAG is written to Walacor, it cannot be modified; only a new snapshot (with a higher SV or run ID) can supersede it.
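
To make the three record types and the snapshot rule concrete, here is a minimal sketch in plain Python. It is not the library's schema: only the three record names come from the table above, while the field names (`node_id`, `operation`, `owner`, `version`) and the `supersede` helper are assumptions chosen for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TransformNode:
    node_id: str      # hypothetical field
    operation: str    # e.g. "fit model", "clean text"


@dataclass(frozen=True)
class TransformEdge:
    source: str       # node_id of the upstream node
    target: str       # node_id of the downstream node


@dataclass(frozen=True)
class ProjectMetadata:
    owner: str
    description: str


@dataclass(frozen=True)
class Snapshot:
    version: int
    metadata: ProjectMetadata
    nodes: tuple[TransformNode, ...]
    edges: tuple[TransformEdge, ...]


def supersede(old: Snapshot, nodes, edges) -> Snapshot:
    """Build a new snapshot with a higher version; `old` is left untouched."""
    return replace(old, version=old.version + 1,
                   nodes=tuple(nodes), edges=tuple(edges))


meta = ProjectMetadata(owner="alice", description="demo run")
load = TransformNode("n1", "load csv")
clean = TransformNode("n2", "clean text")
v1 = Snapshot(1, meta, (load, clean), (TransformEdge("n1", "n2"),))

# Extending the pipeline produces a second snapshot instead of editing v1.
fit = TransformNode("n3", "fit model")
v2 = supersede(v1, v1.nodes + (fit,), v1.edges + (TransformEdge("n2", "n3"),))

# Frozen dataclasses reject in-place changes.
try:
    v1.version = 99
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Frozen dataclasses and tuples are used here only to mimic the "write once, supersede later" behaviour locally; how Walacor enforces immutability on the backend is not covered by this sketch.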


🚀 Getting Started

1. Install the SDK

pip install walacor-data-tracker

The package is published on PyPI as walacor-data-tracker; the import name is walatrack.

Make sure you're using Python 3.10+ and have internet access to reach the Walacor API.
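
If you want to confirm the environment before going further, a small pre-flight check along these lines may help. This is a sketch, not part of the SDK: the `preflight` helper is hypothetical, and it only checks the interpreter version and whether the `walatrack` import package can be found. It does not test connectivity to the Walacor API.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def preflight(module_name: str = "walatrack") -> dict:
    """Report whether the interpreter and the import package look usable."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "package_found": importlib.util.find_spec(module_name) is not None,
    }


report = preflight()
print(report)
```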

2. Initialize the Tracking Components

To begin capturing your data lineage:

  • Start the Tracker – This manages the session and records operations.

  • Attach an Adapter – For example, use PandasAdapter to automatically track DataFrame transformations.

  • Add Writers – Choose where to send the output:

    • Console output for quick inspection
    • WalacorWriter to persist snapshots to the Walacor backend

Once set up, your transformation history will be automatically recorded and can be exported or persisted.
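
The division of labour between a tracker and its writers can be pictured with a small stand-alone sketch. None of the classes below belong to walatrack; `MiniTracker`, `PrintWriter`, and `ListWriter` are hypothetical names used only to show the idea of recording operations into a graph and then handing one snapshot to every configured sink.

```python
from typing import Protocol


class Writer(Protocol):
    def write(self, snapshot: dict) -> None: ...


class PrintWriter:
    """Prints a one-line summary, similar in spirit to a console sink."""

    def write(self, snapshot: dict) -> None:
        print(f"{len(snapshot['nodes'])} nodes, {len(snapshot['edges'])} edges")


class ListWriter:
    """Keeps snapshots in memory; stands in for a remote backend."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    def write(self, snapshot: dict) -> None:
        self.received.append(snapshot)


class MiniTracker:
    """Collects nodes and edges, then fans the result out to all writers."""

    def __init__(self, writers: list[Writer]) -> None:
        self.writers = writers
        self.nodes: list[dict] = []
        self.edges: list[tuple[int, int]] = []

    def record(self, operation: str, parents: tuple[int, ...] = ()) -> int:
        node_id = len(self.nodes)
        self.nodes.append({"id": node_id, "operation": operation})
        self.edges.extend((parent, node_id) for parent in parents)
        return node_id

    def stop(self) -> None:
        snapshot = {"nodes": list(self.nodes), "edges": list(self.edges)}
        for writer in self.writers:
            writer.write(snapshot)


sink = ListWriter()
tracker = MiniTracker([PrintWriter(), sink])
raw = tracker.record("load")
cleaned = tracker.record("clean", parents=(raw,))
tracker.record("fit model", parents=(cleaned,))
tracker.stop()
```

Keeping writers behind a single `write` method is what lets a console sink and a backend sink be swapped or combined without touching the recording logic.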


🧪 Example Use Cases

  • Track changes in a machine learning pipeline
  • Visualize column-level transformations in pandas
  • Record versions of a dataset as it’s cleaned and merged
  • Keep an auditable log of automated workflows



🧪 Minimal Example

Here's how simple it is to start tracking transformations:

import pandas as pd
from walatrack import Tracker, PandasAdapter
from walatrack.writers import ConsoleWriter
from walatrack.writers.walacor import WalacorWriter

# 1. Start the tracker and adapter
tracker = Tracker().start()
adapter = PandasAdapter().start(tracker)

# 2. Define writers (console, or send to Walacor backend)
console_writer = ConsoleWriter()
walacor_writer = WalacorWriter(
    base_url="http://your-walacor-url/api",
    username="your-username",
    password="your-password",
    project_name="MyProject",
    description="Optional description"
)

# 3. Apply transformations as usual
df = pd.DataFrame({"id": [1, 2], "value": [100, 200]})
df2 = df.assign(new_val=df.value * 2)
df3 = df2.rename(columns={"value": "v"})

# 4. Stop and export the lineage
tracker.stop()

💡 The PandasAdapter automatically tracks operations like .assign(), .rename(), .merge(), etc., so you can work with pandas as usual — but with versioned lineage behind the scenes.
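
One common way to get this kind of automatic tracking is to wrap the methods of interest so that every call is logged before its result is returned. The sketch below shows that technique on a toy `Frame` class rather than on pandas itself, and it is an assumption about the general approach, not a description of how PandasAdapter is implemented. `Frame`, `track_methods`, and the log format are all hypothetical.

```python
import functools


class Frame:
    """Toy stand-in for a DataFrame: each method returns a new Frame."""

    def __init__(self, columns):
        self.columns = list(columns)

    def assign(self, **new_cols):
        return Frame(self.columns + list(new_cols))

    def rename(self, columns):
        return Frame([columns.get(c, c) for c in self.columns])


def track_methods(cls, names, log):
    """Wrap the named methods so each call appends an entry to `log`.

    Returns a function that restores the original methods.
    """
    originals = {name: getattr(cls, name) for name in names}

    def make_wrapper(name, original):
        @functools.wraps(original)
        def wrapper(self, *args, **kwargs):
            result = original(self, *args, **kwargs)
            # Record the operation plus input and output object identities.
            log.append((name, id(self), id(result)))
            return result
        return wrapper

    for name, original in originals.items():
        setattr(cls, name, make_wrapper(name, original))

    def restore():
        for name, original in originals.items():
            setattr(cls, name, original)

    return restore


log = []
restore = track_methods(Frame, ["assign", "rename"], log)

df = Frame(["id", "value"])
df2 = df.assign(new_val=None)
df3 = df2.rename({"value": "v"})

restore()
df3.assign(untracked=None)  # not logged: the original method is back
```

Each log entry links an input object to an output object, which is enough to rebuild the chain df → df2 → df3 as nodes and edges.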


🤝 Contributing

  1. Fork → feature branch → PR.
  2. Run pre-commit run --all-files.
  3. Add/Update unit tests and schema definitions.
  4. Keep the README & docs in sync.

📄 License

Apache 2.0 © 2025 Walacor & Contributors.
