
SDK and CLI for capturing data-science lineage and persisting DAG snapshots to Walacor.

Walacor Data Tracking



A schema-first framework to track, version, and store the full lineage of data transformations — from raw ingestion to final model output — using Walacor as a backend snapshot store.


✨ Why this exists

  • Reproducibility – Every transformation, parameter, and artifact is captured in a graph you can replay.
  • Auditability – Snapshots are immutable, version-controlled, and timestamped.
  • Collaboration – Team members see the same lineage and can compare or branch workflows.
  • Extensibility – Strict JSON-schemas keep today’s pipelines clean while allowing tomorrow’s to evolve safely.

🏗️ Core Concepts

Concept | Stored as | Purpose
Transform Node | transform_node | One operation (e.g., “fit model”, “clean text”).
Transform Edge | transform_edge | Dependency between two nodes.
Project Metadata | project_metadata | Run-level info (owner, description, timestamps).
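
To make the three record types in the table above concrete, here is a minimal sketch of how a small DAG could be represented and grouped by storage name. The class names, field names (node_id, operation, params, source, target, owner, created_at), and sample values are assumptions made for illustration; the actual Walacor schemas may define different fields.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class TransformNode:
    """One operation in the pipeline (field names are hypothetical)."""
    node_id: str
    operation: str
    params: dict = field(default_factory=dict)


@dataclass(frozen=True)
class TransformEdge:
    """Dependency: `target` consumes the output of `source`."""
    source: str
    target: str


@dataclass(frozen=True)
class ProjectMetadata:
    """Run-level information about the whole DAG."""
    owner: str
    description: str
    created_at: str


nodes = [
    TransformNode("n1", "clean text", {"lowercase": True}),
    TransformNode("n2", "fit model", {"max_iter": 100}),
]
edges = [TransformEdge(source="n1", target="n2")]
meta = ProjectMetadata(
    owner="alice",
    description="demo run",
    created_at=datetime.now(timezone.utc).isoformat(),
)

# Group the records under the three storage names from the table.
snapshot = {
    "project_metadata": asdict(meta),
    "transform_node": [asdict(n) for n in nodes],
    "transform_edge": [asdict(e) for e in edges],
}
print(json.dumps(snapshot, indent=2))
```

Keeping nodes and edges as separate flat records means the graph can be rebuilt by joining edges to nodes on their IDs.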

Immutable Snapshots
Once a DAG is written to Walacor, it cannot be modified; only a new snapshot (with a higher SV or run ID) can supersede it.
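
The append-only behaviour described here can be illustrated with a toy in-memory store. This is only a sketch of the idea, not how Walacor implements it: SnapshotStore, its methods, and the simple integer version counter (standing in for an SV or run ID) are all hypothetical.

```python
import copy


class SnapshotStore:
    """Toy append-only store: snapshots are added, never edited."""

    def __init__(self):
        self._snapshots = []  # index i holds version i + 1

    def write(self, dag: dict) -> int:
        """Store a deep copy of `dag` and return its new version number."""
        self._snapshots.append(copy.deepcopy(dag))
        return len(self._snapshots)

    def read(self, version: int) -> dict:
        """Return a copy, so callers cannot alter the stored snapshot."""
        return copy.deepcopy(self._snapshots[version - 1])

    def latest_version(self) -> int:
        return len(self._snapshots)


store = SnapshotStore()
dag = {"nodes": ["load", "clean"], "edges": [["load", "clean"]]}
v1 = store.write(dag)

# Changing the pipeline produces a new snapshot; version 1 is untouched.
dag["nodes"].append("fit model")
dag["edges"].append(["clean", "fit model"])
v2 = store.write(dag)

print(v1, store.read(v1)["nodes"])
print(v2, store.read(v2)["nodes"])
```

Because the store copies on both write and read, later changes to the caller's dictionary never reach an already stored version.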


🚀 Getting Started

1. Install the SDK

pip install walacor-data-tracker

Make sure you're using Python 3.10+ and have internet access to reach the Walacor API.

2. Initialize the Tracking Components

To begin capturing your data lineage:

  • Start the Tracker – This manages the session and records operations.

  • Attach an Adapter – For example, use PandasAdapter to automatically track DataFrame transformations.

  • Add Writers – Choose where to send the output:

    • Console output for quick inspection
    • WalacorWriter to persist snapshots to the Walacor backend

Once set up, your transformation history will be automatically recorded and can be exported or persisted.


🧪 Example Use Cases

  • Track changes in a machine learning pipeline
  • Visualize column-level transformations in pandas
  • Record versions of a dataset as it’s cleaned and merged
  • Keep an auditable log of automated workflows



🧪 Minimal Example

Here's how simple it is to start tracking transformations:

import pandas as pd
from walatrack import Tracker, PandasAdapter
from walatrack.writers import ConsoleWriter
from walatrack.writers.walacor import WalacorWriter

# 1. Start the tracker and adapter
tracker = Tracker().start()
adapter = PandasAdapter().start(tracker)

# 2. Define writers (console, or send to Walacor backend)
console_writer = ConsoleWriter()
walacor_writer = WalacorWriter(
    base_url="http://your-walacor-url/api",
    username="your-username",
    password="your-password",
    project_name="MyProject",
    description="Optional description"
)

# 3. Apply transformations as usual
df = pd.DataFrame({"id": [1, 2], "value": [100, 200]})
df2 = df.assign(new_val=df.value * 2)
df3 = df2.rename(columns={"value": "v"})

# 4. Stop and export the lineage
tracker.stop()

💡 The PandasAdapter automatically tracks operations like .assign(), .rename(), .merge(), etc., so you can work with pandas as usual — but with versioned lineage behind the scenes.
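
For intuition about how this kind of automatic tracking can work, the sketch below wraps the methods of a tiny stand-in class so that every call is logged. It is an illustration of the general method-wrapping technique only and is not taken from the library's source; Table, track, and recorded are hypothetical names, and the real PandasAdapter may work differently.

```python
import functools


class Table:
    """Tiny stand-in for a DataFrame: a dict of column name -> list."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def assign(self, **new_columns):
        return Table({**self.columns, **new_columns})

    def rename(self, mapping):
        return Table({mapping.get(k, k): v for k, v in self.columns.items()})


recorded = []  # lineage log filled in by the wrapper


def track(cls, method_name):
    """Replace cls.method_name with a version that logs each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        recorded.append({
            "operation": method_name,
            "input_columns": list(self.columns),
            "output_columns": list(result.columns),
        })
        return result

    setattr(cls, method_name, wrapper)


for name in ("assign", "rename"):
    track(Table, name)

# User code is unchanged; the wrapper records lineage as a side effect.
t = Table({"id": [1, 2], "value": [100, 200]})
t2 = t.assign(new_val=[200, 400])
t3 = t2.rename({"value": "v"})

for entry in recorded:
    print(entry)
```

Each log entry corresponds to one transform node, and consecutive entries that share a table imply an edge between them.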


🤝 Contributing

  1. Fork → feature branch → PR.
  2. Run pre-commit run --all-files.
  3. Add/Update unit tests and schema definitions.
  4. Keep the README & docs in sync.

📄 License

Apache 2.0 © 2025 Walacor & Contributors.
