Create, Run and Benchmark DVC Pipelines in Python

ZnTrack: A Parameter Tracking Package for Python

ZnTrack (zɪŋk træk) is a lightweight and easy-to-use package for tracking parameters in your Python projects using DVC. With ZnTrack, you can define parameters in Python classes and monitor how they change over time. This information can then be used to compare the results of different runs, identify computational bottlenecks, and avoid re-running code components whose parameters have not changed.
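The idea of skipping components whose parameters have not changed can be illustrated with a small, self-contained sketch. This is not how ZnTrack or DVC implement caching; the names params_fingerprint, CachedStage and square_sum are hypothetical and exist only for this illustration. The sketch hashes the parameters of a stage and re-executes the stage only when that hash differs from the previous run.

```python
import hashlib
import json


def params_fingerprint(params: dict) -> str:
    """Return a stable hash of a parameter dictionary (illustrative helper)."""
    serialized = json.dumps(params, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


class CachedStage:
    """Hypothetical stage that only re-runs when its parameters change."""

    def __init__(self, func):
        self.func = func
        self.last_fingerprint = None
        self.result = None
        self.run_count = 0

    def run(self, **params):
        fingerprint = params_fingerprint(params)
        if fingerprint != self.last_fingerprint:
            # First run or changed parameters: execute and remember the result.
            self.result = self.func(**params)
            self.last_fingerprint = fingerprint
            self.run_count += 1
        # Otherwise the stored result is reused without calling func again.
        return self.result


def square_sum(max_number: int) -> int:
    """Stand-in for an expensive computation."""
    return sum(i * i for i in range(max_number))


stage = CachedStage(square_sum)
stage.run(max_number=4)  # executed
stage.run(max_number=4)  # skipped, parameters unchanged
stage.run(max_number=5)  # executed again, parameter changed
print(stage.run_count)
```

In a real project, DVC additionally tracks dependencies and outputs on disk, so this in-memory sketch only conveys the principle.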

Key Features

  • Parameter, output and metric tracking: ZnTrack makes it easy to store and track the values of parameters in your Python code. It further allows you to store any outputs produced and gives an easy interface to define metrics.
  • Lightweight and database-free: Unlike other parameter tracking solutions, ZnTrack is lightweight and does not require any databases.

Getting Started

To get started with ZnTrack, install it via pip:

pip install zntrack

Next, you can start using ZnTrack to track parameters, outputs and metrics in your Python code. The following example tracks the value of a parameter in a Python class. To prepare, start in an empty directory and run git init followed by dvc init.

Then put the following into a Python file called hello_world.py and run it with python hello_world.py.

import zntrack
from random import randrange


class HelloWorld(zntrack.Node):
    """Define a ZnTrack Node"""
    # parameter to be tracked
    max_number: int = zntrack.params()
    # parameter to store as output
    random_number: int = zntrack.outs()

    def run(self):
        """Command to be run by DVC"""
        self.random_number = randrange(self.max_number)

if __name__ == "__main__":
    # Write the computational graph
    with zntrack.Project() as project:
        hello_world = HelloWorld(max_number=512)
    project.run()

Running the script creates a DVC stage named HelloWorld. The workflow is defined in dvc.yaml and the parameters are stored in params.yaml.

The workflow is then run with dvc repro automatically. Once the graph has been executed, the results, i.e. the random number, can be accessed directly from the Node object.

hello_world.load()
print(hello_world.random_number)

Tip

You can easily load a Node directly from a repository.

import zntrack

node = zntrack.from_rev(
    "ParamsToMetrics",
    remote="https://github.com/PythonFZ/zntrack-examples",
    rev="8d0c992"
)

Try accessing the params parameter and metrics output. All Nodes from this and many other repositories can be loaded like this.

An overview of all the ZnTrack features as well as more detailed examples can be found in the ZnTrack Documentation.

Technical Details

ZnTrack as an Object-Relational Mapping for DVC

On a fundamental level, the ZnTrack package provides an easy-to-use interface for DVC directly from Python. It handles the overhead of reading config files, defining outputs both in dvc.yaml and in the script, and much more.
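To make the mapping idea more concrete, here is a rough sketch of how declared class attributes could be translated into a dvc.yaml-style stage and a params.yaml-style entry. It uses only the standard library and is not ZnTrack's implementation: the helpers param, out and to_stage, the class HelloWorldSketch, the command string and the output path layout are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field, fields


def param(default):
    """Hypothetical marker for a tracked parameter."""
    return field(default=default, metadata={"kind": "param"})


def out():
    """Hypothetical marker for a tracked output."""
    return field(default=None, metadata={"kind": "out"})


@dataclass
class HelloWorldSketch:
    max_number: int = param(512)
    random_number: int = out()


def to_stage(node) -> dict:
    """Build a dvc.yaml-style stage and a params.yaml-style entry for a node."""
    name = type(node).__name__
    params = {
        f.name: getattr(node, f.name)
        for f in fields(node)
        if f.metadata.get("kind") == "param"
    }
    outs = [
        f"nodes/{name}/{f.name}.json"  # assumed path layout, for illustration only
        for f in fields(node)
        if f.metadata.get("kind") == "out"
    ]
    stage = {
        "cmd": f"python -m my_runner {name}",  # placeholder, not ZnTrack's command
        "params": [name],  # key under which the values live in params.yaml
        "outs": outs,
    }
    return {"stages": {name: stage}, "params": {name: params}}


mapping = to_stage(HelloWorldSketch())
print(mapping["stages"])
print(mapping["params"])
```

The point of the sketch is the direction of the mapping: attributes are declared once in Python, and the stage definition and parameter file are derived from them rather than written by hand.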

For more information on DVC visit their homepage.

References

If you use ZnTrack in your research and find it helpful, please cite us.

@misc{zillsZnTrackDataCode2024,
  title = {{{ZnTrack}} -- {{Data}} as {{Code}}},
  author = {Zills, Fabian and Sch{\"a}fer, Moritz and Tovey, Samuel and K{\"a}stner, Johannes and Holm, Christian},
  year = {2024},
  eprint = {2401.10603},
  archivePrefix = {arXiv},
}

Copyright

This project is distributed under the Apache License Version 2.0.

Similar Tools

The following (incomplete) list contains other projects that either work together with ZnTrack or achieve similar results with slightly different goals or programming languages.

  • DVC - Main dependency of ZnTrack for Data Version Control.
  • dvthis - Introduce DVC to R.
  • DAGsHub Client - Logging parameters from within Python.
  • MLFlow - A Machine Learning Lifecycle Platform.
  • Metaflow - A framework for real-life data science.
  • Hydra - A framework for elegantly configuring complex applications.
  • Snakemake - Workflow management system to create reproducible and scalable data analyses.

