[data like you mean it] A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.

Project description

dalymi

dalymi lets you build data processing pipelines as directed acyclic graphs (DAGs) and facilitates rapid but controlled model development. The goal is to prototype quickly and scale to production with ease. To achieve this, dalymi uses "make"-style workflows: a task with missing input triggers the execution of the input-producing tasks before it is executed itself. At the same time, dalymi provides fine control to run and undo specific parts of a pipeline for quick test iterations. This ensures output reproducibility and minimizes manual errors.
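The "make"-style behaviour described above can be illustrated with a small standalone sketch. This is not dalymi's implementation: the `Task` class, the `run` function and the `producers` mapping are hypothetical names introduced only for illustration, and plain text files stand in for real resources.

```python
import os
import tempfile


class Task:
    """Hypothetical task: a function plus the files it reads and writes."""

    def __init__(self, func, inputs=(), outputs=()):
        self.func = func
        self.inputs = list(inputs)
        self.outputs = list(outputs)


def run(task, producers, log):
    """Run `task` make-style; `producers` maps a file path to the task creating it."""
    # Skip the task if all of its outputs already exist.
    if task.outputs and all(os.path.exists(p) for p in task.outputs):
        log.append('skip ' + task.func.__name__)
        return
    # A missing input triggers the task that produces it first.
    for path in task.inputs:
        if not os.path.exists(path):
            run(producers[path], producers, log)
    task.func()
    log.append('run ' + task.func.__name__)


workdir = tempfile.mkdtemp()
numbers_path = os.path.join(workdir, 'numbers.txt')
squares_path = os.path.join(workdir, 'squares.txt')


def create_numbers():
    with open(numbers_path, 'w') as f:
        f.write('\n'.join(str(n) for n in range(11)))


def square_numbers():
    with open(numbers_path) as f:
        numbers = [int(line) for line in f.read().split()]
    with open(squares_path, 'w') as f:
        f.write('\n'.join(str(n ** 2) for n in numbers))


numbers_task = Task(create_numbers, outputs=[numbers_path])
squares_task = Task(square_numbers, inputs=[numbers_path], outputs=[squares_path])
producers = {numbers_path: numbers_task, squares_path: squares_task}

log = []
run(squares_task, producers, log)  # numbers.txt is missing, so create_numbers runs first
run(squares_task, producers, log)  # squares.txt now exists, so the task is skipped
print(log)
```

In this sketch the first call runs both tasks in dependency order and the second call only records a skip, which mirrors the "skip tasks whose output already exists" behaviour the description mentions.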

Several features facilitate dalymi's goal:

- simple, non-opinionated API (most choices left to user)
- no external dependencies for pipeline execution
- one-line installation (ready for use)
- no configuration
- auto-generated command line interface for pipeline execution
- quick start, but high flexibility to customize and extend:
  - task output can be stored in any format Python can touch (local files being the default)
  - customizable command line arguments
  - templated output location (e.g. timestamped files)
  - support for automated checks on data integrity during runtime
- DAG visualization using graphviz
- API design encourages good development practices (modular code, defined data schemas, self-documenting code, easy workflow viz, etc.)
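Two of the features listed above, templated output locations and runtime integrity checks, can be sketched in plain Python. The helpers `render_location` and `check_columns` below are hypothetical and are not part of dalymi's API; they only illustrate the idea, assuming a location template is filled from a context of keyword values and that an integrity check compares actual columns against a declared schema.

```python
from datetime import date


def render_location(template, **context):
    """Fill a location template such as 'squares_{execution_date}.csv'."""
    return template.format(**context)


def check_columns(rows, expected_columns):
    """Raise ValueError if any row (a dict) deviates from the expected columns."""
    expected = set(expected_columns)
    for index, row in enumerate(rows):
        if set(row) != expected:
            raise ValueError(
                'row {} has columns {}, expected {}'.format(
                    index, sorted(row), sorted(expected)))
    return rows


# A timestamped output location, assuming the context carries an ISO date string.
location = render_location('squares_{execution_date}.csv',
                           execution_date=date(2019, 11, 21).isoformat())
print(location)

# A schema check that passes for well-formed rows.
rows = [{'number': n, 'square': n ** 2} for n in range(3)]
check_columns(rows, ['number', 'square'])
```

Failing loudly at the point where data is produced, rather than in a later task, is what makes such checks useful inside a pipeline.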

Installation

dalymi requires Python >= 3.5.

pip install dalymi

For the latest development:

pip install git+https://github.com/joschnitzbauer/dalymi.git

Documentation

http://dalymi.readthedocs.io/

Simple example

simple.py:

from dalymi import Pipeline
from dalymi.resources import PandasCSV
import pandas as pd


# Define resources:
numbers_resource = PandasCSV(name='numbers', loc='numbers.csv', columns=['number'])
squares_resource = PandasCSV(name='squares', loc='squares.csv', columns=['number', 'square'])


# Define the pipeline
pl = Pipeline()


@pl.output(numbers_resource)
def create_numbers(**context):
    return pd.DataFrame({'number': range(11)})


@pl.output(squares_resource)
@pl.input(numbers_resource)
def square_numbers(numbers, **context):
    numbers['square'] = numbers['number']**2
    return numbers


if __name__ == '__main__':
    # Run the default command line interface
    pl.cli()

Command line:

python simple.py run     # executes the pipeline. skips tasks for which output already exists.

More examples can be found here.

Roadmap

More docstrings
Unit tests
Continuous integration
Parallel task processing
REST API during pipeline run
Web interface for pipeline run

Warranty

Although dalymi is successfully used in smaller applications, it is not battle-tested yet and lacks unit tests. If you decide to use it, be prepared to communicate issues or fix bugs (it's not a lot of code... :)).

Contributions

... are welcome!

Project details

Release history

0.1.5 (this version): Nov 21, 2019
0.1.3: Apr 3, 2018
0.1.2: Jan 30, 2018

Download files

Source Distribution

dalymi-0.1.5.tar.gz (6.3 kB view details)

Uploaded Nov 21, 2019 Source

File details

Details for the file dalymi-0.1.5.tar.gz.

File metadata

Download URL: dalymi-0.1.5.tar.gz
Upload date: Nov 21, 2019
Size: 6.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.5.6

File hashes

Hashes for dalymi-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`7ae37fdb54b2c27e2b40c4831cc5b0d1ac030d4711a3ed38facc86ab64f4293f`
MD5	`92ad644f77b1ac9ffb409ae71260b515`
BLAKE2b-256	`2df07869f5a28067a4f45fbfec79a0bac43a3dde35cdef49052e38a50ebbe6c0`
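A downloaded archive can be compared against the SHA256 digest listed above using Python's standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the archive is assumed to have been saved locally as `dalymi-0.1.5.tar.gz`.

```python
import hashlib

# SHA256 digest published for dalymi-0.1.5.tar.gz (copied from the table above).
EXPECTED_SHA256 = '7ae37fdb54b2c27e2b40c4831cc5b0d1ac030d4711a3ed38facc86ab64f4293f'


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` matches the expected digest."""
    return sha256_of(path) == expected
```

Calling `verify('dalymi-0.1.5.tar.gz')` should return `True` only if the file on disk matches the published digest.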
