alkymi ⚗️

Pythonic task automation

alkymi uses Python's basic building blocks to describe a directed-acyclic-graph (DAG) of computation, and adds a layer of caching to only evaluate functions when inputs have changed.

The key idea behind alkymi is to have your data or validation pipeline defined in the same language as the actual pipeline steps, allowing you to use standard Python tools (unit testing, linting, type checkers) to check the correctness of your full pipeline. No more make dataset!

All alkymi tasks (recipes) are created using references to other alkymi recipes. There's no magic tying together inputs and outputs based on file names, regexes, etc. - only function calls where alkymi provides the input arguments based on outputs further up the DAG.
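
The caching idea described above can be illustrated with a minimal sketch in plain Python. This is not alkymi's implementation: the `Step` class, its `cache` flag and the `config` dictionary are hypothetical names chosen for illustration, and the sketch assumes that a checksum of the pickled inputs is enough to detect changes.

```python
import hashlib
import pickle
from typing import Any, Callable, Optional


class Step:
    """Hypothetical DAG node: re-runs `func` only when the upstream outputs change."""

    def __init__(self, func: Callable[..., Any], *upstream: "Step", cache: bool = True) -> None:
        self.func = func
        self.upstream = upstream
        self.cache = cache
        self.calls = 0  # number of times `func` actually ran
        self._checksum: Optional[str] = None
        self._output: Any = None

    def evaluate(self) -> Any:
        # Inputs are simply the outputs of the steps further up the DAG
        inputs = tuple(step.evaluate() for step in self.upstream)
        checksum = hashlib.sha256(pickle.dumps(inputs)).hexdigest()
        if not self.cache or checksum != self._checksum:
            self._output = self.func(*inputs)
            self._checksum = checksum
            self.calls += 1
        return self._output


config = {"numbers": [1, 2, 3]}
# The source step is never cached, so changes to `config` are always picked up
numbers = Step(lambda: list(config["numbers"]), cache=False)
total = Step(lambda values: sum(values), numbers)

first = total.evaluate()   # runs sum()
second = total.evaluate()  # inputs unchanged, cached output is reused
config["numbers"] = [1, 2, 3, 4]
third = total.evaluate()   # inputs changed, sum() runs again
```

In this sketch, `total.calls` ends up at 2 even though `evaluate()` was called three times, which is the behavior the paragraph above describes.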

NOTE: alkymi is still in the experimental alpha stage, and probably shouldn't be used for anything critical. You should count on most APIs changing with future development.

Sample Code

Downloading and parsing the MNIST handwritten character dataset with caching (see examples/mnist for full code):

import gzip
import io
import urllib.request
from typing import List

import numpy as np

import alkymi as alk

@alk.recipe()
def urls() -> List[str]:
    # Return URLs of various parts of the dataset - alkymi will cache these as a list of strings
    train_images_url = "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
    train_labels_url = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
    test_images_url = "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
    test_labels_url = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
    return [train_images_url, train_labels_url, test_images_url, test_labels_url]


@alk.foreach(urls)
def download_gzips(url: str) -> bytes:
    # Download each gzip file as raw bytes - alkymi will cache these to binary files
    # This will run once per URL, and only if the URL has changed since the last evaluation
    return urllib.request.urlopen(url).read()


@alk.foreach(download_gzips)
def parse_gzip_to_arrays(data: bytes) -> np.ndarray:
    # Unzip binary data and parse into numpy arrays - alkymi will cache the numpy arrays
    # This will run once per blob of input data, and only if the binary data has changed since the last evaluation
    with io.BytesIO(data) as f:
        with gzip.open(f) as gzip_file:
            return parse_idx(gzip_file)  # parse_idx definition left out for brevity (see examples/mnist)


# Evaluate 'parse_gzip_to_arrays' and all dependencies
# On subsequent evaluations, the final numpy arrays will be read from the cache and returned immediately - unless one of
# the recipes is marked dirty (if inputs have changed, or the recipe function itself has changed) - in that case, alkymi
# will do the minimum amount of work to bring the pipeline up-to-date, and then return the final numpy arrays 
train_images, train_labels, test_images, test_labels = parse_gzip_to_arrays.brew()

Or, if you need to wrap existing functions, you can apply the decorators as ordinary function calls:

import alkymi as alk

download_archives = alk.foreach(urls)(download_gzips)
parse_arrays = alk.foreach(download_archives)(parse_gzip_to_arrays)
train_images, train_labels, test_images, test_labels = parse_arrays.brew()
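
The per-item behavior described in the comments above (run once per input, and only for inputs that have changed) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not how alkymi implements `alk.foreach`: the `ForeachCache` class is a hypothetical helper, and it assumes that inputs are hashable and that results can be kept in memory.

```python
from typing import Callable, Dict, Generic, Hashable, List, TypeVar

In = TypeVar("In", bound=Hashable)
Out = TypeVar("Out")


class ForeachCache(Generic[In, Out]):
    """Hypothetical helper: applies `func` per item, reusing results for items seen before."""

    def __init__(self, func: Callable[[In], Out]) -> None:
        self.func = func
        self.evaluated: List[In] = []  # log of the items that were actually computed
        self._results: Dict[In, Out] = {}

    def map(self, items: List[In]) -> List[Out]:
        for item in items:
            if item not in self._results:
                self._results[item] = self.func(item)
                self.evaluated.append(item)
        # Forget results for items that are no longer part of the input
        self._results = {item: self._results[item] for item in items}
        return [self._results[item] for item in items]


lengths: ForeachCache[str, int] = ForeachCache(len)
first = lengths.map(["a.gz", "bb.gz"])    # both items are computed
second = lengths.map(["a.gz", "ccc.gz"])  # only "ccc.gz" is computed
```

After the second call, `lengths.evaluated` contains each distinct item once, showing that the unchanged item was not recomputed.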

Command Line Usage

In some scenarios, you may need to automate multiple tasks, and writing a Python script for each might be a bit tedious - a common example of this is a Makefile that has rules for "style" (style checking), "install" (fetch dependencies), etc. In this case, you can use alkymi's Lab functionality:

from pathlib import Path
from typing import List
import alkymi as alk
import pytest

# 'glob_files()' is a built-in recipe generator that globs and returns a list of files
glob_test_files = alk.recipes.glob_files(Path("tests"), "test_*.py", recursive=True)

@alk.recipe(ingredients=[glob_test_files])
def test(test_files: List[Path]) -> None:
    # Convert Path objects to str
    result = pytest.main(args=[str(file) for file in test_files])
    if result != pytest.ExitCode.OK:
        raise Exception("Unit tests failed: {}".format(result))

lab = alk.Lab("my_lab")
lab.add_recipes(test)
lab.open()

The above code will cause the script to present the user with a command-line interface (CLI) with the following options:

  • status: Prints detailed status of all recipes contained in the lab (cached, needs reevaluation, etc.)
  • brew: Runs one or more recipes with the provided names (in the above, running python labfile.py brew test would run the unit tests)

alkymi uses a labfile (labfile.py in the root of the repo) to automate tasks such as linting using flake8, static type checking using mypy, running unit tests using pytest, as well as creating and uploading distributions to PyPI. Note that labfile.py is also subject to static type checking and linting, just like every other Python file.
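
To make the shape of such a CLI concrete, here is a minimal sketch of a `status`/`brew` command dispatcher built on the standard library's `argparse`. It is not alkymi's Lab implementation: `build_parser`, `run` and the dictionary of named callables are hypothetical stand-ins for a lab and its recipes.

```python
import argparse
from typing import Callable, Dict, List


def build_parser(recipe_names: List[str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="labfile.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("status", help="list the known recipes")
    brew = subparsers.add_parser("brew", help="run one or more recipes by name")
    # Only names of known recipes are accepted
    brew.add_argument("recipes", nargs="+", choices=recipe_names)
    return parser


def run(recipes: Dict[str, Callable[[], object]], argv: List[str]) -> List[str]:
    """Hypothetical entry point: returns recipe names listed (status) or executed (brew)."""
    args = build_parser(sorted(recipes)).parse_args(argv)
    if args.command == "status":
        return sorted(recipes)
    for name in args.recipes:
        recipes[name]()
    return list(args.recipes)


executed: List[str] = []
recipes: Dict[str, Callable[[], object]] = {
    "test": lambda: executed.append("test"),
    "lint": lambda: executed.append("lint"),
}
brewed = run(recipes, ["brew", "test"])  # mirrors "python labfile.py brew test"
listed = run(recipes, ["status"])
```

In a real script, `argv` would typically be `sys.argv[1:]`; passing it explicitly keeps the sketch easy to test.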

Documentation

Upcoming: readthedocs.org page

Upcoming Features

The following features are being considered for future implementation:

  • Arguments to recipes when calling brew in Lab CLI
  • Type annotations propagated from bound functions to recipes
  • Support for call/type checking all recipes (e.g. by adding a check command to Lab)
  • Code coverage for tests

Known Issues

  • alkymi currently doesn't check custom objects for altered external files when computing cleanliness (e.g. MyClass has a self._some_path that points to a file somewhere outside alkymi's internal cache)
  • alk.foreach() currently only supports enumerable inputs of type List or Dict
  • Recipes marked transient will always be dirty, and thus always require reevaluation. This functionality should be replaced by a proper means of creating recipes that don't cache outputs, but only run when needed to provide inputs for downstream recipes
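
The first issue in the list can be illustrated with a short sketch. It assumes, purely for illustration, that an object holding a path is checksummed by the path string alone. The two helper functions are hypothetical and are not part of alkymi. They show why a checksum over the path does not notice edits to the file, while a checksum over the file contents does.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_of_path(path: Path) -> str:
    # Hypothetical: only the path string is hashed, so edits to the file go unnoticed
    return hashlib.sha256(str(path).encode("utf-8")).hexdigest()


def checksum_of_contents(path: Path) -> str:
    # Hypothetical: the file contents are hashed, so edits to the file change the checksum
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    external = Path(tmp) / "data.txt"
    external.write_text("v1")
    path_before = checksum_of_path(external)
    content_before = checksum_of_contents(external)

    external.write_text("v2")  # the external file is altered
    path_after = checksum_of_path(external)
    content_after = checksum_of_contents(external)

path_unchanged = path_before == path_after
content_changed = content_before != content_after
```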

Installation

Install via pip:

pip install --user alkymi

Or clone and install directly from source:

git clone https://github.com/MathiasStokholm/alkymi.git
cd alkymi
pip install --user .

Or install from GitHub using pip:

pip install --user git+https://github.com/MathiasStokholm/alkymi.git

Testing

After installing, you can run the test suite with the command below (the lint and type_check recipes can be brewed in the same way to run linting and static type checking):

python3 labfile.py brew test

License

alkymi is licensed under The MIT License, as found in the LICENSE.md file.
