Kedro-Accelerator speeds up pipelines by parallelizing I/O in the background.

Kedro-Accelerator

Kedro pipelines consist of nodes, where an output from one node A can be an input to another node B. The Data Catalog defines where and how Kedro loads and saves these inputs and outputs, respectively. By default, a sequential Kedro pipeline:

1. runs node A
2. persists the output of A, often to remote storage like Amazon S3
3. potentially runs other nodes
4. fetches the output of A, loading it back into memory
5. runs node B

Persisting intermediate data sets enables partial pipeline runs (e.g. running node B without rerunning node A) and analysis/debugging of these data sets. However, the I/O in steps 2 and 4 above is not necessary to run node B, because the requisite data is already in memory after step 1. Kedro-Accelerator speeds up pipelines by parallelizing this I/O in the background.
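The idea can be illustrated with a minimal sketch. This is not Kedro-Accelerator's implementation: `SlowStore`, `run_sequential`, and `run_with_background_save` are hypothetical names, the storage delay is simulated with `time.sleep`, and a plain thread pool stands in for whatever mechanism the plugin actually uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowStore:
    """Hypothetical stand-in for slow remote storage (not a Kedro class)."""

    def __init__(self, delay):
        self.delay = delay
        self._data = {}

    def save(self, name, value):
        time.sleep(self.delay)  # simulate a slow write
        self._data[name] = value

    def load(self, name):
        time.sleep(self.delay)  # simulate a slow read
        return self._data[name]


def node_a():
    return [1, 2, 3]


def node_b(values):
    return sum(values)


def run_sequential(store):
    """Default behaviour: save A's output, then load it back before running B."""
    store.save("a_output", node_a())
    return node_b(store.load("a_output"))


def run_with_background_save(store):
    """Keep A's output in memory for B and persist it on a worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        a_output = node_a()
        pending = pool.submit(store.save, "a_output", a_output)
        result = node_b(a_output)  # no load needed: the data is already in memory
        pending.result()  # wait, so the output is still persisted before returning
    return result


if __name__ == "__main__":
    for runner in (run_sequential, run_with_background_save):
        store = SlowStore(delay=0.2)
        start = time.perf_counter()
        result = runner(store)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: result={result}, elapsed={elapsed:.2f}s")
```

In this toy version the second runner skips the load entirely and overlaps the save with node B, while the output of A still ends up in storage for later partial runs or debugging.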

How do I install Kedro-Accelerator?

Kedro-Accelerator is a Python plugin. To install it:

pip install kedro-accelerator

How do I use Kedro-Accelerator?

As of Kedro 0.16.4, TeePlugin, the core of Kedro-Accelerator, is auto-discovered upon installation. In older versions that support hooks (Kedro 0.16.0 through 0.16.3), hook implementations should be registered with Kedro through the ProjectContext. Hooks were introduced in Kedro 0.16.0.

Prerequisites

The following conditions must be true for Kedro-Accelerator to speed up your pipeline:

Your project must use either SequentialRunner or ParallelRunner.

Example

The Kedro-Accelerator repository includes the Iris data set example pipeline generated using Kedro 0.16.1. Intermediate data sets have been replaced with custom SlowDataSet instances to simulate a slow filesystem. You can try different load and save delays by modifying catalog.yml.

To get started, create and activate a new virtual environment. Then, clone the repository and pip install requirements:

git clone https://github.com/deepyaman/kedro-accelerator.git
cd kedro-accelerator
KEDRO_VERSION=0.17.4 pip install -r src/requirements.txt  # Specify your desired Kedro version.

You can compare pipeline execution times with and without TeePlugin. Kedro-Accelerator also provides CachePlugin so that you can test performance using CachedDataSet in asynchronous mode. Assuming load and save delays of 10 seconds are configured for the intermediate data sets, you should see the following results:

| Strategy | Command | Total time | Log |
| --- | --- | --- | --- |
| Baseline (i.e. no caching/plugins) | `kedro run` | 2 minutes | Log |
| `TeePlugin` | `kedro run --hooks kedro_accelerator.plugins.TeePlugin` | 10 seconds (saving all outputs) | Log |
| `CachePlugin` (i.e. `CachedDataSet`) with `is_async=True` | `kedro run --async --hooks kedro_accelerator.plugins.CachePlugin` | 30 seconds (saving `split_data`, `train_model`, and `predict` node outputs) | Log |

Prior to Kedro version 0.17.0, prefix extra hooks passed to `kedro run` with `src.` (e.g. `src.kedro_accelerator.plugins.TeePlugin`).
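The totals in the table can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is an illustration, not a measurement. The count of six intermediate data sets is an assumption: it is not stated here, and is chosen because six saves plus six loads at 10 seconds each give the 2-minute baseline. The models for the two plugins are simplified readings of the notes in the "Total time" column, node run time is treated as negligible, and all function names are hypothetical.

```python
def baseline_seconds(n_datasets, delay):
    """Assumed model: each intermediate data set is saved once and loaded once, in sequence."""
    return 2 * n_datasets * delay


def tee_seconds(delay):
    """Assumed model: loads are skipped and all saves overlap, leaving about one save delay."""
    return delay


def async_cache_seconds(n_saving_nodes, delay):
    """Assumed model: a node's outputs are saved concurrently, but each node waits for its saves."""
    return n_saving_nodes * delay


DELAY = 10          # seconds, as in the benchmark description
N_DATASETS = 6      # assumption, see the paragraph above
N_SAVING_NODES = 3  # split_data, train_model, and predict, per the table

print("baseline:", baseline_seconds(N_DATASETS, DELAY), "seconds")
print("TeePlugin:", tee_seconds(DELAY), "seconds")
print("CachePlugin (async):", async_cache_seconds(N_SAVING_NODES, DELAY), "seconds")
```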

For a more complete discussion of the above benchmarks, see quantumblacklabs/kedro#420 (comment).

What license do you use?

Kedro-Accelerator is licensed under the MIT License.

Project details

Release history

0.3.0 (this version): Jun 28, 2021

0.2.0: Dec 27, 2020

0.1.0: Sep 13, 2020

Download files

Source Distribution

kedro-accelerator-0.3.0.tar.gz (5.6 kB)

Uploaded Jun 28, 2021 Source

Built Distribution

kedro_accelerator-0.3.0-py3-none-any.whl (5.1 kB)

Uploaded Jun 28, 2021 Python 3

File details

Details for the file kedro-accelerator-0.3.0.tar.gz.

File metadata

Download URL: kedro-accelerator-0.3.0.tar.gz
Upload date: Jun 28, 2021
Size: 5.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for kedro-accelerator-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `5d50c5dc19a989e3f743182a73e8c5fab236071d2f664e0582ddf5d49616099e` |
| MD5 | `267b4ca758ed390f1f9bde309ecdf778` |
| BLAKE2b-256 | `9ac364f4109214021b3e26d78201577fc5fdaa657fde8b1e212cc18c479df045` |

File details

Details for the file kedro_accelerator-0.3.0-py3-none-any.whl.

File metadata

Download URL: kedro_accelerator-0.3.0-py3-none-any.whl
Upload date: Jun 28, 2021
Size: 5.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for kedro_accelerator-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `818761bceff67c85d691decd2d2b9b639e46871763b21a07ded4858f14c94628` |
| MD5 | `f9f79d20fd4a36cee36f5333edb49ca0` |
| BLAKE2b-256 | `34bc31746e9594a4af77c693fcdc08e90dbc0e01bffbe970c6ac33b17614721f` |
