
Simple, intuitive pipelining in Python

Project description

Conveyor - simple, intuitive pipelining

Conveyor is a multiprocessing framework that facilitates writing data pipelines and job systems in Python. It provides five main features a developer can make use of: processors, pipes, replicated forks, balanced forks, and joins.

Processors

Processors are Conveyor's logical compute cores. Each processor takes some data as input, transforms that input, and optionally returns an output. Processors can maintain internal state. Each processor is user-defined and wrapped in its own Python process, allowing multiple processors to execute in parallel. A user can specify sets of processors that should execute serially, joined by pipes, or sets of processors that should act in parallel, as defined by forks.
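
For intuition, the snippet below shows what a user-defined processor amounts to conceptually: a callable that transforms its input and may keep internal state. The class name and interface here are illustrative only, not Conveyor's actual API.

# Conceptual illustration only; not Conveyor's actual processor interface.
class RunningMeanProcessor:
    """Consumes numbers and emits the running mean, keeping internal state."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def __call__(self, value):
        # Transform the input while updating internal state.
        self.count += 1
        self.total += value
        return self.total / self.count

proc = RunningMeanProcessor()
print([proc(x) for x in (2.0, 4.0, 6.0)])  # [2.0, 3.0, 4.0]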

Pipes

Pipes serve as links between processors. They act as a stream, transporting data from one point in the pipeline to another. Pipes are implemented as a producer-consumer buffer where a leading processor acts as the producer and a trailing processor acts as a consumer.
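
The sketch below illustrates the producer-consumer idea behind a pipe using only the standard library; it demonstrates the concept rather than Conveyor's internal implementation.

from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the stream

def producer(pipe):
    for i in range(5):
        pipe.put(i)          # the leading processor writes into the pipe
    pipe.put(SENTINEL)

def consumer(pipe):
    while True:
        item = pipe.get()    # the trailing processor reads from the pipe
        if item is SENTINEL:
            break
        print("consumed", item)

if __name__ == "__main__":
    pipe = Queue(maxsize=4)  # bounded producer-consumer buffer
    p = Process(target=producer, args=(pipe,))
    c = Process(target=consumer, args=(pipe,))
    p.start(); c.start()
    p.join(); c.join()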

Replicated Forks

Replicated Forks allow one processor to split its output into multiple copies so that several downstream processors can each operate on the entire output. This allows, for example, multiple different ML models to be trained and tested in parallel. The fan-out of the fork's one-to-many relationship is primarily user-defined.
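
Conceptually, a replicated fork copies every upstream item onto each downstream pipe, as in the standard-library sketch below (illustrative only, not Conveyor's implementation).

from multiprocessing import Queue

def replicated_fork(upstream, downstreams, sentinel=None):
    # Copy every item (including the end-of-stream marker) to every branch,
    # so each downstream processor sees the entire stream.
    while True:
        item = upstream.get()
        for q in downstreams:
            q.put(item)
        if item is sentinel:
            break

if __name__ == "__main__":
    up, a, b = Queue(), Queue(), Queue()
    for x in (1, 2, None):
        up.put(x)
    replicated_fork(up, [a, b])
    print([a.get() for _ in range(3)])  # [1, 2, None]
    print([b.get() for _ in range(3)])  # [1, 2, None]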

Balanced Forks

Balanced Forks allow one processor to balance a stream of data over multiple consumer processors. This serves to minimize the effect of pipe stalling for larger data sets. The fan-out of the fork's one-to-many relationship is primarily determined by pipe stalling detected at runtime and the number of physical cores available.
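
A balanced fork, by contrast, sends each item to only one branch. The sketch below uses simple round-robin distribution in place of Conveyor's runtime policy (stall detection and core count), purely to illustrate the idea.

from itertools import cycle
from multiprocessing import Queue

def balanced_fork(upstream, downstreams, sentinel=None):
    # Round-robin stands in for the real policy, which reacts to pipe
    # stalling and the number of physical cores available.
    targets = cycle(downstreams)
    while True:
        item = upstream.get()
        if item is sentinel:
            for q in downstreams:
                q.put(sentinel)  # every branch must learn the stream ended
            break
        next(targets).put(item)

if __name__ == "__main__":
    up, a, b = Queue(), Queue(), Queue()
    for x in (1, 2, 3, 4, None):
        up.put(x)
    balanced_fork(up, [a, b])
    print([a.get() for _ in range(3)])  # [1, 3, None]
    print([b.get() for _ in range(3)])  # [2, 4, None]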

Joins

Joins allow multiple processors to combine their data streams into one logical pipe. The combined stream can then be forked again for the next step of the pipeline, processed by a single processor, or serve as output at the end of the pipeline.
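
The join concept can likewise be illustrated with the standard library: several branches write into one shared queue, which then carries a single combined stream.

from multiprocessing import Process, Queue

def branch(name, joined):
    for i in range(3):
        joined.put((name, i))    # each branch feeds the shared pipe

if __name__ == "__main__":
    joined = Queue()
    procs = [Process(target=branch, args=(n, joined)) for n in ("a", "b")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for _ in range(6):
        print(joined.get())      # one combined stream from both branches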

Rules for constructing pipelines

In order to allow developers to use the pipeline however they see fit, Conveyor attempts to be unopinionated. However, some rules are needed to keep pipelines from becoming nonsensical or turning into a programming language of their own. The pipeline should be kept simple, and most of the program's logic should live in the processor steps. A sketch of a layout that satisfies these rules follows the list below.

  • All elements in the pipeline must be Conveyor Types
  • A pipeline must start with a Processor
  • If not starting the pipeline, every Processor must be preceded by a Pipe, Fork, or Join
    • If a Pipe is not explicitly created, one will be inserted automatically, but a warning will be raised
  • Every Fork and Join must be preceded by a Pipe
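
As a sketch of how these rules shape a pipeline's layout, the stand-in classes below mirror the Conveyor types named above but are placeholders, not the package's real constructors; the comments mark which rule each element satisfies.

# Stand-in classes only; not the package's real constructors.
class Processor:
    def __init__(self, fn):
        self.fn = fn

class Pipe: pass
class Fork: pass   # replicated or balanced
class Join: pass

pipeline = [
    Processor(lambda x: x),              # a pipeline starts with a Processor
    Pipe(), Fork(),                      # a Fork must be preceded by a Pipe
    Pipe(), Processor(lambda x: x + 1),  # parallel branch 1
    Pipe(), Processor(lambda x: x * 2),  # parallel branch 2
    Pipe(), Join(),                      # a Join must be preceded by a Pipe
    Pipe(), Processor(print),            # later Processors follow a Pipe, Fork, or Join
]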

Testing

To run the tests, ensure nose is installed and run nosetests from the project directory:

pip3 install nose && nosetests

Building from source

To build the distribution archives, you will need the latest version of setuptools and wheel.

python3 -m pip install --user --upgrade setuptools wheel

Run setup.py to build using the following command:

python3 setup.py sdist bdist_wheel

The compiled .whl and .tar.gz files will be in the dist/ directory.

Download files

Download the file for your platform.

Source Distribution

parallel-conveyor-0.0.1.tar.gz (10.7 kB, source)

Built Distribution

parallel_conveyor-0.0.1-py3-none-any.whl (16.6 kB, Python 3)

File details

Details for the file parallel-conveyor-0.0.1.tar.gz.

File metadata

  • Download URL: parallel-conveyor-0.0.1.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for parallel-conveyor-0.0.1.tar.gz
  • SHA256: 47b5dbbd87f62c2e30db0d9029cffd21cc2d4bd10887ace24758b65330db6e92
  • MD5: 37c388f257b84f84774adb943037a38a
  • BLAKE2b-256: 345980435a18f8f871a845c16df9357597bec8a1822ffd9fbca26369159739b2


File details

Details for the file parallel_conveyor-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: parallel_conveyor-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for parallel_conveyor-0.0.1-py3-none-any.whl
  • SHA256: cf9d5099146c41336017d256aedc9d4ab8e2ccfac75ec872f719ea303e904bb0
  • MD5: 80feb4b9a4d79a2c47d0e02f0788d434
  • BLAKE2b-256: 0c80f032d2eee0866038c6df670e3d705c257762f23800bdc478e68230570fc2

