Python package for creating and managing fast, parallelized data processing pipelines for real-time applications

PyStream - Real Time Python Pipeline Manager

This package provides tools to build and speed up Python data pipelines for real-time processing. It is managed using Poetry.

For more detailed guidelines, see the project documentation.

Concepts

PyStream is a pure-Python package that helps you manage a data pipeline and optimize its performance. Its main feature is that it can run your data pipeline as a set of asynchronous, independent stages, each in its own thread. A multi-process model may follow in the future.

A PyStream pipeline is built from several stages, where each stage represents a single set of data processing operations that you define yourself. Once the stages have been defined, the pipeline can be operated in one of three modes:

  • Serial mode: Each stage is executed in a blocking fashion. A later stage runs only after the previous stages have finished, and the next data item can be processed only after the previous item has passed through the final stage. Only one data item can be processed at any time.

  • Parallel mode: Each stage lives in a separate thread. When a stage finishes processing a data item, it sends the result to the next stage. Because the stages run in parallel, a stage can take the next input as soon as one is available and start processing it immediately. This way, multiple data items are processed at the same time, which increases the throughput of your pipeline.

  • Mixed mode: This is a mix of the serial and parallel modes. You can put a serial pipeline inside a parallel one and vice versa. A parallel pipeline can improve throughput, but it is prone to higher latency. Mixing serial and parallel pipelines can be very useful for further optimizing the latency and throughput of your pipeline.

Whichever mode you choose, you only need to focus on implementing your own data processing code and packing it into stages. PyStream handles the pipeline execution for you, including the threads and the linking of stages.
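To make the difference between the serial and parallel modes concrete, here is a minimal sketch that uses only the Python standard library (threading and queue). It is not PyStream's API: the names run_serial, run_parallel, and _stage_worker are hypothetical and exist only for this illustration. Each stage is assumed to be a plain callable that takes one data item and returns the processed item.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage thread to shut down


def run_serial(stages, items):
    """Serial mode: an item passes through every stage before the next item starts."""
    results = []
    for item in items:
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


def _stage_worker(stage, inbox, outbox):
    """Loop for one stage thread: take an item, process it, hand it on."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(stage(item))


def run_parallel(stages, items):
    """Parallel mode: every stage has its own thread, linked by FIFO queues."""
    # queues[i] feeds stage i; queues[-1] collects the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage_worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2]
    print(run_serial(stages, [1, 2, 3]))    # [4, 6, 8]
    print(run_parallel(stages, [1, 2, 3]))  # [4, 6, 8]
```

Both functions return the same results in the same order, because each stage is a single thread reading from a FIFO queue. In the threaded version, a stage can start on the next item while later stages are still working on earlier ones. With CPython threads, this mainly improves throughput when the stages spend their time in I/O or in native code that releases the global interpreter lock.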

Installation

You can install this package using pip.

pip install pystream-pipeline
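One way to confirm that the installation succeeded is to ask Python for the installed version of the distribution. The sketch below uses only the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical and is not part of PyStream.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("pystream-pipeline"))
```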

If you want to build this package from source or develop it, we recommend using Poetry. First, install Poetry by following the instructions on its documentation site. Then clone this repository and install all the dependencies. Poetry can do this for you, and it will also set up a new virtual environment.

poetry install --with dev

To build the wheel file, you can run

poetry build

You can find the wheel file inside the dist directory.

Sample Usage

The PyStream API can be found in the project documentation.

You can also access some examples:

  • See demo.ipynb for a quick start with PyStream.
  • See how PyStream is used to increase the throughput of a vehicle environment mapping system in this repository.
