tanbih-pipeline

A pipeline framework for stream processing


A flexible stream processing framework supporting RabbitMQ, Pulsar, Kafka and Redis.

Features

At-least-once delivery, guaranteed by acknowledging every message
Horizontally scalable through consumer groups
Flow is configured at deployment time: develop a worker once, use it everywhere
Testable with FILE and MEMORY input/output
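At-least-once delivery means a message is acknowledged only after it has been processed. If a worker crashes before the acknowledgement, the message is redelivered rather than lost. The following is a minimal, backend-agnostic sketch of that idea. `InMemoryQueue`, `receive`, `ack`, `requeue_unacked` and `consume` are hypothetical names for illustration only; they are not the tanbih-pipeline API.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical queue that redelivers messages that were never acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)  # waiting to be delivered (or redelivered)
        self._in_flight = {}  # delivery id -> message awaiting acknowledgement
        self._next_id = 0

    def receive(self):
        """Deliver the next message and track it until it is acknowledged."""
        if not self._pending:
            return None, None
        msg = self._pending.popleft()
        delivery_id = self._next_id
        self._next_id += 1
        self._in_flight[delivery_id] = msg
        return delivery_id, msg

    def ack(self, delivery_id):
        """Forget a message once it has been processed successfully."""
        del self._in_flight[delivery_id]

    def requeue_unacked(self):
        """Simulate a consumer restart: unacknowledged messages go back to the front."""
        self._pending.extendleft(reversed(list(self._in_flight.values())))
        self._in_flight.clear()


def consume(queue, handle):
    """Process every available message, acknowledging only after handle() succeeds."""
    while True:
        delivery_id, msg = queue.receive()
        if msg is None:
            break
        handle(msg)  # if this raises, the message is never acked
        queue.ack(delivery_id)


queue = InMemoryQueue([{'id': 1}, {'id': 2}])
seen = []


def flaky(msg):
    seen.append(msg['id'])
    if msg['id'] == 2 and seen.count(2) == 1:
        raise RuntimeError('crash before ack')


try:
    consume(queue, flaky)
except RuntimeError:
    queue.requeue_unacked()  # message 2 was not acked, so it is delivered again
consume(queue, flaky)
print(seen)  # [1, 2, 2]
```

Message 2 is handled twice. This duplication is the price of never losing a message, so handlers should be safe to run more than once.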

Requirements

Python 3.8

Installation

$ pip install tanbih-pipeline

You can install the required backend dependencies with:

$ pip install tanbih-pipeline[redis]
$ pip install tanbih-pipeline[kafka]
$ pip install tanbih-pipeline[pulsar]
$ pip install tanbih-pipeline[rabbitmq]
$ pip install tanbih-pipeline[azure]

To install the dependencies for all backends:

$ pip install tanbih-pipeline[full]

Generator

Use Generator when developing a data source for the pipeline. A source produces output without consuming any input; a crawler, for example, can be written as a generator.

>>> from pipeline import Generator, Message
>>>
>>> class MyGenerator(Generator):
...     def generate(self):
...         for i in range(10):
...             yield {'id': i}
>>>
>>> generator = MyGenerator('generator', '0.1.0', description='simple generator')
>>> generator.parse_args("--kind MEM --out-topic test".split())
>>> generator.start()
>>> [r.get('id') for r in generator.destination.results]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Processor

Use Processor to process input messages. Modifications are made in place. A processor produces either one output for each input or no output.

>>> from pipeline import Processor, Message
>>>
>>> class MyProcessor(Processor):
...     def process(self, msg):
...         msg.update({'processed': True})
...         return None
>>>
>>> processor = MyProcessor('processor', '0.1.0', description='simple processor')
>>> config = {'data': [{'id': 1}]}
>>> processor.parse_args("--kind MEM --in-topic test --out-topic test".split(), config=config)
>>> processor.start()
>>> [r.get('id') for r in processor.destination.results]
[1]

Splitter

Use Splitter when writing to multiple outputs. Define get_topic to compute the output topic from the message being processed; the splitter writes the output to that topic.

>>> from pipeline import Splitter, Message
>>>
>>> class MySplitter(Splitter):
...     def get_topic(self, msg):
...         return '{}-{}'.format(self.destination.topic, msg.get('id'))
...
...     def process(self, msg):
...         msg.update({
...             'processed': True,
...         })
...         return None
>>>
>>> splitter = MySplitter('splitter', '0.1.0', description='simple splitter')
>>> config = {'data': [{'id': 1}]}
>>> splitter.parse_args("--kind MEM --in-topic test --out-topic test".split(), config=config)
>>> splitter.start()
>>> [r.get('id') for r in splitter.destinations['test-1'].results]
[1]
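Conceptually, a splitter keeps one destination per computed topic. The sketch below imitates that routing with plain dictionaries. `route` and the module-level `get_topic` are hypothetical stand-ins, not the library's internals. The `'<out-topic>-<id>'` naming simply mirrors MySplitter above.

```python
from collections import defaultdict


def get_topic(out_topic, msg):
    # Same naming scheme as MySplitter.get_topic in the example above
    return '{}-{}'.format(out_topic, msg.get('id'))


def route(messages, out_topic):
    """Group processed messages by the topic computed for each one."""
    destinations = defaultdict(list)
    for msg in messages:
        msg.update({'processed': True})  # in-place modification, as in MySplitter.process
        destinations[get_topic(out_topic, msg)].append(msg)
    return dict(destinations)


destinations = route([{'id': 1}, {'id': 2}, {'id': 1}], 'test')
print(sorted(destinations))  # ['test-1', 'test-2']
print([m['id'] for m in destinations['test-1']])  # [1, 1]
```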

Usage

Writing a Worker

Subclass one of Generator, Processor or Splitter.

Environment Variables

The application accepts the following environment variables:

environment variable	command-line argument	options
PIPELINE	--kind	KAFKA, PULSAR, FILE
PULSAR	--pulsar	pulsar url
TENANT	--tenant	pulsar tenant
NAMESPACE	--namespace	pulsar namespace
SUBSCRIPTION	--subscription	pulsar subscription
KAFKA	--kafka	kafka url
GROUPID	--group-id	kafka group id
INTOPIC	--in-topic	topic to read
OUTTOPIC	--out-topic	topic to write to
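A common way to let an environment variable stand in for a command-line argument is to use the variable as the argparse default, so that an explicit flag still takes precedence. The sketch below applies this approach to a few rows of the table. It illustrates the mapping under that assumption; it is not necessarily how tanbih-pipeline parses its options. `build_parser` and `ENV_ARGS` are hypothetical names.

```python
import argparse
import os

# (environment variable, command-line flag, help text), taken from the table above
ENV_ARGS = [
    ('PIPELINE', '--kind', 'KAFKA, PULSAR, FILE'),
    ('KAFKA', '--kafka', 'kafka url'),
    ('GROUPID', '--group-id', 'kafka group id'),
    ('INTOPIC', '--in-topic', 'topic to read'),
    ('OUTTOPIC', '--out-topic', 'topic to write to'),
]


def build_parser(environ=os.environ):
    """Build a parser whose defaults come from the environment; flags override them."""
    parser = argparse.ArgumentParser()
    for env_name, flag, help_text in ENV_ARGS:
        parser.add_argument(flag, default=environ.get(env_name), help=help_text)
    return parser


env = {'PIPELINE': 'KAFKA', 'INTOPIC': 'articles'}
options = build_parser(env).parse_args(['--in-topic', 'sentences'])
print(options.kind, options.in_topic, options.out_topic)  # KAFKA sentences None
```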

Custom Code

Define add_arguments to add new command-line arguments to the worker.

Define setup to run initialization code before the worker starts processing messages. setup is called after the command-line arguments have been parsed, so logic that depends on the options (parsed arguments) goes here.
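A hypothetical base class can sketch the order described above: arguments are added, then parsed, then setup runs before any message is processed. `Worker`, `run`, `Tagger` and the `options` attribute below are assumptions for illustration. Check the library source for the exact signatures of add_arguments and setup.

```python
import argparse


class Worker:
    """Hypothetical base class mimicking the add_arguments -> parse -> setup -> process order."""

    def __init__(self, name):
        self.name = name
        self.options = None

    def add_arguments(self, parser):
        """Override to register extra command-line arguments."""

    def setup(self):
        """Override to run initialization code; self.options is already populated."""

    def process(self, msg):
        raise NotImplementedError

    def run(self, argv, messages):
        parser = argparse.ArgumentParser(prog=self.name)
        self.add_arguments(parser)
        self.options = parser.parse_args(argv)  # arguments are parsed first
        self.setup()  # then setup runs, before any message is handled
        for msg in messages:
            self.process(msg)
        return messages


class Tagger(Worker):
    def add_arguments(self, parser):
        parser.add_argument('--label', default='none')

    def setup(self):
        # Logic based on the parsed options goes here
        self.label = self.options.label.upper()

    def process(self, msg):
        msg.update({'label': self.label})  # in-place modification


print(Tagger('tagger').run(['--label', 'news'], [{'id': 1}]))
# [{'id': 1, 'label': 'NEWS'}]
```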

Options

Errors

In the examples above, process returns None, which is the error value you should return if dct or dcts is empty. Errors are sent to the errors topic together with information about the worker.

Contribute

Use pre-commit to run black and flake8.

Credits

Yifan Zhang (yzhang at hbku.edu.qa)


Release history

version	release date
0.12.18	Sep 28, 2024
0.12.17	Sep 28, 2024
0.12.16	Sep 28, 2024
0.12.15	Sep 28, 2024
0.12.14	Sep 28, 2024
0.12.13	Oct 8, 2023
0.12.11	Aug 7, 2023
0.12.10	May 18, 2023
0.12.9	Jun 28, 2022
0.12.8	Jun 21, 2022
0.12.7	Jun 15, 2022
0.12.6	May 26, 2022
0.12.5	May 25, 2022
0.12.4	May 18, 2022
0.12.1	Mar 20, 2022
0.11.33	Dec 23, 2021
0.11.32	Dec 16, 2021
0.11.31	Dec 9, 2021
0.11.30	Dec 6, 2021
0.11.28	Nov 28, 2021
0.11.27	Nov 23, 2021
0.11.22	Nov 8, 2021
0.11.21	Oct 12, 2021
0.11.20	Oct 11, 2021
0.11.19	Oct 11, 2021
0.11.18	Oct 11, 2021
0.11.17	Jul 8, 2021
0.11.15	Jul 1, 2021
0.11.14	Jul 1, 2021
0.11.13	Jul 1, 2021
0.11.12	Jul 1, 2021
0.11.11	Jul 1, 2021
0.11.10	Jul 1, 2021
0.11.9	Jun 8, 2021
0.11.8	May 25, 2021
0.11.7	May 20, 2021
0.11.6	May 20, 2021
0.11.5	May 20, 2021
0.11.4	May 20, 2021
0.11.3	May 19, 2021
0.11.2	May 9, 2021
0.11.1	May 6, 2021
0.11.0	May 2, 2021
0.10.3	Sep 13, 2021
0.10.2	Sep 13, 2021
0.10.1	Mar 22, 2021
0.10.0	Mar 17, 2021 (this version)
0.9.2	Mar 12, 2021
0.9.1	Jan 26, 2021
0.8.7	Dec 23, 2020
0.8.6	Dec 23, 2020
0.8.5	Dec 21, 2020
0.8.4	Dec 16, 2020
0.8.3	Dec 16, 2020
0.8.2	Dec 16, 2020
0.8.1	Dec 14, 2020
0.7.6	Dec 10, 2020
0.7.5	Dec 2, 2020
0.7.4	Oct 27, 2020
0.7.3	Oct 14, 2020
0.7.2	Oct 11, 2020
0.7.0	Aug 4, 2020
0.6.1	Jul 30, 2020
0.6.0	Jul 30, 2020
0.5.4	Jul 29, 2020
0.5.3	Jul 26, 2020
0.5.2	Jul 26, 2020
0.5.1	Jul 25, 2020
0.5.0	Jul 24, 2020
0.4.3	Jul 19, 2020
0.4.2	Jul 7, 2020
0.4.1	Jul 7, 2020
0.4.0	Jul 7, 2020
0.3.3	Jul 7, 2020
0.3.2	Jul 6, 2020
0.3.1	Jun 28, 2020
0.3.0	Jun 28, 2020
0.2.0	Jun 28, 2020
0.1.4	Jun 24, 2020
0.1.3	Jun 23, 2020
0.1.1	Jun 7, 2020
0.1.0	Jun 3, 2020
0.0.3	May 22, 2020
0.0.2	May 21, 2020
0.0.1	Apr 28, 2020

Download files


Source Distribution

tanbih-pipeline-0.10.0.tar.gz (227.4 kB)

Uploaded Mar 17, 2021 Source

Built Distribution

tanbih_pipeline-0.10.0-py3-none-any.whl (543.5 kB)

Uploaded Mar 17, 2021 Python 3

File details

Details for the file tanbih-pipeline-0.10.0.tar.gz.

File metadata

Download URL: tanbih-pipeline-0.10.0.tar.gz
Upload date: Mar 17, 2021
Size: 227.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.2

File hashes

Hashes for tanbih-pipeline-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`dc78922667325b0ef91a52170b48cf833089c87dcae31c3385ab934ea3225aff`
MD5	`b4f843cf3841ae664537ac5b62572e61`
BLAKE2b-256	`2a0ee8234940b401cf54d1a9d878fa3dbf6e17eb187d669d25ef40ba17bedbb8`


File details

Details for the file tanbih_pipeline-0.10.0-py3-none-any.whl.

File metadata

Download URL: tanbih_pipeline-0.10.0-py3-none-any.whl
Upload date: Mar 17, 2021
Size: 543.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.2

File hashes

Hashes for tanbih_pipeline-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`45e1f029da2c50e722a5247e30e14c7476cac3325cf30e3e3af0039d2eac2afc`
MD5	`f58fe7bc3bbcfaa82071d34171975e0d`
BLAKE2b-256	`6f904d06b08e897d286330dcb86e8d55a2a9fa374252db9668ac3eac98b2b885`
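You can check a downloaded file against the SHA256 digests listed above with the standard library alone. The sketch below reads the file in chunks, so large archives are not loaded into memory at once. `sha256_of`, `verify` and the example download path are hypothetical names.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests copied from the file hashes listed above
EXPECTED = {
    'tanbih-pipeline-0.10.0.tar.gz':
        'dc78922667325b0ef91a52170b48cf833089c87dcae31c3385ab934ea3225aff',
    'tanbih_pipeline-0.10.0-py3-none-any.whl':
        '45e1f029da2c50e722a5247e30e14c7476cac3325cf30e3e3af0039d2eac2afc',
}


def verify(path, filename):
    """Return True if the file at path matches the listed digest for filename."""
    return sha256_of(path) == EXPECTED[filename]
```

For example, `verify('downloads/tanbih-pipeline-0.10.0.tar.gz', 'tanbih-pipeline-0.10.0.tar.gz')` returns True only if the archive is byte-for-byte identical to the published one. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).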