
buildflow


buildflow is a unified batch and streaming framework that turns any Python function into a scalable data pipeline that can read from and write to our supported IO resources.

Key Features:

  • Production Ready - Ready-made IO connectors let users focus on processing data instead of reading and writing it
  • Fast - Scalable multiprocessing powered by Ray
  • Easy to learn - Get started with 2 lines of code

Quick Start

Install the framework

pip install buildflow

Import the framework and create a flow.

import buildflow
from buildflow import Flow

flow = Flow()

Add the flow.processor decorator to your function to attach IO.

QUERY = 'SELECT * FROM `table`'
@flow.processor(source=buildflow.BigQuery(query=QUERY))
def process(bigquery_row):
    ...

Use flow.run() to kick off your pipeline.

flow.run()

Examples

All samples can be found here.

Streaming pipeline reading from Google Pub/Sub and writing to BigQuery.

import buildflow
from buildflow import Flow

flow = Flow()

# Turn your function into a stream processor
@flow.processor(
    source=buildflow.PubSub(subscription='my_subscription'),
    sink=buildflow.BigQuery(table_id='project.dataset.table'),
)
def stream_process(pubsub_message):
    ...

flow.run()
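
For illustration, here is the same pipeline with the processor body filled in. This is only a sketch, not behavior documented above: it assumes the Pub/Sub payload is UTF-8 encoded JSON and that whatever the processor returns is written as a row to the BigQuery sink (the field names are made up).

import json

import buildflow
from buildflow import Flow

flow = Flow()

@flow.processor(
    source=buildflow.PubSub(subscription='my_subscription'),
    sink=buildflow.BigQuery(table_id='project.dataset.table'),
)
def stream_process(pubsub_message):
    # Assumption: the message payload is UTF-8 encoded JSON.
    payload = json.loads(pubsub_message)
    # Assumption: the returned dict becomes a row in the BigQuery table.
    return {'user_id': payload.get('user_id'), 'event': payload.get('event')}

flow.run()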

Streaming pipeline reading from and writing to Google Pub/Sub.

import buildflow
from buildflow import Flow

flow = Flow()

# Turn your function into a stream processor
@flow.processor(
    source=buildflow.PubSub(subscription='my_subscription'),
    sink=buildflow.PubSub(topic='my_topic'),
)
def stream_process(pubsub_message):
    ...

flow.run()

Batch pipeline reading from and writing to BigQuery.

import buildflow
from buildflow import Flow

flow = Flow()

QUERY = 'SELECT * FROM `project.dataset.input_table`'
@flow.processor(
    source=buildflow.BigQuery(query=QUERY),
    sink=buildflow.BigQuery(table_id='project.dataset.output_table'),
)
def process(bigquery_row):
    ...

flow.run()
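
The processor body here is also up to you. As a rough sketch, assuming each BigQuery row is passed in as a dict-like object keyed by column name and that the returned dict is written to the output table (the column names below are hypothetical):

import buildflow
from buildflow import Flow

flow = Flow()

QUERY = 'SELECT * FROM `project.dataset.input_table`'
@flow.processor(
    source=buildflow.BigQuery(query=QUERY),
    sink=buildflow.BigQuery(table_id='project.dataset.output_table'),
)
def process(bigquery_row):
    # Assumption: bigquery_row behaves like a dict keyed by column name.
    # Hypothetical columns, shown only to illustrate a simple transform.
    return {
        'id': bigquery_row['id'],
        'amount_usd': bigquery_row['amount_cents'] / 100,
    }

flow.run()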

Batch pipeline reading from BigQuery and returning output locally.

import buildflow
from buildflow import Flow

flow = Flow()

QUERY = 'SELECT * FROM `table`'
@flow.processor(source=buildflow.BigQuery(query=QUERY))
def process(bigquery_row):
    ...

output = flow.run()
process_rows = output['process']['local']
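
The exact structure of the returned output is not shown here, so treat this as a sketch that assumes process_rows is an iterable of row records:

# Assumption: process_rows is an iterable of row records (e.g. dicts).
for row in process_rows:
    print(row)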

