buildflow
buildflow is a unified **batch** and **streaming** framework that turns any Python function into a scalable data pipeline that can read from our supported IO resources.
Key Features:
- Production Ready - Ready-made IO connectors let users focus on processing data instead of reading and writing it
- Fast - Scalable multiprocessing powered by Ray
- Easy to learn - Get started with 2 lines of code
Quick Start
Install the framework
pip install buildflow
Import the framework and create a flow.
from buildflow import Flow
import buildflow
flow = Flow()
Add the flow.processor decorator to your function to attach IO.
QUERY = 'SELECT * FROM `table`'
@flow.processor(source=buildflow.BigQuery(query=QUERY))
def process(bigquery_row):
    ...
Use flow.run() to kick off your pipeline.
flow.run()
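To make the decorator pattern above concrete, here is a toy stand-in written in plain Python. It is not buildflow's implementation: `ToyFlow`, its `processor(source=...)` signature taking an in-memory iterable, and the shape of the value returned by `run()` are all assumptions made for illustration. It only shows how a decorator can register a function and how `run()` can then feed that function one element at a time.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class ToyFlow:
    """Hypothetical, in-memory stand-in for a flow (not buildflow's Flow)."""

    def __init__(self) -> None:
        # Each entry pairs a registered function with the iterable it reads.
        self._processors: List[Tuple[Callable[[Any], Any], Iterable[Any]]] = []

    def processor(self, source: Iterable[Any]) -> Callable:
        """Return a decorator that registers the function with its source."""

        def decorator(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self._processors.append((fn, source))
            return fn  # the function stays callable on its own

        return decorator

    def run(self) -> Dict[str, List[Any]]:
        """Apply every registered function to each element of its source."""
        return {
            fn.__name__: [fn(element) for element in source]
            for fn, source in self._processors
        }


flow = ToyFlow()


@flow.processor(source=[{"value": 1}, {"value": 2}])
def process(row: Dict[str, int]) -> Dict[str, int]:
    # Double the "value" field of each incoming row.
    return {"value": row["value"] * 2}


results = flow.run()
```

In this sketch the results are keyed by function name, which is a design choice of the toy class only. The real framework distributes work with Ray, which this single-process example does not attempt to model.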
Examples
All samples can be found here.
Streaming pipeline reading from Google Pub/Sub and writing to BigQuery.
import buildflow
from buildflow import Flow
flow = Flow()
# Turn your function into a stream processor
@flow.processor(
    source=buildflow.PubSub(subscription='my_subscription'),
    sink=buildflow.BigQuery(table_id='project.dataset.table'),
)
def stream_process(pubsub_message):
    ...
flow.run()
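The body of `stream_process` is left open above. As one possible shape for it, the sketch below turns a JSON payload into a flat dictionary that could serve as a table row. The source does not say what type buildflow passes to the function, so this assumes the raw payload is available as UTF-8 JSON bytes. `payload_to_row` and the field names are hypothetical.

```python
import json
from typing import Any, Dict


def payload_to_row(payload: bytes) -> Dict[str, Any]:
    """Decode a UTF-8 JSON payload into a flat, row-like dictionary."""
    record = json.loads(payload.decode("utf-8"))
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    # Scalars pass through unchanged; nested values (lists, objects) are
    # serialized back to JSON text so every column holds a simple value.
    return {
        key: value
        if value is None or isinstance(value, (str, int, float, bool))
        else json.dumps(value, sort_keys=True)
        for key, value in record.items()
    }


# Hypothetical message payload used only to exercise the helper.
row = payload_to_row(b'{"user": "ada", "clicks": 3, "tags": ["a", "b"]}')
```

Keeping the decoding in a small pure function like this makes it easy to unit test without a live subscription.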
Streaming pipeline reading from / writing to Google Pub/Sub.
import buildflow
from buildflow import Flow
flow = Flow()
# Turn your function into a stream processor
@flow.processor(
    source=buildflow.PubSub(subscription='my_subscription'),
    sink=buildflow.PubSub(topic='my_topic'),
)
def stream_process(pubsub_message):
    ...
flow.run()
Batch pipeline reading from and writing to BigQuery.
import buildflow
from buildflow import Flow
flow = Flow()
QUERY = 'SELECT * FROM `project.dataset.input_table`'
@flow.processor(
    source=buildflow.BigQuery(query=QUERY),
    sink=buildflow.BigQuery(table_id='project.dataset.output_table'),
)
def process(bigquery_row):
    ...
flow.run()
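As an illustration of what the `process` body might contain, the sketch below derives a new column from two existing ones. It assumes each row reaches the function as a dictionary-like mapping and that the returned dictionary is what gets written to the sink; the source does not confirm either point. The column names `price`, `quantity`, and `total` are hypothetical.

```python
from typing import Any, Dict, Mapping


def process(bigquery_row: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new row with a derived column, leaving the input untouched."""
    output_row = dict(bigquery_row)  # copy so the input row is not mutated
    # Hypothetical columns: "price" and "quantity" are assumed to be numeric.
    output_row["total"] = output_row["price"] * output_row["quantity"]
    return output_row


# Made-up input row used only to exercise the function.
input_row = {"item": "widget", "price": 2.5, "quantity": 4}
output_row = process(input_row)
```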
Batch pipeline reading from BigQuery and returning output locally.
import buildflow
from buildflow import Flow
flow = Flow()
QUERY = 'SELECT * FROM `table`'
@flow.processor(source=buildflow.BigQuery(query=QUERY))
def process(bigquery_row):
    ...
output = flow.run()
process_rows = output['process']['local']
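The last line above indexes the result of `flow.run()` first by the processor's function name and then by `'local'`. The helper below sketches a defensive way to read that kind of nested mapping. The sample `output` dictionary is made up for illustration, the structure buildflow returns beyond what the example shows is an assumption, and `local_results` is a hypothetical helper rather than part of buildflow.

```python
from typing import Any, List, Mapping


def local_results(
    output: Mapping[str, Mapping[str, Any]], processor_name: str
) -> List[Any]:
    """Return the locally collected results for one processor, or [] if absent."""
    per_processor = output.get(processor_name, {})
    return list(per_processor.get("local", []))


# Made-up stand-in for the value returned by flow.run().
output = {"process": {"local": [{"id": 1}, {"id": 2}]}}

process_rows = local_results(output, "process")
missing_rows = local_results(output, "other_processor")
```

Returning an empty list for an unknown processor avoids a `KeyError` when a flow has several processors and only some of them return output locally.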
Hashes for buildflow-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f268e4bf51ae0d1d08bd351b9b22777cb3c2bacd1cfe18fadc55b545f9764556
MD5 | bf32ae3f44490714f117937347a9fcf2
BLAKE2b-256 | da9e2f180e579e98ddc1bbd4964885d7868a79a2f89e9922f5a11c3ab5b8a264