Tiny Block Operations for Data Pipelines
tiny-blocks
Tiny Blocks to build large and complex ETL data pipelines!
Tiny-Blocks is a library for data engineering operations.
Each pipeline is built from tiny blocks glued together with the >> operator.
This library relies on a fundamental streaming abstraction consisting of three
parts: extract, transform, and load. You can view a pipeline
as an extraction, followed by zero or more transformations, followed by a sink.
Visually, this looks like:
extract -> transform1 -> transform2 -> ... -> transformN -> load
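To make the chaining idea concrete, here is a minimal sketch of how blocks could be glued with >> over a lazy stream of chunks. It is not the Tiny-Blocks implementation: the Stream, Map, and Collect classes are hypothetical names invented for this illustration, and plain lists stand in for DataFrame chunks.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[int]  # stand-in for a DataFrame chunk (assumption for this sketch)


class Stream:
    """Hypothetical wrapper around a lazy iterator of chunks."""

    def __init__(self, chunks: Iterable[Chunk]):
        self._chunks = iter(chunks)

    def __iter__(self) -> Iterator[Chunk]:
        return self._chunks

    def __rshift__(self, block):
        # `stream >> block` hands the stream to the block on the right.
        return block.apply(self)


class Map:
    """Hypothetical transform block: applies `fn` to each chunk lazily."""

    def __init__(self, fn: Callable[[Chunk], Chunk]):
        self.fn = fn

    def apply(self, stream: Stream) -> Stream:
        # A generator expression keeps the pipeline lazy: nothing runs
        # until a sink starts pulling chunks.
        return Stream(self.fn(chunk) for chunk in stream)


class Collect:
    """Hypothetical load block: drains the stream into a list."""

    def __init__(self) -> None:
        self.rows: Chunk = []

    def apply(self, stream: Stream) -> None:
        for chunk in stream:
            self.rows.extend(chunk)


extract = Stream([[1, 2], [3, 4], [5]])
double = Map(lambda chunk: [x * 2 for x in chunk])
keep_big = Map(lambda chunk: [x for x in chunk if x > 4])
sink = Collect()

# extract -> transform1 -> transform2 -> load
extract >> double >> keep_big >> sink
print(sink.rows)  # [6, 8, 10]
```

Because each transform returns a new lazy stream, only the sink drives the work, one chunk at a time.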
You can also fan in and fan out for more complex operations.
extract1 -> transform1 -> |-> transform2 -> ... -> | -> transformN -> load1
extract2 ---------------> |                        | -> load2
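The diagram above can be approximated with plain Python generators. This is a sketch under stated assumptions, not the library's API: every function name is hypothetical, fan-in is modelled as simple concatenation with itertools.chain (the library may merge sources differently), and fan-out uses itertools.tee.

```python
from itertools import chain, tee


def extract1():
    # Hypothetical first source, yielding two chunks.
    yield [1, 2]
    yield [3]


def extract2():
    # Hypothetical second source, yielding one chunk.
    yield [10, 20]


def transform1(chunks):
    for chunk in chunks:
        yield [x + 1 for x in chunk]


def transform2(chunks):
    for chunk in chunks:
        yield [x * 10 for x in chunk]


# Fan-in: two upstream branches feed one downstream transform.
# chain() exhausts the first branch before starting the second.
merged = chain(transform1(extract1()), extract2())

# Fan-out: tee() gives each sink its own iterator over the same chunks.
# tee() buffers chunks that one branch has read and the other has not,
# so draining the branches one after another holds the stream in memory.
branch_a, branch_b = tee(transform2(merged), 2)

load1 = [x for chunk in branch_a for x in chunk]  # flatten all values
load2 = [sum(chunk) for chunk in branch_b]        # one total per chunk

print(load1)  # [20, 30, 40, 100, 200]
print(load2)  # [50, 40, 300]
```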
Tiny-Blocks uses generators to stream data. Each chunk is a pandas DataFrame.
The chunksize, or buffer size, is adjustable per pipeline.
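As an illustration of chunked streaming, the sketch below reads a CSV in fixed-size chunks with pandas.read_csv, which returns an iterator of DataFrames when chunksize is set. The extract and fill_missing helpers and the sample data are hypothetical and only mimic the idea; they are not Tiny-Blocks functions.

```python
import io

import pandas as pd

# Made-up sample data for this sketch; two rows have a missing score.
CSV_TEXT = "name,score\nana,1\nbob,\ncy,3\ndee,\neve,5\n"


def extract(buffer, chunksize):
    # With chunksize set, read_csv yields DataFrames of at most
    # `chunksize` rows instead of loading the whole file at once.
    yield from pd.read_csv(buffer, chunksize=chunksize)


def fill_missing(chunks, value):
    # Transform each chunk independently, keeping the stream lazy.
    for chunk in chunks:
        yield chunk.fillna(value)


stream = fill_missing(extract(io.StringIO(CSV_TEXT), chunksize=2), value=0)
chunks = list(stream)  # the sink: pulling chunks drives the pipeline

sizes = [len(chunk) for chunk in chunks]
total = sum(chunk["score"].sum() for chunk in chunks)
print(sizes)  # [2, 2, 1]
```

A smaller chunksize lowers peak memory at the cost of more per-chunk overhead, which is why it is useful to tune it per pipeline.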
Installation
Install it using pip:
pip install tiny-blocks
Basic usage
from tiny_blocks.extract import FromCSV
from tiny_blocks.transform import Fillna
from tiny_blocks.load import ToSQL
# ETL Blocks
from_csv = FromCSV(path='/path/to/source.csv')
fill_na = Fillna(value="Hola Mundo")
to_sql = ToSQL(dsn_conn='psycopg2+postgres://...', table_name="sink")
# Pipeline
from_csv >> fill_na >> to_sql
Examples
For more complex examples, please visit the notebooks folder.
Documentation
Please visit this link for documentation.
Hashes for tiny_blocks-0.1.14-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f45d6a4c33486d8be29c8d11dc20c28f09a7a3febf0e9c0964d70d8fba13dddd
MD5 | 8033732785225f4eecfee629741663c2
BLAKE2b-256 | 92f285da74af760cd38e2c878e5afc7fa314b3d01ad9b331f5c1672e6e5634dd