A package to write schema-aware data pipelines
This is a package to write data pipelines for data science systematically in Python. Thanks for checking it out.
Check out the very comprehensive documentation here.
The problem that this package solves
A major challenge in creating a robust data pipeline is guaranteeing interoperability between pipes: how do we guarantee that a pipe someone wrote is compatible with others' pipes without running the whole pipeline multiple times until we get it right?
The solution that this package adopts
This package declares an API to define a stateful data transformation that gives
the developer the opportunity to declare what comes in, what comes out, and what states are modified
on each pipe, and therefore on the whole pipeline.
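The idea can be sketched in plain Python. Note that the names below (`Pipe`, `transform_requires`, `transform_modifies`) and the checking logic are illustrative only, not SchemaFlow's actual API; see the documentation for that:

```python
# Illustrative sketch: each pipe declares what it requires and what it
# produces, so a whole pipeline can be checked for schema compatibility
# *before* any data is processed.

class Pipe:
    transform_requires: dict = {}   # keys (and types) the data must contain
    transform_modifies: dict = {}   # keys (and types) the pipe adds or changes

def check_pipeline(pipes):
    """Return a list of schema errors without executing any pipe."""
    errors = []
    available = {}
    for pipe in pipes:
        for key, type_ in pipe.transform_requires.items():
            if available.get(key) is not type_:
                errors.append(
                    f"{type(pipe).__name__} requires '{key}: {type_.__name__}', "
                    f"which is not produced upstream"
                )
        available.update(pipe.transform_modifies)
    return errors

# Hypothetical pipes to exercise the check:
class LoadData(Pipe):
    transform_modifies = {"raw": str}

class Tokenize(Pipe):
    transform_requires = {"raw": str}
    transform_modifies = {"tokens": list}

class Train(Pipe):
    # 'labels' is never produced upstream, so the check reports an error
    transform_requires = {"tokens": list, "labels": list}

print(check_pipeline([LoadData(), Tokenize(), Train()]))
```

Because the declarations are static, an incompatibility such as the missing `labels` above is reported immediately, rather than surfacing at runtime deep into an expensive pipeline run.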
Install from PyPI:

pip install schemaflow
or install the latest version from source (recommended for now):

git clone https://github.com/jorgecarleitao/schemaflow
cd schemaflow && pip install -e .
We provide one example that demonstrates the usage of SchemaFlow's API by developing an end-to-end pipeline for one of Kaggle's exercises.
To run it, download the data from that exercise into examples/all/ and run

pip install -r examples/requirements.txt
python examples/end_to_end_kaggle.py
You should see some output printed to the console, as well as three files generated in examples/: two plots and one other file.
To run the tests:

pip install -r tests/requirements.txt
python -m unittest discover
To build the documentation:

pip install -r docs/requirements.txt
cd docs && make html && cd ..
open docs/build/html/index.html