A package to write schema-aware data pipelines

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

SchemaFlow

This is a package for writing data-science pipelines systematically in Python. Thanks for checking it out.

See the project documentation for a comprehensive reference.

The problem that this package solves

A major challenge in creating a robust data pipeline is guaranteeing interoperability between pipes: how do we guarantee that a pipe someone wrote is compatible with everyone else's pipes, without running the whole pipeline repeatedly until we get it right?

The solution that this package adopts

This package declares an API for defining a stateful data transformation. It lets the developer declare what comes in, what comes out, and which states are modified by each pipe, and therefore by the whole pipeline. See tests/test_pipeline.py or examples/end_to_end_kaggle.py for working examples.
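The idea can be illustrated with a minimal, self-contained sketch. It does not use SchemaFlow's actual API: the names `SketchPipe`, `requires`, `produces`, `state`, and `check_pipeline` are hypothetical and exist only for this illustration. Each pipe declares the columns it needs, the columns it outputs, and the state it stores, so a pipeline can be checked for compatibility before any data is processed.

```python
from typing import Dict, List


class SketchPipe:
    """Hypothetical pipe (not SchemaFlow's API): declares its schema up front."""

    requires: Dict[str, type] = {}  # columns expected in the incoming data
    produces: Dict[str, type] = {}  # columns added or overwritten by the pipe
    state: Dict[str, type] = {}     # values the pipe stores when it is fitted


class FillAge(SketchPipe):
    requires = {"age": float}
    produces = {"age": float}
    state = {"median_age": float}


class AgeBucket(SketchPipe):
    requires = {"age": float}
    produces = {"age_bucket": int}


class ScaleFare(SketchPipe):
    requires = {"fare": float}
    produces = {"fare_scaled": float}


def check_pipeline(pipes: List[SketchPipe], input_schema: Dict[str, type]) -> List[str]:
    """Return schema errors found by walking the declarations, without running any pipe."""
    schema = dict(input_schema)
    errors = []
    for pipe in pipes:
        name = type(pipe).__name__
        for column, expected in pipe.requires.items():
            if column not in schema:
                errors.append(f"{name}: missing column '{column}'")
            elif schema[column] is not expected:
                errors.append(
                    f"{name}: column '{column}' is {schema[column].__name__}, "
                    f"expected {expected.__name__}"
                )
        # Later pipes see the columns this pipe declares as output.
        schema.update(pipe.produces)
    return errors


compatible = check_pipeline([FillAge(), AgeBucket()], {"age": float})
incompatible = check_pipeline([FillAge(), ScaleFare()], {"age": float})
print(compatible)
print(incompatible)
```

In this sketch the second pipeline is rejected because `ScaleFare` needs a `fare` column that no earlier pipe or input provides. The mismatch is found from the declarations alone, which is the kind of check the declared schemas make possible.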

Install

pip install schemaflow

Or install the latest version from source (recommended for now):

git clone https://github.com/jorgecarleitao/schemaflow
cd schemaflow && pip install -e .

Run examples

We provide one example that demonstrates the use of SchemaFlow's API to develop an end-to-end pipeline applied to one of Kaggle's exercises.

To run it, download the data for that exercise into examples/all/ and run:

pip install -r examples/requirements.txt
python examples/end_to_end_kaggle.py

You should see some output printed to the console, along with three files generated in examples/: two plots and one submission.txt.

Run tests

pip install -r tests/requirements.txt
python -m unittest discover

Build documentation

pip install -r docs/requirements.txt
cd docs && make html && cd ..
open docs/build/html/index.html

Release history

- 0.2.0 (this version): Sep 15, 2018
- 0.1.0: Sep 8, 2018

Download files

Source Distribution

schemaflow-0.2.0.tar.gz (13.9 kB)

Uploaded Sep 15, 2018 Source

Built Distribution

schemaflow-0.2.0-py3-none-any.whl (23.5 kB)

Uploaded Sep 15, 2018 Python 3

File details

Details for the file schemaflow-0.2.0.tar.gz.

File metadata

Download URL: schemaflow-0.2.0.tar.gz
Upload date: Sep 15, 2018
Size: 13.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.7.0

File hashes

Hashes for schemaflow-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`1333fc45a8f6ffb7170e0d803b45bfbfc0d09c1a50380718bfb1fffea2bb931d`
MD5	`b1854ae7d328993317e61894edb6b3a0`
BLAKE2b-256	`9e2e7a630cb63a3f0aa05e30ecb308b29143a4f5706688421781f94421907ad5`

File details

Details for the file schemaflow-0.2.0-py3-none-any.whl.

File metadata

Download URL: schemaflow-0.2.0-py3-none-any.whl
Upload date: Sep 15, 2018
Size: 23.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.7.0

File hashes

Hashes for schemaflow-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db3f4930ab135176b85ba01067b608959f1bcb6ec64bbcbbe510b3ed73f3983c`
MD5	`bfc8f9a9559593e1bc58a79adf373e9a`
BLAKE2b-256	`3fb378a1499748782bac31bff0208faca37da748839d04805aaf22ef8454ba1a`
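The published digests can be used to check a downloaded file. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the path passed to them is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("schemaflow-0.2.0.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.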
