# pandas_pipeline_graphviz: Pandas pipeline in graphviz
A Python package that builds a clear, explanatory diagram of a pandas data processing pipeline.
It is heavily inspired by dask's [`.visualize` method](https://docs.dask.org/en/latest/graphviz.html), but adds two useful features:
- column names are shown in data nodes
- columns created by each task are highlighted
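As a rough illustration of what these two features mean in Graphviz terms, the sketch below writes DOT source by hand for a single transformation. Each dataframe becomes a node whose label lists its column names, and columns that appear in the output but in none of the inputs are highlighted. This is not the package's API: `data_node`, `task_to_dot` and the styling (HTML-like table labels, a `palegreen` background for new columns) are hypothetical choices made for illustration.

```python
import html
from collections.abc import Container


def data_node(name: str, columns: list[str], created: Container[str] = ()) -> str:
    """DOT node for a dataframe: an HTML-like table with one row per column."""
    rows = [f"<TR><TD><B>{html.escape(name)}</B></TD></TR>"]
    for col in columns:
        # Columns created by the task get a colored background.
        attr = ' BGCOLOR="palegreen"' if col in created else ""
        rows.append(f"<TR><TD{attr}>{html.escape(str(col))}</TD></TR>")
    table = "".join(rows)
    return (f'  "{name}" [shape=plaintext, '
            f'label=<<TABLE BORDER="0" CELLBORDER="1">{table}</TABLE>>];')


def task_to_dot(task: str, inputs: dict[str, list[str]],
                output: str, output_columns: list[str]) -> str:
    """DOT source for one transformation: input nodes -> task -> output node."""
    known = set().union(*inputs.values())   # columns already present upstream
    created = set(output_columns) - known   # columns added by this task
    lines = ["digraph pipeline {", "  rankdir=LR;"]
    for name, cols in inputs.items():
        lines.append(data_node(name, cols))
        lines.append(f'  "{name}" -> "{task}";')
    lines.append(f'  "{task}" [shape=box, style=rounded];')
    lines.append(data_node(output, output_columns, created))
    lines.append(f'  "{task}" -> "{output}";')
    lines.append("}")
    return "\n".join(lines)


print(task_to_dot("add_total", {"sales": ["price", "qty"]},
                  "sales_total", ["price", "qty", "total"]))
```

The printed string can be rendered with the `dot` command-line tool (for example `dot -Tpng`) or with the `Source` class of the `graphviz` Python package.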
## Installation
### Pip
```bash
pip install pandas_pipeline_graphviz
```
### Manual installation
- clone the repository with `git clone`
- run `python setup.py install` from the repository root
## Usage
### Disclaimer: it's a hack
**⚠️ WARNING: Hack!**
There is no reliable way to get variable names, for either inputs or outputs: the available approaches are quite hacky, as shown in this [stackoverflow thread about "How to get the original variable name of variable passed to a function"](https://stackoverflow.com/questions/2749796/how-to-get-the-original-variable-name-of-variable-passed-to-a-function).
To build the graph, this package relies on two techniques:
- to get the names of input dataframes, the package uses `globals()`, comparing each input dataframe against all the variables in the global namespace.
- to detect the output dataframe name, the package uses `inspect.stack()` to retrieve the code line calling the function, then parses that line to find the assigned output. Currently only single-output transformations are supported.
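A minimal sketch of the `globals()` lookup, assuming the comparison is done by object identity (the package may do it differently). Identity sidesteps `==`, which on DataFrames compares element by element instead of returning a single boolean. `names_in_namespace` and `caller_global_names` are hypothetical helpers, not part of the package.

```python
import inspect
from typing import Any


def names_in_namespace(obj: Any, namespace: dict[str, Any]) -> list[str]:
    """Return every name in `namespace` that is bound to this exact object."""
    # Compare by identity: `df == other` on a DataFrame is elementwise,
    # so it cannot be used directly as a yes/no test.
    return [name for name, value in namespace.items() if value is obj]


def caller_global_names(obj: Any) -> list[str]:
    """Look `obj` up in the global variables of whoever called this helper."""
    frame = inspect.currentframe()
    try:
        # f_back is the caller's frame; f_globals is its module namespace.
        return names_in_namespace(obj, frame.f_back.f_globals)
    finally:
        del frame  # break the frame reference cycle, as the inspect docs advise
```

The limitations that make this a hack show up immediately: an object bound to several global names yields all of them, and a dataframe held only in a local variable (for instance inside a function) is not found at all.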
Both methods should be considered experimental: the decorator is expected to break easily if it is not used as presented in the example.
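The output-name detection can be sketched in the same spirit: walk up the call stack with `inspect.stack()`, take the source line that called the transformation, and parse it with the `ast` module to find a simple `name = ...` assignment. This is an illustrative reconstruction, not the package's code; `assigned_name` and `add_total` are hypothetical, and the default stack depth of 2 assumes the helper is called directly from the transformation body.

```python
import ast
import inspect
from typing import Optional


def assigned_name(levels_up: int = 2) -> Optional[str]:
    """Guess which variable receives a transformation's result.

    Frame 0 is this helper, frame 1 the transformation that called it,
    frame 2 (the default) the line that called the transformation.
    """
    frame_info = inspect.stack()[levels_up]
    if not frame_info.code_context:
        return None  # source line unavailable
    line = frame_info.code_context[0].strip()
    try:
        tree = ast.parse(line)
    except SyntaxError:
        return None  # e.g. only part of a call split across several lines
    if not tree.body:
        return None
    stmt = tree.body[0]
    # Only the simple single-target form is handled: `name = transformation(...)`.
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        return stmt.targets[0].id
    return None


def add_total(sales):
    """Hypothetical single-output transformation."""
    print("output will be stored in:", assigned_name())
    return sales


sales_total = add_total({"price": [1, 2]})
```

Run from a script, this reports `sales_total`. It also shows why the method is fragile: a call spread over several lines may fail to parse, tuple unpacking and augmented assignment are not recognized, and where source lines are unavailable (for example in some interactive consoles) `code_context` is `None`.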
### Conditions for use
- do not apply other decorators to your function, only this one; otherwise output dataframe name detection through `inspect.stack()` will break.
- use only single-output transformation functions, i.e. functions that return exactly one dataframe.
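Why stacking decorators breaks the detection can be seen with plain `inspect.stack()`: every wrapper adds a frame, so a fixed depth that reaches the caller's line without an extra decorator reaches the wrapper's own `return func(...)` line with one. `functools.wraps` copies metadata such as the name and docstring, but it does not remove that frame. The functions below (`calling_line`, `logged`, `transform`, `wrapped_transform`) are hypothetical stand-ins.

```python
import functools
import inspect


def calling_line() -> str:
    """Source line two frames up: the line that called the function calling us."""
    frame_info = inspect.stack()[2]
    return frame_info.code_context[0].strip() if frame_info.code_context else ""


def logged(func):
    """An unrelated, ordinary decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # this call adds an extra stack frame
    return wrapper


def transform(data):
    return calling_line()


@logged
def wrapped_transform(data):
    return calling_line()


print(transform(None))          # the line calling transform
print(wrapped_transform(None))  # the line inside `wrapper`, not the caller
```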
### Example
See the [examples folder](examples) in the repository.
## Hashes for pandas-pipeline-graphviz-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64562684a63e3747b007127b951cbe61994458330b59341e862afc55ea5d2e9b |
| MD5 | 423ea29cc4fb2ddc74ab5280e7ce12e6 |
| BLAKE2b-256 | f2de505191dee995f5ecd16317e8cb823be6070f1d65fcd7912f59fe1c3438a5 |
## Hashes for pandas_pipeline_graphviz-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb1a05ca12fa8670e5527a4c7f3285692ba4cb8b0bbc3e3ecd3ca0bf659eead4 |
| MD5 | 839c61444af1e4e5854c3ed3a1cf5be2 |
| BLAKE2b-256 | 636c8be6f59042cff51cca84bd4d0c71e32333e69fbb161f8f7098176a4f7a37 |