List of operators using the pandas module for processing the input

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Project description

Pandas DataFrame Operators

A set of operators that can be deployed on SAP Data Hub / SAP Data Intelligence. These operators help to create pandas DataFrames from CSV strings or byte-encoded data.
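As an illustration of the idea, the following minimal sketch builds a DataFrame from either a CSV string or byte-encoded CSV data using plain pandas. The function name `df_from_csv` and its parameters are assumptions made for this example and are not taken from the operators themselves.

```python
import io

import pandas as pd


def df_from_csv(data, sep=",", encoding="utf-8"):
    """Build a DataFrame from a CSV string or from byte-encoded CSV data.

    Hypothetical helper for illustration; not the operators' actual code.
    """
    if isinstance(data, (bytes, bytearray)):
        # Byte-encoded input: let pandas decode it with the given encoding.
        return pd.read_csv(io.BytesIO(data), sep=sep, encoding=encoding)
    # String input: wrap it in a text buffer.
    return pd.read_csv(io.StringIO(data), sep=sep)


csv_text = "id,price\n1,9.5\n2,12.0\n"
df_str = df_from_csv(csv_text)
df_bytes = df_from_csv(csv_text.encode("utf-8"))
```

Both calls should yield the same DataFrame, so downstream operators do not need to know which input form was used.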

Example graph that creates DataFrames, then samples, joins, selects and creates CSV:

[Figure: Example pipeline: Create POS]

The list of operators is constantly growing and will never be complete. In any case, it should give you an idea of how to quickly develop similar pandas operators that suit your requirements. At the end of the README.md you will find documentation of common features and some of the practices used during development.

More on the pandas project and the high-performance data structures and analysis tools it provides can be found at https://pandas.pydata.org.

All operators have been developed locally and tested both locally and on an SAP Data Intelligence instance. More information on how this was done can be found at sdi_utils and in my blog on the SAP Community platform.

Requirements

In order to be able to deploy and run the examples, the following requirements need to be fulfilled:

SAP Data Hub 2.3 or later installed on a supported platform or SAP Data Hub, trial edition 2.3
A Docker image with the pandas package installed

Download and Installation

In the solution folder you will find the ready-to-import operators, which will be stored under the path:

/files/vflow/subengines/com/sap/python36/operators/pandas

Examples

In the GitHub folder example-graphs you will find an example of how to use the operators.

Known Issues

Currently there are no known issues with the operators. Although all operators come with test cases and the code has limited complexity, there might still be undiscovered errors. Reports of failing cases are much appreciated.

How to get support

If you need help or have found a bug, please open a GitHub issue.

How to run

Import the latest release in /solution/PandasDataFrameOperators-0.0.x.zip via SAP Data Hub System Management -> Files -> Import Solution.

License

This project is licensed under the MIT License.

Documentation

Each operator folder has a README that should describe the behaviour of the operator.

Local Development Support

To work with the IDE of your choice and to run unit tests, you can develop locally and run the appropriate tests before deploying the scripts to an SAP Data Hub / SAP Data Intelligence cluster. Supporting features for this are provided for all scripts. There is also a hint on how to simulate a pipeline. Examples are given in the /pipelines folder.

Basic Architecture

The communication is based on message.DataFrame, where the body is linked to the DataFrame and the attributes provide some basic information such as:

number of columns
number of rows
index
column names
memory usage
data types of columns
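A profile with the items listed above could be collected roughly as follows. This is only a sketch: the function name `profile` and the dictionary keys are assumptions for illustration, not the attribute names the operators actually use.

```python
import pandas as pd


def profile(df):
    """Collect basic information about a DataFrame as a plain dictionary.

    The key names are hypothetical and chosen for this example only.
    """
    return {
        "number_columns": df.shape[1],
        "number_rows": df.shape[0],
        "index": list(df.index.names),
        "columns": list(df.columns),
        # deep=True also counts the memory held by string objects.
        "memory": int(df.memory_usage(deep=True).sum()),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


sample_df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
sample_profile = profile(sample_df)
```

Keeping the profile as a plain dictionary of built-in types makes it easy to attach to a message and to print while debugging.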

The ports for communication between pandas operators are of type message.DataFrame, so that the connections between operators are checked at the Modeler level.

In addition, there is a port 'log' that collects all logging statements and provides them as a string.
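One way to collect logging statements as a single string is to attach a stream handler that writes into an in-memory buffer. The sketch below uses only the Python standard library; the helper name `make_string_logger` and the log format are assumptions, and the operators may implement this differently.

```python
import io
import logging


def make_string_logger(name):
    """Return a logger and the in-memory buffer that receives its output.

    Hypothetical helper for illustration only.
    """
    stream = io.StringIO()
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger, stream


logger, log_stream = make_string_logger("sample_operator")
logger.info("rows read: %d", 3)
log_text = log_stream.getvalue()  # the string that would be sent to the 'log' port
```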

Some common features

Memory

Because memory usage is critical for big data, fromCSV supports selecting columns and downcasting data types. Still open is the use of the category data type to reduce the footprint of strings, which are extremely memory-demanding. It is assumed that all data processing with the pandas operators runs in the same container. To cross pods, either streaming needs to be implemented, or the results need to be saved intermediately in an object store or a database and then read from the other pods.
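The memory-saving techniques mentioned here can be sketched with plain pandas calls. This example is not the fromCSV implementation; the sample data is made up, and the category conversion is shown only to illustrate the approach described as still open.

```python
import io

import pandas as pd

# Made-up sample data for illustration.
csv_text = (
    "id,price,city,comment\n"
    "1,9.5,Berlin,a\n"
    "2,12.0,Berlin,b\n"
    "3,7.25,Paris,c\n"
)

# Select columns while reading: 'comment' is never loaded.
df = pd.read_csv(io.StringIO(csv_text), usecols=["id", "price", "city"])
before = df.memory_usage(deep=True).sum()

# Downcast numeric columns to the smallest type that holds the values.
df["id"] = pd.to_numeric(df["id"], downcast="integer")
df["price"] = pd.to_numeric(df["price"], downcast="float")

# Store repeated strings as categories (integer codes plus a lookup table).
df["city"] = df["city"].astype("category")
after = df.memory_usage(deep=True).sum()
```

Comparing `before` and `after` on real data shows the effect. The category type pays off mainly when a column has few distinct values relative to its length.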

Communication between operators

For the communication the data type message is used, where

attributes contains a basic profile of the DataFrame (e.g. name, last_operator, number of rows and columns, memory usage, data types, column names, ...).
body of the message contains the byte-encoded DataFrame.

The alternative of using a custom type was discarded because Python operators do not provide and support the pre-defined structure for it. Its only benefit would be that the Modeler checks the compatibility of the connections.

Within a Python operator you can access the attributes of the message as a dictionary, whereas the body stores the pointer to the DataFrame.

Most of the di_pandas operators have one input data port and two output data ports. The nomenclature is DataFrameMsg for the data message and Info for channeling information to a terminal or a log file, to monitor the graph behaviour while developing.
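The pattern of one input message and two outputs can be simulated locally with a small stand-in for the message type. Everything in this sketch is an assumption for illustration: the `Message` class only mimics a message with `attributes` and `body`, and `select_columns` is a made-up operator, not one from the package.

```python
import pandas as pd


class Message:
    """Hypothetical stand-in for the platform's message type (local use only)."""

    def __init__(self, attributes=None, body=None):
        self.attributes = attributes or {}
        self.body = body


def select_columns(msg, columns):
    """Made-up operator: one input message, two outputs (data message, info string)."""
    df = msg.body[columns]
    # Copy the attributes so the incoming message is left unchanged.
    attributes = dict(msg.attributes)
    attributes["last_operator"] = "select_columns"
    attributes["number_columns"] = df.shape[1]
    attributes["number_rows"] = df.shape[0]
    info = "select_columns: kept {} of {} columns".format(
        df.shape[1], msg.body.shape[1]
    )
    return Message(attributes=attributes, body=df), info


in_msg = Message(
    attributes={"name": "pos"},
    body=pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}),
)
out_msg, info = select_columns(in_msg, ["a", "c"])
```

In this sketch the first return value plays the role of DataFrameMsg and the second the role of Info.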

Release history

0.0.37 (this version), Feb 2, 2020
0.0.35, Jan 30, 2020
0.0.33, Jan 28, 2020
0.0.29, Jan 26, 2020
0.0.28, Jan 26, 2020
0.0.27, Jan 14, 2020
0.0.26, Jan 14, 2020
0.0.11, Dec 13, 2019

Download files

Source Distribution

sdi_pandas-0.0.37.tar.gz (28.6 kB)

Uploaded Feb 2, 2020 Source

File details

Details for the file sdi_pandas-0.0.37.tar.gz.

File metadata

Download URL: sdi_pandas-0.0.37.tar.gz
Upload date: Feb 2, 2020
Size: 28.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for sdi_pandas-0.0.37.tar.gz
Algorithm	Hash digest
SHA256	`ac594db7d928cee3f7bf93e6eb1c8bb082434d4c6433a4f8aaef68433505bb4b`
MD5	`3fa58c685fcc82930520dc3dbc3b4504`
BLAKE2b-256	`f144689164c4c1f148c9adf41a316877f9d7f5baca3c73760a974abea94f2527`
