Citrine Informatics ETL pipeline

Piperoni

Piperoni is a lightweight ETL framework for any data type, which allows you to make, track, and visualize atomic data transformations. Unlike some ETL tools, Piperoni transforms data in memory, which makes it well suited to complex, diverse datasets that are not "big data".

Piperoni also ensures that the expected types are passed from transformation to transformation, and lets you easily see the state of the data at any point in the pipeline. This makes it a great tool for collaborative data pipelines, where visibility into data transformations is key.
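To make the idea concrete, here is a minimal sketch in plain Python of what tracking atomic, type-checked transformations can look like. It does not use Piperoni and is not its API; `run_steps`, `split_fields`, and `to_floats` are hypothetical names invented for this illustration.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical names for illustration only; this is NOT Piperoni's API.
# Each step is (function, expected input type, expected output type).
Step = Tuple[Callable[[Any], Any], type, type]


def run_steps(data: Any, steps: List[Step]) -> Tuple[Any, List[Any]]:
    """Apply each step in order, checking types and recording every state."""
    history = [data]
    for func, in_type, out_type in steps:
        if not isinstance(data, in_type):
            raise TypeError(
                f"{func.__name__} expected {in_type.__name__}, "
                f"got {type(data).__name__}"
            )
        data = func(data)
        if not isinstance(data, out_type):
            raise TypeError(
                f"{func.__name__} should return {out_type.__name__}, "
                f"got {type(data).__name__}"
            )
        history.append(data)  # keep the state after every transformation
    return data, history


def split_fields(line: str) -> list:
    """Atomic step: split one comma-separated line into fields."""
    return line.split(",")


def to_floats(fields: list) -> list:
    """Atomic step: convert every field to a float."""
    return [float(field) for field in fields]


result, history = run_steps(
    "1.5,2.5",
    [(split_fields, str, list), (to_floats, list, list)],
)
```

Because every step does one thing and every intermediate state is kept in `history`, a collaborator can see exactly where a value changed or where a type mismatch was raised.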

Getting Started

piperoni is a framework for ETL and data pipeline work. To get started, first install piperoni:

pip install git+ssh://git@github.com/CitrineInformatics/piperoni.git

Documentation

Detailed instructions on installation and usage can be found in the complete piperoni docs.

Contributing

The following best practices are required for contributing.

In this repo, we follow PEP 8 standards (formatting with Black) and include docstrings in all of our work.

All functions should have unit tests.
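As a rough illustration of these requirements, the sketch below shows a documented function with two unit tests written with the standard-library `unittest` module. The function `strip_units` is a hypothetical example and is not part of piperoni; the repository may use a different test runner.

```python
import unittest


def strip_units(value: str, unit: str) -> float:
    """Remove a trailing unit from a string and return the number.

    Hypothetical example function; it is not part of piperoni.
    """
    if not value.endswith(unit):
        raise ValueError(f"expected a value ending in {unit!r}, got {value!r}")
    return float(value[: -len(unit)].strip())


class TestStripUnits(unittest.TestCase):
    def test_parses_number(self):
        self.assertEqual(strip_units(value="1.12 eV", unit="eV"), 1.12)

    def test_rejects_unexpected_unit(self):
        with self.assertRaises(ValueError):
            strip_units(value="300 K", unit="eV")
```

Saved in a test module, this can be run with `python -m unittest`.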

Best Practices

  • Never use branching code in a Pipeline (e.g. if, else) without an explicit warning or failure. In particular, do not branch when the branches give rise to the same or similar data.

  • Do not use deepcopy() in any operators; this will cause unexpected behavior.

  • Keep transforms atomic! This is the reason for Piperoni. Don't be lazy.

  • Stuck? Piperoni logs every transformation! Just set it to debug mode!

  • Use Checkpoints to optionally output intermediate states.

  • Do not use nested Types when defining Types in your Operators (e.g. use Dict, not Dict[str, str]).

  • Avoid hidden state and adopt functional programming practices whenever possible.

  • Avoid keeping multiple versions of a file to support different options. Use argparse or similar instead whenever possible.

  • Use named (keyword) arguments, and either avoid optional arguments or fill them in explicitly in function calls.

  • Do not hard code column names or similar, even when the function only ever applies to a single column or instance.

  • Have a trusted reference. Always compare against the trusted reference after changes to the pipeline. Update the reference as needed.
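Several of these practices can be sketched together in plain Python. The example below is only an illustration under stated assumptions: it does not use Piperoni's API, and `normalize_column`, `matches_reference`, `build_parser`, the `--column` and `--unit` options, and the sample rows are all hypothetical. It shows an explicit failure instead of a silent branch, a column name passed as a parameter rather than hard coded, options handled with argparse, keyword arguments in every call, no mutation of the input, and a comparison against a trusted reference.

```python
import argparse
from typing import Dict, List

Row = Dict[str, str]


def normalize_column(rows: List[Row], column: str, unit: str) -> List[Row]:
    """Strip a unit suffix from one column and return new rows.

    Hypothetical helper, not part of Piperoni.
    """
    cleaned = []
    for row in rows:
        value = row[column]
        if not value.endswith(unit):
            # No silent fallback branch: unexpected data stops the run.
            raise ValueError(f"{column}={value!r} does not end with {unit!r}")
        # Build a new row instead of mutating the input (no hidden state).
        cleaned.append({**row, column: value[: -len(unit)].strip()})
    return cleaned


def matches_reference(rows: List[Row], reference: List[Row]) -> bool:
    """Compare pipeline output against a trusted reference."""
    return rows == reference


def build_parser() -> argparse.ArgumentParser:
    """Options live on the command line, not in copies of the script."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline options")
    parser.add_argument("--column", default="band_gap")
    parser.add_argument("--unit", default="eV")
    return parser


# Illustrative, made-up input rows and reference output.
args = build_parser().parse_args(["--column", "band_gap", "--unit", "eV"])
rows = [{"sample": "A", "band_gap": "1.0 eV"}]
reference = [{"sample": "A", "band_gap": "1.0"}]

result = normalize_column(rows=rows, column=args.column, unit=args.unit)
is_consistent = matches_reference(rows=result, reference=reference)
```

In a real pipeline the reference would be loaded from a stored, reviewed output file, and the comparison would run after every change to the pipeline.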

Flagging Bugs and Requesting New Features

We funnel bugs and feature requests through GitHub issues. Create a new issue and select Bug Report or Feature Request (if you have neither a bug nor a feature request, open a regular issue). Add a concise title, fill in the template, and submit the issue.

Citations

The band gap data used in the example are from: Strehlow, W. H., & Cook, E. L. (1973). Compilation of energy band gaps in elemental and binary compound semiconductors and insulators. Journal of Physical and Chemical Reference Data, 2(1), 163-200.
