Lightweight Data Pipeline with code-based stage caching

Project description

Lightweight Data Pipeline (LWDP)

LWDP attempts to fill the niche for structuring pure-Python data transformations, with robust data- and code-based caching across a few locales.

Because sometimes Spark or Dask or AWS Glue or anything other than a 5kb library and some dumbly hashed files is just too much.

LWDP is meant for the case where you're doing a few data transformations, possibly across multiple input file types (CSV, Excel, Parquet, etc.). Each of these files can generally (although not strictly) be held in memory. Have 25 CSVs with structured transformations that you'd like to keep organized and possibly streamline with caching? LWDP could be the answer.

If the data changes or your code changes, you want to be able to refresh the data pipeline once - and, ideally, only those parts of the data pipeline that need to be refreshed.
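
To make the idea concrete, here is a minimal sketch of data- and code-based stage caching in plain Python. It is not LWDP's API: the names code_fingerprint, stage_key and run_stage, the pickle storage format, and the cache_dir argument are all assumptions made for illustration. The cache key is a hash of the stage's code plus the bytes of its input files, so a stage is recomputed only when one of the two changes.

```python
import hashlib
import inspect
import pickle
from pathlib import Path


def code_fingerprint(func):
    """Return bytes that change when the stage's code changes (hypothetical helper)."""
    try:
        return inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        # Source is unavailable (e.g. defined in a REPL): fall back to bytecode
        # plus constants. Coarser than hashing source, but good enough for a sketch.
        code = func.__code__
        return code.co_code + repr(code.co_consts).encode("utf-8")


def stage_key(func, input_paths):
    """Hash the stage's code together with the contents of its input files."""
    digest = hashlib.sha256()
    digest.update(code_fingerprint(func))
    for path in sorted(str(p) for p in input_paths):
        digest.update(path.encode("utf-8"))
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def run_stage(func, input_paths, cache_dir):
    """Run func(input_paths), reusing a cached result when code and data are unchanged.

    Returns (result, cache_hit).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{func.__name__}-{stage_key(func, input_paths)}.pkl"
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes()), True
    result = func(input_paths)
    cache_file.write_bytes(pickle.dumps(result))
    return result, False
```

Calling run_stage twice with the same function and unchanged files returns the cached result the second time; editing either an input file or the function body produces a new key and forces a recompute of that stage only. This sketch reads whole files to hash them, which matches the "fits in memory" assumption above but would be wasteful for very large inputs.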

Installation

You should be able to install from PyPI with: pip install lwdp

Download files

Source Distribution

lwdp-0.0.1.tar.gz (4.1 kB)

Uploaded Source

Built Distribution

lwdp-0.0.1-py3-none-any.whl (5.0 kB)

Uploaded Python 3
