A Python rule engine that operates on dataframes, aimed at the financial services industry.

etlrules

A Python rule engine for applying transformations to dataframes.

ETL stands for Extract, Transform, Load, a three-step process that sources data from some data source (Extract), transforms the data (Transform) and publishes it to a final destination (Load).

Tabular data can be transformed in pure Python with many dedicated packages, the most widely recognized being pandas. The resulting code can be quite opaque, with logic that is difficult to read and understand, especially for non-coders. Even coders can struggle to understand certain transformations unless in-code documentation is added, and even when documentation is available, the code can change in ways that render the documentation stale.

The etlrules package solves this by offering a set of simple rules that users combine into a plan. The plan is a blueprint for how to transform the data. It can be saved to a YAML file, stored in a repo for version control or in a database for manipulation via UIs, and then executed in a repeatable and predictable fashion. The rules in the plan can have names and extensive descriptions, which act as embedded documentation of what each rule is trying to achieve.
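
As a rough illustration of a plan being plain data, the sketch below builds a plan as nested dictionaries and round-trips it through text. The layout, the rule names (`filter_equals`, `rename`) and the field names are hypothetical and are not the etlrules schema; JSON from the standard library stands in for YAML so that the example needs no third-party packages.

```python
import json

# Hypothetical plan layout for illustration only; NOT the etlrules YAML schema.
plan = {
    "name": "Clean trades",
    "description": "Keep settled trades and rename columns for reporting.",
    "rules": [
        {
            "rule": "filter_equals",
            "name": "Settled only",
            "description": "Drop trades that have not settled yet.",
            "column": "status",
            "value": "settled",
        },
        {
            "rule": "rename",
            "name": "Friendly column names",
            "description": "Rename 'qty' to 'quantity' for the report.",
            "mapping": {"qty": "quantity"},
        },
    ],
}

# Serialize the plan to text so it can live in a repo or a database
# (JSON is used here as a stdlib stand-in for YAML).
serialized = json.dumps(plan, indent=2)

# Load it back before execution; the round trip preserves the plan exactly.
restored = json.loads(serialized)

# The embedded names double as human-readable documentation of each step.
rule_names = [rule["name"] for rule in restored["rules"]]
```

Because the plan is only data, it can be diffed and reviewed like any other text file, and the names and descriptions travel with the rules they document.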

This data-driven way of operating on tabular data allows non-technical users to come up with data transformations that automate various tasks without the need to code. Workflows for managing change and version control can be put in place around the plans, allowing technical and non-technical users to collaborate on data transformations that can be scheduled to run periodically to solve real business problems.

High level concepts

Plan

A plan is a blueprint of how to extract tabular data, transform it and load (i.e. write) the transformed data to its final destination.

A plan is a collection of rules, each of which operates on a dataframe (tabular data).

Rule

A rule is an operation performed on a dataframe. There are three types of rules:

* Extract rules (aka read rules) - They read an external data source (e.g. files, DBs, API endpoints) and bring the data into memory for processing
* Transform rules - They perform a transformation of the data (e.g. add a new column, modify an existing column, join columns, aggregate)
* Load rules (aka write rules) - They write the output to external storage (e.g. files, DBs, API endpoints)
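
The three rule types can be sketched with one toy function each. This is a minimal illustration under stated assumptions, not the etlrules API: the function names (`extract_csv`, `add_notional`, `load_csv`) are hypothetical, rows are plain dictionaries rather than a dataframe, and in-memory CSV text stands in for an external file.

```python
import csv
import io

# Rows are plain dicts here; a real backend would use a dataframe library
# such as pandas. All function names below are hypothetical.

def extract_csv(text):
    """Extract rule: read CSV text into a list of row dicts (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def add_notional(rows):
    """Transform rule: add a 'notional' column computed as price * qty."""
    return [
        {**row, "notional": float(row["price"]) * int(row["qty"])}
        for row in rows
    ]

def load_csv(rows):
    """Load rule: write the rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

source = "ticker,price,qty\nABC,10.5,2\nXYZ,3.0,4\n"

# Extract, then transform, then load.
result = load_csv(add_notional(extract_csv(source)))
```

Here `result` holds the original three columns plus the computed `notional` column, serialized back to CSV text.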

Rule engine

The component that takes a plan and executes it (rule by rule) based on an input.

Rule data

The structure that holds together any input dataframes, temporary results and the final output of a rule engine's execution of a plan. The rule data can start with some input dataframes or as an empty canvas, with the plan extracting/reading the data it needs.
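
How a rule engine, a plan and rule data fit together can be sketched in a few lines. Everything in this example is hypothetical (`ToyRuleData`, `run_plan` and the rule factories are illustrative names, not etlrules classes), and lists of dictionaries stand in for dataframes. The rule data starts as an empty canvas and the first rule in the plan supplies the rows.

```python
class ToyRuleData:
    """Hypothetical stand-in for rule data: holds the table being worked on."""

    def __init__(self, main=None):
        # An empty list plays the role of the "empty canvas".
        self.main = main if main is not None else []


def read_rows(rows):
    """Extract-style rule: place the given rows into the rule data."""
    def rule(data):
        data.main = [dict(row) for row in rows]
    return rule


def filter_equals(column, value):
    """Transform-style rule: keep only rows where column == value."""
    def rule(data):
        data.main = [row for row in data.main if row[column] == value]
    return rule


def rename(mapping):
    """Transform-style rule: rename columns according to mapping."""
    def rule(data):
        data.main = [
            {mapping.get(key, key): val for key, val in row.items()}
            for row in data.main
        ]
    return rule


def run_plan(plan, data):
    """Toy rule engine: apply each rule to the rule data, in plan order."""
    for rule in plan:
        rule(data)
    return data


plan = [
    read_rows([
        {"id": 1, "status": "settled", "qty": 10},
        {"id": 2, "status": "pending", "qty": 5},
    ]),
    filter_equals("status", "settled"),
    rename({"qty": "quantity"}),
]

# Start from an empty canvas; the plan itself reads the data it needs.
result = run_plan(plan, ToyRuleData()).main
```

After the run, `result` contains only the settled trade, with `qty` renamed to `quantity`. Passing a pre-populated `ToyRuleData` and dropping the `read_rows` step would model the other case, where the rule data starts with input dataframes.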

Backend

The underlying dataframe library to use for executing the plan. For example: pandas, vaex, polars, etc. At the moment, only pandas is supported.

Documentation

https://ciprianmiclaus.github.io/etlrules/

License

Free software: MIT

