SimpleETL - ETL Processing by Simple Specifications

Project description

SimpleETL is an ETL tool developed by FlexDanmark to easily handle the processing of data from user-defined data sources and to automatically generate a dimensionally modelled data warehouse.

SimpleETL is developed to work with PostgreSQL as the DBMS backend and psycopg2 as the database adapter.

Features

  • Automatically generates a dimensional data warehouse model (star schema)

  • Can track changes of facts

  • User-defined automatic fact table partitioning

  • Handles deleted facts

  • Ensures data quality by type and value checking

  • Provides a wide range of default data types and allows users to define their own

Installation

SimpleETL can be installed in multiple ways. The simplest is to install it from PyPI (https://pypi.org/project/simpleetl/):

$ pip install simpleetl
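
To confirm that the package is available in the current environment, the installed version can be read from the package metadata. This is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of SimpleETL.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="simpleetl"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if simpleetl is installed, otherwise None
print(installed_version())
```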

Dependencies

SimpleETL requires the psycopg2-binary and pygrametl packages for PostgreSQL database connections and table handling.

Example usages

Multiple code examples can be found in the examples folder of the source repository.

A simple example could be:

from simpleetl import FactTable, runETL, datatypes as dt

factobj = FactTable(schema="testschema", table="userdata",
                    migrate_updates=True,
                    # Updates to data will be processed. Can be set to False if only appending (will speed things up)
                    store_history=False,  # Set to True to create a separate userdata_historic table for storing changes to facts.
                    track_last_updated=True,
                    # Adds an _updated attribute which keeps track of when data was last updated.
                    lookupatts=["userid"]  # List of attributes uniquely defining a fact
                    )

factobj.handle_deleted_rows("mark")
# Tells the ETL to mark rows deleted from the source with a _deleted timestamp attribute

factobj.add_column_mapping("userid", dt.bigint, "userid")
# Map userid from source data to database with same name

factobj.add_column_mapping("sys_username", dt.text, "username")
# Rename "sys_username" from source data to "username" in database

datafeeder = [{"userid": 42, "sys_username": "Jens"}, {"userid": 56, "sys_username": "Svend"}]
# datafeeder can be a generator or simply an iterable of dictionaries

stats = runETL(facttable=factobj, datafeeder=datafeeder,
               db_host="localhost", db_port="5432", db_name="test_db", db_user="dbuser", db_pass="dbpass")
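
Because the datafeeder can be a generator, rows do not have to be held in memory as a list. The sketch below streams dictionaries from CSV input with the same keys as the example above. The function name csv_datafeeder and the CSV layout (a header row with userid and sys_username) are assumptions made for illustration; they are not part of SimpleETL.

```python
import csv
import io


def csv_datafeeder(fileobj):
    """Yield one dictionary per CSV row, converting userid to an integer."""
    for row in csv.DictReader(fileobj):
        yield {"userid": int(row["userid"]), "sys_username": row["sys_username"]}


# In-memory stand-in for a real file, used only to demonstrate the generator
sample = io.StringIO("userid,sys_username\n42,Jens\n56,Svend\n")
rows = list(csv_datafeeder(sample))
print(rows)
```

With a real file, the generator could be passed as the datafeeder argument of runETL in place of the list, for example datafeeder=csv_datafeeder(open("users.csv", newline="")), where users.csv is a hypothetical file name.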

Publications

Ove Andersen, Christian Thomsen, Kristian Torp: SimpleETL: ETL Processing by Simple Specifications. DOLAP 2018. http://ceur-ws.org/Vol-2062/paper10.pdf
