SimpleETL - ETL Processing by Simple Specifications
Project description
SimpleETL er an ETL tool developed by FlexDanmark to easily handling processing of data from user-defined data sources and automatically generates a dimensionally modelled data warehouse.
SimpleETL is developed to work with the PostgreSQL DBMS backend with psycopg2 as database adapter.
Features
- Automatically generates data warehouse dimensional model (star schema)
- Can track changes of facts
- User-defined automatic fact table partitioning
- Handle deleted facts
- Ensures data quality by type and value checking
- Provides a wide range of default data types and allows user to define their own
Installation
SimpleETL can be installed in multiple ways. The simples is to install from pypi:
$ pip install simpleetl
Dependencies
SimpleETL requires the psycopg2-binary and the pygrametl package for database PostgreSQL database connections and table handling.
Example usages
From the source repository multiple code examples can be found in the examples folder.
A simple example could be:
from simpleetl import FactTable, runETL, datatypes as dt factobj = FactTable(schema="testschema", table="userdata", migrate_updates=True, # Updated to data will be processed. Can be set to False if only appending (will speed things up) store_history=False, # Create a seperate userdata_historic table for storing changes to facts. track_last_updated=True, # Adds an _updated attribute which keeps track of when data was last updated. lookupatts=["userid"] # List of attributes uniquely defining a fact ) factobj.handle_deleted_rows("mark") # Tells ETL to mark deleted rows from source with an _deleted timestamp attribute factobj.add_column_mapping("userid", dt.bigint, "userid") # Map userid from source data to database with same name factobj.add_column_mapping("sys_username", dt.text, "username") # Rename "sys_username" from source data to "username" in database datafeeder = [{"userid": 42, "sys_username": "Jens"}, {"userid": 56, "sys_username": "Svend"}] # datafeeder can be a generator or simple a iterable of dictionaries stats = runETL(facttable=factobj, datafeeder=datafeeder, db_host="localhost", db_port="5432", db_name="test_db", db_user="dbuser", db_pass="dbpass")
Publications
Ove Andersen, Christian Thomsen, Kristian Torp: SimpleETL: ETL Processing by Simple Specifications. DOLAP 2018 http://ceur-ws.org/Vol-2062/paper10.pdf
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for simpleetl-1.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | aedd003058a92863095c46f379b887d69169d8cdfc0d3a05d73d460024282c6e |
|
MD5 | bd1e9a8d216e8a4921203ead95c77510 |
|
BLAKE2-256 | 81a2b63b22b4668171f5bd7b5b0e555165bef933a997870075e6a1df45044bb9 |