Pyelt is a DDL and ETL framework for creating and filling data vault data warehouses on a PostgreSQL database.

Project description

Usage

This example will create and fill the historical staging area:

# config and source_config are assumed to be defined elsewhere (not shown here).
pipeline = Pipeline(config)
pipe = pipeline.get_or_create_pipe('test_source', source_config)

source_file = CsvFile(get_root_path() + '/sample_data/patienten1.csv', delimiter=';')
source_file.reflect()
source_file.set_primary_key(['patientnummer'])
mapping = SourceToSorMapping(source_file, 'persoon_hstage', auto_map=True)
pipe.mappings.append(mapping)

pipeline.run()

More examples can be found on the GitHub repository of NL Healthcare.

Introduction

Pyelt is a Python DDL and ETL framework for creating and loading data vaults for data warehousing.

Pyelt supports several data layers, including Source-of-Record (SOR), Raw Data Vault (RDV), Business Data Vault (BDV) and Datamarts (DM).

Pyelt can import data from several kinds of source systems, such as fixed-length files, CSV files, and different databases.
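As a rough illustration of what reading the structure of a CSV source involves, the sketch below pulls the column names from the header row of semicolon-delimited text using only the Python standard library. It is not pyelt's implementation: the function name reflect_columns and the sample data are made up for this example.

```python
import csv
import io


def reflect_columns(text, delimiter=";"):
    """Return the column names found in the header row of delimited text.

    Hypothetical helper for illustration; not part of pyelt.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    # Normalise names so they can serve as database column identifiers.
    return [name.strip().lower() for name in header]


# Made-up sample data, shaped like a small patient export.
sample = "Patientnummer;Naam;Geboortedatum\n1001;Jansen;1980-02-14\n"
columns = reflect_columns(sample)
print(columns)
```

A real source would be read from a file instead of an in-memory string, but the header handling would be the same.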

Pyelt is developed to run on a PostgreSQL database.

Pyelt uses SQLAlchemy Core only for the database connection and for reflection. All other SQL statements (DDL, COPY, INSERT and UPDATE statements) are generated by the pyelt framework itself.
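To make the idea of a framework assembling its own SQL text more concrete, here is a minimal sketch that builds a PostgreSQL-style CREATE TABLE statement as a plain string. It only illustrates the general approach and is not how pyelt generates its DDL: the function build_create_table, the schema name sor_test and the column types are assumptions chosen for the example.

```python
def build_create_table(schema, table, columns, primary_key):
    """Build a PostgreSQL-style CREATE TABLE statement as a plain string.

    Hypothetical helper for illustration; not part of pyelt.
    columns is a list of (name, sql_type) pairs.
    """
    column_lines = [f'    "{name}" {sql_type}' for name, sql_type in columns]
    key_list = ", ".join(f'"{name}"' for name in primary_key)
    column_lines.append(f"    PRIMARY KEY ({key_list})")
    body = ",\n".join(column_lines)
    return f'CREATE TABLE IF NOT EXISTS "{schema}"."{table}" (\n{body}\n);'


# Schema name and column types are assumed values for this sketch.
ddl = build_create_table(
    "sor_test",
    "persoon_hstage",
    [("patientnummer", "text"), ("naam", "text")],
    ["patientnummer"],
)
print(ddl)
```

The resulting string could then be executed over any database connection, which is the only part that would need a connection library.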

Write your own mappings to transfer and transform data from sources, via staging, into the data warehouse.
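Conceptually, a mapping pairs each target column with a source column and an optional transformation. The sketch below shows that idea with plain dictionaries and functions. It is not pyelt's mapping API: apply_mapping, the target column names and the sample row are hypothetical.

```python
def apply_mapping(row, field_mappings):
    """Map one source row (a dict) onto target column names.

    Hypothetical helper for illustration; not part of pyelt.
    field_mappings maps target name -> (source name, transform or None).
    """
    target = {}
    for target_name, (source_name, transform) in field_mappings.items():
        value = row.get(source_name)
        # Apply the transformation only when one is given.
        target[target_name] = transform(value) if transform else value
    return target


# Made-up mapping: copy one column unchanged, upper-case another.
field_mappings = {
    "patient_number": ("patientnummer", None),
    "surname": ("naam", str.upper),
}
mapped = apply_mapping({"patientnummer": "1001", "naam": "Jansen"}, field_mappings)
print(mapped)
```

In pyelt itself the equivalent work is expressed through its own mapping classes, such as the SourceToSorMapping used in the usage example.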

Background

The pyelt framework is presently under development at NL Healthcare, with the aim of implementing our next-generation data warehouse (DWH2.0). It serves as the foundation for our work in the area of clinical business intelligence (CBI) and machine learning.

Architectural cornerstones of this project are:

Download files

pyelt-0.9.4.6a0.zip (71.2 kB, source distribution)
