Pyelt is a DDL and ETL framework for creating and filling Data Vault data warehouses on a PostgreSQL database.
This example will create and fill the historical staging area:
    pipeline = Pipeline(config)
    pipe = pipeline.get_or_create_pipe('test_source', source_config)
    source_file = CsvFile(get_root_path() + '/sample_data/patienten1.csv', delimiter=';')
    source_file.reflect()
    source_file.set_primary_key(['patientnummer'])
    mapping = SourceToSorMapping(source_file, 'persoon_hstage', auto_map=True)
    pipe.mappings.append(mapping)
    pipeline.run()
More examples can be found on the GitHub repository of NL Healthcare.
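To give a feel for what loading a historical staging table involves, here is a minimal standalone sketch that uses only the Python standard library. It does not use pyelt and is not how pyelt implements the load. The table layout (`business_key`, `run_id`, `row_hash`, `payload`), the function names, and the use of SQLite instead of PostgreSQL are all assumptions made for illustration.

```python
import csv
import hashlib
import io
import json
import sqlite3


def row_hash(row):
    """Fingerprint a row so that changed rows can be detected cheaply."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()


def load_hstage(conn, csv_text, key, run_id, delimiter=";"):
    """Append a version for every new or changed row; return the number added.

    Hypothetical helper: table and column names are illustrative only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS persoon_hstage ("
        "business_key TEXT, run_id INTEGER, row_hash TEXT, payload TEXT)"
    )
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        digest = row_hash(row)
        # Look up the most recent stored version of this key.
        latest = conn.execute(
            "SELECT row_hash FROM persoon_hstage WHERE business_key = ? "
            "ORDER BY run_id DESC LIMIT 1",
            (row[key],),
        ).fetchone()
        # Only store the row when it is new or differs from the last version.
        if latest is None or latest[0] != digest:
            conn.execute(
                "INSERT INTO persoon_hstage VALUES (?, ?, ?, ?)",
                (row[key], run_id, digest, json.dumps(row, sort_keys=True)),
            )
            added += 1
    return added


conn = sqlite3.connect(":memory:")
first = "patientnummer;naam\n1;Jansen\n2;De Vries\n"
second = "patientnummer;naam\n1;Jansen\n2;De Vries-Bakker\n"
print(load_hstage(conn, first, "patientnummer", run_id=1))   # 2 (both rows are new)
print(load_hstage(conn, second, "patientnummer", run_id=2))  # 1 (only patient 2 changed)
```

Unchanged rows are skipped on the second run, so the table keeps one entry per distinct version of each source row.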
Pyelt is a Python DDL and ETL framework for creating and loading Data Vaults for data warehousing.
Pyelt supports several data layers, including Source of Record (SOR), Raw Data Vault (RDV), Business Data Vault (BDV), and datamarts (DM).
Pyelt can import data from several kinds of source systems, such as fixed-length files, CSV files, and various databases.
Pyelt is developed to run on a PostgreSQL database.
Pyelt uses SQLAlchemy Core only for the connection and for reflection. All other SQL statements (DDL, COPY, INSERT, and UPDATE statements) are generated by the pyelt framework itself.
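The split between reflection and self-generated SQL can be illustrated with a small sketch. This is not pyelt code: to stay runnable without extra dependencies it uses the standard-library `sqlite3` module as a stand-in for SQLAlchemy Core on PostgreSQL, and the helper names (`reflect_columns`, `build_create_table`, `build_insert_select`) and table names are hypothetical.

```python
import sqlite3


def reflect_columns(conn, table):
    """Return (name, declared type) pairs for an existing table."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]


def build_create_table(target, columns):
    """Build a CREATE TABLE statement as a plain string from column metadata."""
    body = ", ".join(f'"{name}" {col_type}' for name, col_type in columns)
    return f'CREATE TABLE "{target}" ({body})'


def build_insert_select(source, target, columns):
    """Build an INSERT ... SELECT statement that copies all reflected columns."""
    names = ", ".join(f'"{name}"' for name, _ in columns)
    return f'INSERT INTO "{target}" ({names}) SELECT {names} FROM "{source}"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient_src (patientnummer INTEGER, naam TEXT)")
conn.execute("INSERT INTO patient_src VALUES (1, 'Jansen')")

# Step 1: reflection supplies the metadata.
columns = reflect_columns(conn, "patient_src")
# Step 2: the statements themselves are assembled by our own code.
ddl = build_create_table("patient_copy", columns)
dml = build_insert_select("patient_src", "patient_copy", columns)
conn.execute(ddl)
conn.execute(dml)
print(ddl)
print(dml)
```

Generating the statements as strings keeps full control over the SQL that reaches the database, at the cost of handling quoting and dialect differences yourself.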
Write your own mappings to transfer and transform data from sources via staging into the data warehouse.
The pyelt framework is currently under development at NL Healthcare, with the aim of implementing our next-generation data warehouse (DWH2.0). It serves as the foundation for our work in clinical business intelligence (CBI) and machine learning.
Architectural cornerstones of this project are: