Package for logging and database interfacing using SQLAlchemy and SQLModel
NRCAN ETL Toolbox
For the French version of this document, see README-fr.md.
etl-toolbox is a Python toolkit designed to simplify Extract, Transform, and Load (ETL) data processes. This modular toolkit offers several specialized components for different aspects of ETL workflows.
Components
etl_logging
Specialized logging module for ETL processes, allowing simple configuration and efficient log analysis.
etl_toolbox
Collection of tools for reading data from various sources. It includes readers for different file formats and databases, facilitating data integration in ETL processes:
- Data Readers: CSV, Excel, GeoPackage, JSON, PostGIS, Shapefile
database
Interfaces and ORM for interacting with different database systems:
- Database Interfaces: Abstract object handlers for database interactions
- ORM: Object-relational mappings to simplify data access
Installation
From a clone of the repository, install the package with Poetry:
poetry install
Or build a wheel and install it with pip:
poetry build
pip install dist/nrcan_etl_toolbox-*.whl
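To confirm which version ended up in the active environment, the standard library's `importlib.metadata` can report it. A minimal sketch; the helper name `installed_version` is hypothetical, and the distribution name `nrcan_etl_toolbox` is taken from the wheel file name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no distribution with that name is installed
        return None


if __name__ == "__main__":
    print(installed_version("nrcan_etl_toolbox"))
```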
Usage
Logging Module (etl_logging)
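from nrcan_etl_toolbox.etl_logging import CustomLogger
logger = CustomLogger(
    name="Test Logger",
    level="INFO",
    logger_type="verbose",
    logger_file_name="test_logger.log",
)
# Logging messages
logger.info("Starting ETL process")
# Below the INFO threshold: not emitted unless level="DEBUG" (standard logging levels assumed)
logger.debug("Technical details", extra={"data": {"items": 100}})
# exc_info=True attaches the traceback of the exception currently being handled,
# so call it from inside an except block
logger.error("Processing error", exc_info=True)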
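The `level`, `extra` and `exc_info` arguments above match those of Python's standard `logging` module. Assuming `CustomLogger` follows the same semantics (an assumption, not confirmed by this README), the stdlib-only sketch below shows why the `debug()` call is dropped at `INFO` level, how `extra` keys become attributes of the log record, and what `exc_info=True` captures. The in-memory `ListHandler` is a hypothetical helper used only for inspection.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("etl_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo out of the root logger's handlers
handler = ListHandler()
logger.addHandler(handler)

logger.info("Starting ETL process")
# Filtered out: DEBUG (10) is below the logger's INFO (20) threshold
logger.debug("Technical details", extra={"data": {"items": 100}})
# Each key in `extra` becomes an attribute on the LogRecord (here: record.data)
logger.info("Batch loaded", extra={"data": {"items": 100}})

try:
    int("n/a")  # simulated parsing failure
except ValueError:
    # exc_info=True stores the active exception's (type, value, traceback)
    logger.error("Processing error", exc_info=True)

for rec in handler.records:
    print(rec.levelname, rec.getMessage())
```

A handler or formatter can then read `record.data` to write structured details alongside the message.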
Data Readers (etl_toolbox)
from nrcan_etl_toolbox.etl_toolbox.reader import ReaderFactory
from nrcan_etl_toolbox.etl_toolbox.reader.source_readers import ExcelReader
# Creating a CSV reader
csv_reader = ReaderFactory(input_source="data.csv")
data = csv_reader.data
# Creating a Shapefile reader
shp_reader = ReaderFactory(input_source="data.shp")
geo_data = shp_reader.data
# Creating a PostGIS reader
postgis_reader = ReaderFactory(
    input_source="postgresql://user:password@host:port/database",  # connection string for your database
    table_name="table_name",
    schema="schema_name",
)
geo_data = postgis_reader.data
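The PostGIS `input_source` uses the URL form `postgresql://user:password@host:port/database`. If the user name or password contains reserved characters such as `@`, `:` or `/`, they must be percent-encoded, or the URL will be split in the wrong place. A minimal sketch using only `urllib.parse.quote`; the `postgres_url` helper is hypothetical and not part of nrcan_etl_toolbox.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a postgresql:// URL with the user and password percent-encoded."""
    # safe="" makes quote() encode '/' too (it is left unencoded by default)
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


url = postgres_url("etl_user", "p@ss:w/rd", "localhost", 5432, "gis")
print(url)  # postgresql://etl_user:p%40ss%3Aw%2Frd@localhost:5432/gis
```

The resulting string can then be passed as `input_source` to `ReaderFactory`.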
# Creating an Excel reader
reader = ReaderFactory(input_source="data.xlsx")
# Get the underlying ExcelReader object
excel_reader: ExcelReader = reader.reader
# If the Excel file contains multiple sheets, `dataframe` is a dictionary
# with sheet names as keys and DataFrames as values
data = excel_reader.dataframe
# data = {'Sheet1': df1, 'Sheet2': df2}
# To read a single sheet, pass its name to read_sheet()
data = excel_reader.read_sheet('Sheet1')
# data = df1
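Since `excel_reader.dataframe` is a dictionary for multi-sheet workbooks (and, presumably, a single DataFrame for a one-sheet file, which is an assumption here), downstream code may prefer one consistent shape. The hypothetical helper below normalizes both cases to a `{sheet_name: frame}` mapping; the fallback name `"Sheet1"` is a placeholder, not a value reported by the reader.

```python
from typing import Any, Dict


def as_sheet_dict(data: Any, default_name: str = "Sheet1") -> Dict[str, Any]:
    """Return {sheet_name: frame} whether `data` is already a dict or a single frame."""
    if isinstance(data, dict):
        # Multi-sheet result: copy so callers can modify it safely
        return dict(data)
    # Single-sheet result: wrap it under a placeholder sheet name (assumption)
    return {default_name: data}
```

Usage: `for name, frame in as_sheet_dict(excel_reader.dataframe).items(): ...` then processes every sheet the same way, regardless of how many the workbook has.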
Database Interface
# TODO: Complete documentation.
from nrcan_etl_toolbox.database.interface import AbstractDatabaseHandler
# Usage example to be documented
Development
To contribute to the project, install the development dependencies:
poetry install --with dev
Run tests with:
pytest
Project Structure
nrcan_etl_toolbox/
├── database/ # Database interactions
│ ├── interface/ # Abstract interfaces for databases
│ └── orm/ # Object-relational mappings
├── etl_logging/ # ETL logging module
└── etl_toolbox/ # Main ETL tools
└── reader/ # Data source readers
└── source_readers/ # Specific reader implementations
Authors
- NRCAN (Natural Resources Canada)
- Xavier Malet
For questions or suggestions, please use the project's GitHub issues.
File details
Details for the file nrcan_etl_toolbox-0.1.55.tar.gz.
File metadata
- Download URL: nrcan_etl_toolbox-0.1.55.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09740f48a717385f53f23c6438913d5bd22270d0db102c7dd904eeb690f8d4ab |
| MD5 | 7246f3a7e5178eac03dd79b1cd985283 |
| BLAKE2b-256 | e08359f6b59a114a84d4830e4c8948fc237b6c2b90c1db0e522cdc47e76c584a |
File details
Details for the file nrcan_etl_toolbox-0.1.55-py3-none-any.whl.
File metadata
- Download URL: nrcan_etl_toolbox-0.1.55-py3-none-any.whl
- Upload date:
- Size: 28.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea8d47a0a036b20d49702452aecf41eb16ab41de8c40b82b29d7c04ac6760a0f |
| MD5 | 8b4786e2d2ae446849bde85352c6114c |
| BLAKE2b-256 | b4e8b724da56a9ef12ec80b70a20cb233a20be72a621340fce3d49520e98d178 |