Skip to main content

Package for logging and database interfacing using SQLAlchemy and SQLModels

Project description

from nrcan_etl_toolbox.etl_toolbox.reader.source_readers import ExcelReader

NRCAN ETL Toolbox

codecov CI

Pour la version française de ce document, consultez README-fr.md.

etl-toolbox is a Python toolkit designed to simplify Extract, Transform, and Load (ETL) data processes. This modular toolkit offers several specialized components for different aspects of ETL workflows.

Components

etl_logging

Specialized logging module for ETL processes, allowing simple configuration and efficient log analysis.

etl_toolbox

Collection of tools for reading data from various sources. It includes readers for different file formats and databases, facilitating data integration in ETL processes:

  • Data Readers: CSV, Excel, GeoPackage, JSON, PostGIS, Shapefile

database

Interfaces and ORM for interacting with different database systems:

  • Database Interfaces: Abstract object handlers for database interactions
  • ORM: Object-relational mappings to simplify data access

Installation

Install the package via Poetry:

poetry install

Or by creating a distribution:

poetry build
pip install dist/nrcan_etl_toolbox-*.whl

Usage

Logging Module (etl_logging)

from nrcan_etl_toolbox.etl_logging import CustomLogger

logger = CustomLogger(name="Test Logger", level='INFO'
                      ,logger_type='verbose',
                      logger_file_name='test_logger.log')

# Logging messages
logger.info("Starting ETL process")
logger.debug("Technical details", extra={"data": {"items": 100}})
logger.error("Processing error", exc_info=True)

Data Readers (etl_toolbox)

from nrcan_etl_toolbox.etl_toolbox.reader import ReaderFactory
from nrcan_etl_toolbox.etl_toolbox.reader.source_readers import ExcelReader

# Creating a CSV reader
csv_reader = ReaderFactory(input_source="data.csv")
data = csv_reader.data

# Creating a Shapefile reader
shp_reader = ReaderFactory(input_source="data.shp")
geo_data = shp_reader.data

# Creating a PostGIS reader
postgis_reader = ReaderFactory(input_source="postgresql://user:password@host:port/database", # Use the connection string for your database
                               table_name="table_name",
                               schema="schema_name")
geo_data = postgis_reader.data

# Creating an Excel reader
reader = ReaderFactory(input_source="data.xlsx")
# Get the Reader object
excel_reader : ExcelReader = reader.reader
# If excel file contains multiple sheets, 
# data will be a dictionary with sheet names as keys and dataframes as values
data = excel_reader.dataframe 
# data = {'Sheet1': df1, 'Sheet2': df2}
# To read a specific sheet, use the sheet_name parameter
data = excel_reader.read_sheet('Sheet1')
# data = df1

Database Interface

# TODO: Complete documentation.
from nrcan_etl_toolbox.database.interface import AbstractDatabaseHandler
# Usage example to be documented

Development

To contribute to the project, install development dependencies:

poetry install --with dev

Run tests with:

pytest

Project Structure

nrcan_etl_toolbox/
├── database/               # Database interactions
│   ├── interface/          # Abstract interfaces for databases
│   └── orm/                # Object-relational mappings
├── etl_logging/            # ETL logging module
└── etl_toolbox/            # Main ETL tools
    └── reader/             # Data source readers
        └── source_readers/ # Specific reader implementations

Authors

For questions or suggestions, please use the project's GitHub issues.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nrcan_etl_toolbox-0.1.53.tar.gz (20.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nrcan_etl_toolbox-0.1.53-py3-none-any.whl (28.2 kB view details)

Uploaded Python 3

File details

Details for the file nrcan_etl_toolbox-0.1.53.tar.gz.

File metadata

  • Download URL: nrcan_etl_toolbox-0.1.53.tar.gz
  • Upload date:
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.11

File hashes

Hashes for nrcan_etl_toolbox-0.1.53.tar.gz
Algorithm Hash digest
SHA256 0fb4b7c6daa366a6e7f5e7ebad9bdbfff22479508b96701e3462624ebb6c2157
MD5 a6abe1a142d21ae3e3f22d3e23059231
BLAKE2b-256 fb6592e26a7453a0e821038d4adf52dc56f18d3de88de0a944016c3c109cf9ca

See more details on using hashes here.

File details

Details for the file nrcan_etl_toolbox-0.1.53-py3-none-any.whl.

File metadata

File hashes

Hashes for nrcan_etl_toolbox-0.1.53-py3-none-any.whl
Algorithm Hash digest
SHA256 a2ac46b2c0b92adf1db559edecf0b6899fc4393b40bce8312af8d745dfd1148c
MD5 a2988f46db018d65b450c49dee7062ca
BLAKE2b-256 4e9824f3674a9c8408f084b015537339ae72b53cca1d7e05c1577b561645afad

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page