# core-etl

This library provides essential components for ETL processes, offering reusable interfaces for data extraction, transformation, and loading.


Installation

Install from PyPI using pip:

pip install core-etl
uv pip install core-etl  # Or using UV...

Features

Base ETL Framework

  • Template method pattern for ETL workflow orchestration

  • Comprehensive lifecycle hooks (pre-processing, execution, post-processing, cleanup)

  • Built-in error handling with detailed exception logging

  • Task status tracking (CREATED, EXECUTING, SUCCESS, ERROR)

  • Timezone support for date/datetime processing (defaults to UTC)

  • Temporary folder management for local file operations

  • Extensible resource cleanup mechanisms
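
The lifecycle described above can be pictured with a small, self-contained sketch of the template method pattern. It does not import core-etl: the class `SketchEtl`, the enum `TaskStatus`, and the hook names `pre_processing()`, `_execute()`, `post_processing()`, and `clean_resources()` are assumptions made for illustration, and only the four status values come from the feature list.

```python
import enum
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


class TaskStatus(enum.Enum):
    # Status names taken from the feature list above.
    CREATED = "CREATED"
    EXECUTING = "EXECUTING"
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"


class SketchEtl(ABC):
    """Hypothetical base class; not the core-etl implementation."""

    def __init__(self) -> None:
        self.status = TaskStatus.CREATED
        self.calls: list[str] = []  # records hook order for illustration

    def execute(self) -> None:
        # Template method: the order of the lifecycle steps is fixed here,
        # subclasses only fill in the individual steps.
        self.status = TaskStatus.EXECUTING
        try:
            self.pre_processing()
            self._execute()
            self.post_processing()
            self.status = TaskStatus.SUCCESS
        except Exception:
            self.status = TaskStatus.ERROR
            logger.exception("ETL task failed")
        finally:
            # Cleanup runs whether the task succeeded or failed.
            self.clean_resources()

    def pre_processing(self) -> None:
        self.calls.append("pre_processing")

    @abstractmethod
    def _execute(self) -> None:
        """The actual extract/transform/load work."""

    def post_processing(self) -> None:
        self.calls.append("post_processing")

    def clean_resources(self) -> None:
        self.calls.append("clean_resources")


class HelloEtl(SketchEtl):
    def _execute(self) -> None:
        self.calls.append("_execute")


class FailingEtl(SketchEtl):
    def _execute(self) -> None:
        raise ValueError("boom")


ok_task = HelloEtl()
ok_task.execute()
bad_task = FailingEtl()
bad_task.execute()
print(ok_task.status, ok_task.calls)
print(bad_task.status, bad_task.calls)
```

In this sketch a failure inside `_execute()` skips post-processing but still reaches the cleanup step, which is the usual motivation for putting cleanup in a `finally` block.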

File-Based ETL (IBaseEtlFromFile)

  • Process files from various sources (SFTP, local filesystem, cloud storage)

  • Iterator-based file processing with error isolation per file

  • Individual file success/error callbacks for custom handling

  • Batch file operations with automatic error recovery

  • Extensible hooks: get_paths(), process_file(), on_success(), on_error()
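
As a rough illustration of per-file error isolation, the sketch below iterates over files and routes each one to a success or error callback. It is standalone and does not subclass `IBaseEtlFromFile`: the hook names mirror the list above, but their signatures, the `SketchFileEtl` class, and the `run()` and `demo()` helpers are assumptions.

```python
import logging
import tempfile
from pathlib import Path
from typing import Iterator

logger = logging.getLogger(__name__)


class SketchFileEtl:
    """Hypothetical file-based task; signatures are assumed, not core-etl's."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder
        self.succeeded: list[str] = []
        self.failed: list[str] = []

    def get_paths(self) -> Iterator[Path]:
        # Yield paths lazily so large listings are never fully materialized.
        yield from sorted(self.folder.glob("*.txt"))

    def process_file(self, path: Path) -> None:
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            raise ValueError(f"{path.name} is empty")

    def on_success(self, path: Path) -> None:
        self.succeeded.append(path.name)

    def on_error(self, path: Path, error: Exception) -> None:
        self.failed.append(path.name)

    def run(self) -> None:
        for path in self.get_paths():
            try:
                self.process_file(path)
            except Exception as error:
                # One bad file is reported and skipped; the loop continues.
                logger.warning("Failed to process %s: %s", path.name, error)
                self.on_error(path, error)
            else:
                self.on_success(path)


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.txt").write_text("data", encoding="utf-8")
        (folder / "b.txt").write_text("", encoding="utf-8")  # will fail
        (folder / "c.txt").write_text("more data", encoding="utf-8")
        task = SketchFileEtl(folder)
        task.run()
        return task.succeeded, task.failed


print(demo())
```

The empty file is reported through `on_error()` while the files before and after it are still processed.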

Record-Based ETL (IBaseEtlFromRecord)

  • Process records from APIs, databases, files, message queues, and data streams

  • Memory-efficient batch processing with configurable batch sizes

  • Built-in transformation pipeline:

    • Field removal (attrs_to_remove)

    • Field renaming (name_mapper)

    • Data type casting (type_mapper)

  • Pre and post transformation hooks for custom business logic

  • Incremental processing support with last_processed markers

  • Extensible methods: retrieve_records(), process_records(), pre_transformations(), post_transformations()
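
The transformation pipeline and batching can be sketched in a few lines of plain Python. This is an illustration only: the parameter names `attrs_to_remove`, `name_mapper`, and `type_mapper` come from the list above, while the functions `transform()` and `batched()`, the remove, rename, cast ordering, and the choice to key `type_mapper` by the renamed field are assumptions rather than documented behavior of `IBaseEtlFromRecord`.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def transform(
    record: Record,
    attrs_to_remove: set[str],
    name_mapper: dict[str, str],
    type_mapper: dict[str, Callable[[Any], Any]],
) -> Record:
    """Remove, rename, then cast fields (ordering is an assumption)."""
    out: Record = {}
    for key, value in record.items():
        if key in attrs_to_remove:
            continue
        new_key = name_mapper.get(key, key)
        caster = type_mapper.get(new_key)
        out[new_key] = caster(value) if caster is not None else value
    return out


def batched(records: Iterable[Record], batch_size: int) -> Iterator[list[Record]]:
    """Yield lists of at most batch_size records without loading everything."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


raw = [
    {"id": "1", "Name": "Ada", "tmp": "x"},
    {"id": "2", "Name": "Alan", "tmp": "y"},
    {"id": "3", "Name": "Grace", "tmp": "z"},
]

batches = [
    [transform(r, {"tmp"}, {"Name": "name"}, {"id": int}) for r in batch]
    for batch in batched(raw, batch_size=2)
]
print(batches)
```

With a batch size of 2, the three input records are emitted as one batch of two and one batch of one, each record stripped of `tmp`, with `Name` renamed to `name` and `id` cast to `int`.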

Async ETL (IAsyncETL)

  • Concurrent record processing via asyncio producer/consumer pattern

  • Configurable worker pool size (max_workers) and queue capacity (max_queue_size)

  • Individual record failures are isolated: failed records are logged and skipped without aborting the pipeline

  • Extensible methods: produce_records(), _process_record()

  • Note: execute() calls asyncio.run() internally, which cannot run inside an already running event loop; from async contexts, call await asyncio.to_thread(task.execute) instead
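
A minimal sketch of the producer/consumer arrangement, assuming a bounded `asyncio.Queue` and a fixed pool of worker tasks. It is not the `IAsyncETL` implementation: `SketchAsyncEtl`, `_worker()`, `_run()`, and the sample data are hypothetical, and only `max_workers`, `max_queue_size`, `produce_records()`, `_process_record()`, and `execute()` are names taken from the list above.

```python
import asyncio
import logging
from typing import AsyncIterator

logger = logging.getLogger(__name__)


class SketchAsyncEtl:
    """Hypothetical async task; not the core-etl implementation."""

    def __init__(self, max_workers: int = 3, max_queue_size: int = 10) -> None:
        self.max_workers = max_workers
        self.max_queue_size = max_queue_size
        self.processed: list[int] = []
        self.failed: list[int] = []

    async def produce_records(self) -> AsyncIterator[int]:
        for value in range(6):
            yield value

    async def _process_record(self, record: int) -> None:
        if record == 3:
            raise ValueError("bad record")
        await asyncio.sleep(0)  # stand-in for real I/O
        self.processed.append(record * 10)

    async def _worker(self, queue: asyncio.Queue) -> None:
        while True:
            record = await queue.get()
            try:
                await self._process_record(record)
            except Exception:
                # A failed record is logged and skipped; the worker keeps going.
                logger.exception("Skipping failed record %r", record)
                self.failed.append(record)
            finally:
                queue.task_done()

    async def _run(self) -> None:
        # The bounded queue applies backpressure to the producer.
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.max_queue_size)
        workers = [
            asyncio.create_task(self._worker(queue))
            for _ in range(self.max_workers)
        ]
        async for record in self.produce_records():
            await queue.put(record)
        await queue.join()  # wait until every queued record is handled
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)

    def execute(self) -> None:
        asyncio.run(self._run())


async def main() -> SketchAsyncEtl:
    # From code that is already async, run the blocking execute() in a thread.
    task = SketchAsyncEtl()
    await asyncio.to_thread(task.execute)
    return task


sync_task = SketchAsyncEtl()
sync_task.execute()
threaded_task = asyncio.run(main())
print(sorted(sync_task.processed), sync_task.failed)
```

The `main()` coroutine shows the pattern from the note: because `execute()` starts its own event loop, it is handed to `asyncio.to_thread()` rather than awaited directly.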

Quick Start

Setting Up Environment

  1. Install required libraries:

pip install --upgrade pip
pip install virtualenv

  2. Create Python virtual environment:

virtualenv --python=python3.12 .venv

  3. Activate the virtual environment:

source .venv/bin/activate

Install packages

pip install .
pip install -e ".[dev]"

Check tests and coverage

python manager.py run-tests
python manager.py run-coverage

Contributing

Contributions are welcome! Please:

  1. Fork the repository

  2. Create a feature branch

  3. Write tests for new functionality

  4. Ensure all tests pass: pytest -n auto

  5. Run linting: pylint core_etl

  6. Run security checks: bandit -r core_etl

  7. Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For questions or support, please open an issue on GitLab or contact the maintainers.
