
A PySpark ETL Framework

Project description


Overview

PySetl is a framework focused on improving the readability and structure of PySpark ETL projects. It is also designed to take advantage of Python's typing syntax to reduce runtime errors, both through linting tools and by verifying types at runtime. This improves the stability of large ETL pipelines.
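The snippet below does not use PySetl's API. It is a minimal, stdlib-only sketch of the general idea of verifying types at runtime: a hypothetical `CsvSourceConfig` dataclass rejects a wrongly typed value when it is constructed, instead of letting it fail later inside a Spark job.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass(frozen=True)
class CsvSourceConfig:
    """Hypothetical config for a CSV source; not part of PySetl's API."""

    path: str
    header: bool = True
    delimiter: str = ","

    def __post_init__(self) -> None:
        # Resolve the annotations to real types, then check every field value.
        # isinstance only handles plain classes; generics such as list[str]
        # would need a validation library such as pydantic.
        hints = get_type_hints(type(self))
        for f in fields(self):
            value = getattr(self, f.name)
            expected = hints[f.name]
            if not isinstance(value, expected):
                raise TypeError(
                    f"{f.name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )


ok = CsvSourceConfig(path="/data/input.csv")
try:
    CsvSourceConfig(path="/data/input.csv", header="yes")
except TypeError as err:
    print(err)  # header: expected bool, got str
```

Failing at construction time means a misconfigured pipeline stops before any data is read, which is the kind of stability gain the paragraph above describes.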

To accomplish this, PySetl provides the following tools:

  • pysetl.config: Type-safe configuration.

  • pysetl.storage: Agnostic and extensible data source connections.

  • pysetl.workflow: Pipeline management and dependency injection.
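The pattern behind pipeline management with dependency injection can be illustrated without Spark. The snippet below is not PySetl's API: it is a hypothetical, stdlib-only sketch in which each step declares the types it consumes and produces, and the pipeline wires outputs to inputs by type. `Step`, `MiniPipeline`, `RawOrders`, and `OrderTotals` are illustrative names, and plain Python objects stand in for Spark DataFrames.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """Hypothetical pipeline step: declares the types it consumes and produces."""

    name: str
    inputs: List[type]
    output: type
    run: Callable[..., Any]


@dataclass
class MiniPipeline:
    """Toy pipeline that injects each step's inputs by type (not PySetl's API)."""

    steps: List[Step] = field(default_factory=list)
    # Values produced so far, keyed by their declared output type.
    delivered: Dict[type, Any] = field(default_factory=dict)

    def add(self, step: Step) -> "MiniPipeline":
        self.steps.append(step)
        return self

    def run(self) -> Dict[type, Any]:
        for step in self.steps:
            # Fail early if an earlier step has not produced a required input.
            missing = [t.__name__ for t in step.inputs if t not in self.delivered]
            if missing:
                raise LookupError(f"{step.name}: no producer for {missing}")
            args = [self.delivered[t] for t in step.inputs]
            result = step.run(*args)
            # Check the produced value against the declared output type.
            if not isinstance(result, step.output):
                raise TypeError(
                    f"{step.name}: returned {type(result).__name__}, "
                    f"expected {step.output.__name__}"
                )
            self.delivered[step.output] = result
        return self.delivered


@dataclass
class RawOrders:
    rows: List[dict]


@dataclass
class OrderTotals:
    total: float


pipeline = (
    MiniPipeline()
    .add(Step("extract", [], RawOrders,
              lambda: RawOrders([{"amount": 10.0}, {"amount": 5.5}])))
    .add(Step("aggregate", [RawOrders], OrderTotals,
              lambda raw: OrderTotals(sum(r["amount"] for r in raw.rows))))
)
print(pipeline.run()[OrderTotals])  # OrderTotals(total=15.5)
```

Because steps only name the types they need, a step can be moved or replaced without changing how its inputs are passed in, and a missing producer is reported before the step runs.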

PySetl is designed with Python typing syntax at its core. Hence, we strongly recommend using typedspark and pydantic for development.

Why use PySetl?

  • Model complex data pipelines.

  • Reduce production risks with type-safe development.

  • Improve the structure and readability of large projects.

Installation

PySetl is available in PyPI:

pip install pysetl

PySetl doesn't list pyspark as a dependency, since most environments already provide their own Spark installation. Nevertheless, you can install pyspark together with PySetl by running:

pip install "pysetl[pyspark]"
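Because pyspark is an optional extra, it can help to check whether the current environment already provides it before choosing between the two commands. A minimal stdlib-only sketch; the function name `installed_pyspark_version` is hypothetical and not part of PySetl.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_pyspark_version() -> Optional[str]:
    """Return the installed pyspark version, or None if pyspark is not importable."""
    if importlib.util.find_spec("pyspark") is None:
        return None
    try:
        return version("pyspark")
    except PackageNotFoundError:
        # Importable but without package metadata (for example, placed on
        # sys.path by a cluster runtime): fall back to the module attribute.
        import pyspark
        return getattr(pyspark, "__version__", "unknown")


if __name__ == "__main__":
    found = installed_pyspark_version()
    if found is None:
        print('pyspark not found; install it with: pip install "pysetl[pyspark]"')
    else:
        print(f"pyspark {found} is available")
```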

Acknowledgments

PySetl is a port of SETL. We want to fully acknowledge that this package is heavily inspired by the work of the SETL team; we have simply adapted it to work in Python.


Download files


Source Distribution

pysetl-0.1.7rc0.tar.gz (32.5 kB)

Uploaded Source

Built Distribution

pysetl-0.1.7rc0-py3-none-any.whl (51.1 kB)

Uploaded Python 3

File details

Details for the file pysetl-0.1.7rc0.tar.gz.

File metadata

  • Download URL: pysetl-0.1.7rc0.tar.gz
  • Upload date:
  • Size: 32.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.12.0 Darwin/23.0.0

File hashes

Hashes for pysetl-0.1.7rc0.tar.gz
Algorithm Hash digest
SHA256 3c9e838a201e150d902e8494ad1f2fa5ed0d800c073130c3d354bc0f14a43e72
MD5 0cd9a7bb7767ddead7fdea49ef6193a7
BLAKE2b-256 970a78c8ba2027042c39715017b4871c47f833d1df870408933e1d76d1a33dfc
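The published digests can be checked locally with Python's standard `hashlib`. A minimal sketch; it assumes the archive has been downloaded under its original file name, and the same functions work for the wheel when given the wheel's SHA256 digest instead.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pysetl-0.1.7rc0.tar.gz.
EXPECTED_SHA256 = "3c9e838a201e150d902e8494ad1f2fa5ed0d800c073130c3d354bc0f14a43e72"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one, ignoring hex case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pysetl-0.1.7rc0.tar.gz")
    print("OK" if verify(archive) else "MISMATCH")
```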


File details

Details for the file pysetl-0.1.7rc0-py3-none-any.whl.

File metadata

  • Download URL: pysetl-0.1.7rc0-py3-none-any.whl
  • Upload date:
  • Size: 51.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.12.0 Darwin/23.0.0

File hashes

Hashes for pysetl-0.1.7rc0-py3-none-any.whl
Algorithm Hash digest
SHA256 c0e2f1d64ba3cf79ae3a87208cfe8be9c569a2bfebf8c0ac988c8be1d5782949
MD5 d95799c82dfa13fa05c93a5c24cd5f1e
BLAKE2b-256 04e38455af95c37e469ddfeecc9e2298a4ab4f660627d45a5ada7bc8cf0f9fc1

