A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.

spetlr

Visit the official SPETLR webpage: https://spetlr.com/

NEWS

  • Cluster test submission with spetlr-tools
  • Upgrade to Python 3.12
  • The spetlr library probably still supports older LTS versions between 9.1 and 13.3 (with the possible exception of SQL connections over ODBC), but only 16.4 is tested.
  • SQL ODBC driver version 18 is supported (this is a breaking change if you haven't upgraded your ODBC driver).
  • Newest CosmosDB connector is tested for compatibility with DBR LTS 16.4.

Description

SPETLR provides a range of tools for working with ETL in Databricks. To help you decide whether SPETLR fits your needs, here are its core features:

  • ETL framework: A common ETL framework that enables reusable transformations in an object-oriented manner. Standardized structures facilitate cooperation in large teams.

  • Integration testing: A framework for creating test databases and tables before deploying to production in order to ensure reliable and stable data platforms. An additional layer of data abstraction allows full integration testing.

  • Handlers: Standard connectors with commonly used options reduce boilerplate.
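
To illustrate what "reusable transformations in an object-oriented manner" can look like, here is a minimal sketch in plain Python. It does not use the SPETLR API: the names Step, ListExtractor, KeepPositiveAmounts, ListLoader and Pipeline are hypothetical and exist only for this example, and plain lists of dicts stand in for Spark DataFrames.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Step(ABC):
    """One reusable unit of an ETL pipeline (hypothetical base class)."""

    @abstractmethod
    def process(self, data: Any) -> Any:
        ...


class ListExtractor(Step):
    """Extract step: ignores its input and returns a fixed list of rows."""

    def __init__(self, rows: List[dict]):
        self.rows = rows

    def process(self, data: Any) -> List[dict]:
        return list(self.rows)


class KeepPositiveAmounts(Step):
    """Transform step: keeps only rows whose 'amount' is above zero."""

    def process(self, data: List[dict]) -> List[dict]:
        return [row for row in data if row["amount"] > 0]


class ListLoader(Step):
    """Load step: appends the rows to an in-memory target list."""

    def __init__(self, target: List[dict]):
        self.target = target

    def process(self, data: List[dict]) -> List[dict]:
        self.target.extend(data)
        return data


class Pipeline:
    """Runs the steps in order, passing each result to the next step."""

    def __init__(self, *steps: Step):
        self.steps = steps

    def execute(self) -> Any:
        data = None
        for step in self.steps:
            data = step.process(data)
        return data


# Usage: the same transform class can be reused in any pipeline.
target: List[dict] = []
Pipeline(
    ListExtractor([{"id": 1, "amount": 10}, {"id": 2, "amount": -5}]),
    KeepPositiveAmounts(),
    ListLoader(target),
).execute()
```

Because every step shares one interface, a transformation written once can be combined with different sources and sinks, which is the kind of standardized structure the feature list refers to.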

For more information, visit the official SPETLR webpage: https://spetlr.com/

Important Notes

This package cannot be run or tested without access to pyspark. However, installing pyspark as part of our installer caused issues when other versions of pyspark were needed. Hence we removed the dependency from our installer.
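
Since pyspark is not installed automatically, it can be useful to check for it before importing the library. The following is a minimal sketch using only the Python standard library; the helper name pyspark_available is hypothetical and not part of SPETLR.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if a pyspark package can be found in this environment."""
    return importlib.util.find_spec("pyspark") is not None


if not pyspark_available():
    # On a Databricks cluster pyspark is provided by the runtime; locally
    # you need to install a version that matches your target runtime.
    print("pyspark not found; install a version that matches your cluster runtime.")
```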

Installation

Install SPETLR from PyPI:

pip install spetlr

Development Notes

To prepare for development, please install these additional requirements:

  • Java 21
  • pip install -r requirements_test.txt

Then install the package locally

python setup.py develop

Testing

Local tests

After installing the dev-requirements, execute tests by running:

pytest tests

These tests are located in the ./tests/local folder and only require a Python interpreter. Pull requests will not be accepted if these tests do not pass. If you add new features, please include corresponding tests.

Cluster tests

Tests in the ./tests/cluster folder are designed to run on a Databricks cluster. The pre-integration test uses Azure resource deployment and can only be run by the spetlr-org admins.

To deploy the necessary Azure resources to your own Azure Tenant, run the following command:

.\.github\deploy\deploy.ps1 -uniqueRunId "yourUniqueId"

Be aware that the value of uniqueRunId may only contain lowercase letters and digits, and its length must not exceed 12 characters.
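
The naming rule can be checked before starting a deployment. The sketch below is one way to express it in Python; the function name is_valid_unique_run_id is hypothetical, and it additionally assumes that an empty id is not allowed.

```python
import re

# Lowercase letters and digits only, 1 to 12 characters.
# Rejecting the empty string is an assumption made for this sketch.
_RUN_ID_PATTERN = re.compile(r"[a-z0-9]{1,12}")


def is_valid_unique_run_id(run_id: str) -> bool:
    """Return True if run_id satisfies the uniqueRunId naming rule."""
    return _RUN_ID_PATTERN.fullmatch(run_id) is not None
```

For example, is_valid_unique_run_id("myrun01") is accepted, while "MyRun01" (uppercase letters) and "averylongrunid" (14 characters) are rejected.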

Afterward, execute the following commands:

.\.github\submit\build.ps1
.\.github\submit\submit_test_job.ps1

Contributing

Feel free to contribute to SPETLR. All contributions are appreciated, whether they add new features or improve existing ones.

If you have a suggestion that can enhance SPETLR, please fork the repository and create a pull request. Alternatively, you can open an issue with the "enhancement" tag.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewSPETLRFeature)
  3. Commit your Changes (git commit -m 'Add some SPETLRFeature')
  4. Push to the Branch (git push origin feature/NewSPETLRFeature)
  5. Open a Pull Request

Releases

Releases to PyPI are published by a GitHub Action, which needs to be triggered manually.

Requirements and dependencies

The library has three txt files at the root of the repo. These files define three levels of requirements:

  • requirements_install.txt - libraries required to install spetlr
  • requirements_test.txt - libraries required to run unit and integration tests
  • requirements_dev.txt - libraries required in the development process in order to contribute to the repo

All libraries and their dependencies are added with a fixed version to the configuration file setup.cfg using the defined requirements from requirements_install.txt.

To upgrade the dependencies in the setup.cfg file, do the following:

  1. Create a new branch
  2. Run upgrade_requirements.ps1 in your terminal
  3. Commit the changes the script has made to the cfg file. If there are no changes, everything is up to date.
  4. Open a PR. The PR runs all tests and ensures that the library is compatible with any updates

Note that if you want to upgrade a dependency to a version other than its newest, you can set the desired version in requirements_install.txt. The upgrade script will respect it.

Contact

For any inquiries, please use the SPETLR Discord Server.

Download files

Source Distribution

spetlr-16.4.9.tar.gz (205.1 kB)

Built Distribution

spetlr-16.4.9-py3-none-any.whl (173.6 kB)

File details

Details for the file spetlr-16.4.9.tar.gz.

File metadata

  • Download URL: spetlr-16.4.9.tar.gz
  • Upload date:
  • Size: 205.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spetlr-16.4.9.tar.gz
Algorithm Hash digest
SHA256 6c1636617753991fc6dda9cbdaab5d7d37ceb6f490a28e780963e9ad84dc588d
MD5 b2655a3ce5e8611cf88c71038906e55f
BLAKE2b-256 66f4f8d6c6f94ab7616eb8c96c492d79f9f9aff74f552e7c4f38b68e88aa3477
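
A downloaded archive can be compared against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_of_file and matches_published_hash are hypothetical, and the file path is whatever location you downloaded the archive to.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for spetlr-16.4.9.tar.gz.
EXPECTED_SHA256 = "6c1636617753991fc6dda9cbdaab5d7d37ceb6f490a28e780963e9ad84dc588d"


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True if the file's digest equals the published SHA256."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Calling matches_published_hash(Path("spetlr-16.4.9.tar.gz")) on the downloaded file returns True only if its content is byte-for-byte identical to the published archive.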

Provenance

The following attestation bundles were made for spetlr-16.4.9.tar.gz:

Publisher: release.yml on spetlr-org/spetlr

File details

Details for the file spetlr-16.4.9-py3-none-any.whl.

File metadata

  • Download URL: spetlr-16.4.9-py3-none-any.whl
  • Upload date:
  • Size: 173.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spetlr-16.4.9-py3-none-any.whl
Algorithm Hash digest
SHA256 ca9a3e9c98fa7ffad84892a3e4d8132122ee6c04e5f0694c93e0802f77d9d834
MD5 9bc0e1fdf348ebfba383d793109cc36a
BLAKE2b-256 728647c635b13e9ed86f6b9f7789383f27610c5679bb15e17c89ffe866d94a28

Provenance

The following attestation bundles were made for spetlr-16.4.9-py3-none-any.whl:

Publisher: release.yml on spetlr-org/spetlr
