Pramen transformations written in Python

Pramen-py

CLI application for defining data transformations for Pramen.

See:

pramen-py --help

for more information.

Installation

App settings

Application configuration is handled through environment variables (see .env.example).

Add pramen-py as a dependency to your project

With poetry:

# ensure we have a valid poetry environment
ls pyproject.toml || poetry init

poetry add pramen-py

With pip:

pip install pramen-py

Usage

Application configuration

To configure pramen-py, set the corresponding environment variables. To see the list of available options, run:

pramen-py list-configuration-options

Developing transformations

pramen-py uses Python's namespace packages to discover transformations.

This means that a new transformer must be located inside a Python package that contains a transformations directory.

This directory should be declared as a package:

  • for poetry
[tool.poetry]
# ...
packages = [
    { include = "transformations" },
]
  • for setup.py
from setuptools import setup, find_namespace_packages

setup(
    name='mynamespace-subpackage-a',
    # ...
    packages=find_namespace_packages(include=['transformations.*'])
)

Example file structure:

❯ tree .
.
├── README.md
├── poetry.lock
├── pyproject.toml
├── tests
│  └── test_identity_transformer.py
└── transformations
    └── identity_transformer
        ├── __init__.py
        └── example_config.yaml

For pramen-py to pick up a transformer, the following conditions must be satisfied:

  • the Python package containing the transformers must be installed in the same Python environment as pramen-py
  • the Python package must define the namespace package transformations
  • transformers must extend the pramen_py.Transformation base class
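
The namespace-package condition above can be illustrated with the standard library alone. The following is a minimal sketch, not pramen-py's own discovery code: it builds a throwaway project that mirrors the example file structure, simulates installation by putting the project root on sys.path, and lists the sub-packages found under the transformations namespace. The temporary directory and the NAME attribute are assumptions made for the illustration.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Build a throwaway project that mirrors the example file structure.
project_root = Path(tempfile.mkdtemp())
transformer_dir = project_root / "transformations" / "identity_transformer"
transformer_dir.mkdir(parents=True)
(transformer_dir / "__init__.py").write_text("NAME = 'identity_transformer'\n")

# Installation is simulated here by putting the project root on sys.path.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

# `transformations` has no __init__.py, so it is imported as a namespace package.
namespace = importlib.import_module("transformations")

# Every sub-package found under the namespace is a discovery candidate.
discovered = sorted(
    module.name
    for module in pkgutil.iter_modules(namespace.__path__, prefix="transformations.")
)
print(discovered)
```

Because transformations is a namespace package, several independently installed projects can each contribute their own sub-packages to it, which is why every project must be installed in the same environment as pramen-py.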

Subclasses of the Transformation base class are registered as CLI commands (pramen-py transformations run TransformationSubclassName) with default options. Check:

pramen-py transformations run ExampleTransformation1 --help

for more details.
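
One common way to register subclasses by name is Python's __init_subclass__ hook. The sketch below shows that general technique with a hypothetical stand-in base class; it is not pramen_py.Transformation, and the run signature and the run_by_name helper are assumptions made for the illustration.

```python
from typing import Dict, Type


class Transformation:
    """Hypothetical stand-in for a base class; NOT pramen_py.Transformation."""

    registry: Dict[str, Type["Transformation"]] = {}

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        # Record each subclass under its class name as soon as it is defined.
        Transformation.registry[cls.__name__] = cls

    def run(self, info_date: str) -> str:
        raise NotImplementedError


class ExampleTransformation1(Transformation):
    def run(self, info_date: str) -> str:
        return f"ExampleTransformation1 ran for {info_date}"


def run_by_name(name: str, info_date: str) -> str:
    """Look a transformation up by class name, as a CLI dispatcher might."""
    return Transformation.registry[name]().run(info_date)


print(run_by_name("ExampleTransformation1", "2022-04-01"))
```

With this pattern, defining the subclass is enough to make it addressable by its class name; no separate registration call is needed.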

You can add your own CLI options to your transformations. See the example in ExampleTransformation2.

pramen-py pytest plugin

pramen-py also provides a pytest plugin with helpful fixtures for testing transformers.

List of available fixtures:

# install pramen-py into the environment and activate it
pytest --fixtures
# check under --- fixtures defined from pramen_py.test_utils.fixtures ---

The pramen-py pytest plugin also loads environment variables from a .env file if one is present in the root of the repo.
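
To make the .env behaviour concrete, here is a minimal standard-library sketch of what loading such a file typically involves. It is not the plugin's actual loader: the function name load_dotenv_file, the choice not to override variables that are already set, and the variable EXAMPLE_OPTION are all assumptions made for the illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def load_dotenv_file(path: Path) -> Dict[str, str]:
    """Parse KEY=VALUE lines and export them without overriding existing variables."""
    loaded: Dict[str, str] = {}
    if not path.is_file():
        return loaded
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


# Usage with a throwaway .env file; EXAMPLE_OPTION is a made-up variable name.
env_file = Path(tempfile.mkdtemp()) / ".env"
env_file.write_text("# local settings\nEXAMPLE_OPTION='demo'\n")
settings = load_dotenv_file(env_file)
print(settings)
```

Run pramen-py list-configuration-options to see the real variable names before writing a .env file for your tests.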

Running and configuring transformations

Transformations can be run with the following command:

pramen-py transformations run \
  ExampleTransformation1 \
  --config config.yml \
  --info-date 2022-04-01

--config is a required option for every transformation. See config_example.yaml for more information.

To check available options and documentation for a particular transformation, run:

pramen-py transformations run TransformationClassName --help

where TransformationClassName is the name of the transformation.
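
When a transformation has to be run for several information dates, the command above can be generated programmatically. The sketch below only builds and prints the command lines, using the subcommand and flags shown above. The helper name daily_run_commands and the one-run-per-day loop are assumptions made for the illustration, not a feature of pramen-py.

```python
import datetime
import shlex
from typing import Iterator, List


def daily_run_commands(
    transformation: str,
    config_path: str,
    start: datetime.date,
    end: datetime.date,
) -> Iterator[List[str]]:
    """Yield one `pramen-py transformations run` argv per day, end date included."""
    day = start
    while day <= end:
        yield [
            "pramen-py", "transformations", "run", transformation,
            "--config", config_path,
            "--info-date", day.isoformat(),
        ]
        day += datetime.timedelta(days=1)


commands = list(
    daily_run_commands(
        "ExampleTransformation1",
        "config.yml",
        datetime.date(2022, 4, 1),
        datetime.date(2022, 4, 3),
    )
)
for argv in commands:
    # Print only; pass argv to subprocess.run(argv, check=True) to execute.
    print(shlex.join(argv))
```

Keeping each command as a list of arguments avoids shell quoting problems if a config path contains spaces.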

Using as a Library

Reading metastore tables with the Pramen-Py API

import datetime
from pyspark.sql import SparkSession
from pramen_py import MetastoreReader
from pramen_py.utils.file_system import FileSystemUtils

# getOrCreate() is a method of the session builder, not of SparkSession itself
spark = SparkSession.builder.getOrCreate()

hocon_config = FileSystemUtils(spark) \
    .load_hocon_config_from_hadoop("uri_or_path_to_file")

metastore = MetastoreReader(spark) \
    .from_config(hocon_config)

df_txn = metastore.get_table(
    "transactions",
    info_date_from=datetime.date(2022, 1, 1),
    info_date_to=datetime.date(2022, 6, 1)
)

df_customer = metastore.get_latest("customer")

df_txn.show(truncate=False)
df_customer.show(truncate=False)

Development

Prerequisites:

Setup steps:

git clone https://github.com/AbsaOSS/pramen
cd pramen/pramen-py
make install  # create virtualenv and install dependencies
make test
make pre-commit

# enable completions
# source <(pramen-py completions zsh)
# source <(pramen-py completions bash)

pramen-py --help

Load environment configuration

Before any development step, set your development environment variables:

make install

Completions

# enable completions
source <(pramen-py completions zsh)
# or for bash
# source <(pramen-py completions bash)

Deployment

From the local development environment

# bump the version
vim pyproject.toml

# deploy to the dev environment (includes the steps for building and
#   publishing artefacts)
cat .env.ci
make publish
