Pramen transformations written in Python

Pramen-py

CLI application for defining data transformations for Pramen.

See:

pramen-py --help

for more information.

Installation

App settings

The application is configured through environment variables (see .env.example).

Add pramen-py as a dependency to your project

With poetry:

# ensure we have valid poetry environment
ls pyproject.toml || poetry init

poetry add pramen-py

With pip:

pip install pramen-py

Usage

Application configuration

To configure pramen-py options, set the corresponding environment variables. To see the list of available options, run:

pramen-py list-configuration-options

Developing transformations

pramen-py uses Python's namespace packages to discover transformations.

This means that a new transformer must live inside a Python package that contains a transformations directory.

This directory should be declared as a package:

  • for poetry
[tool.poetry]
# ...
packages = [
    { include = "transformations" },
]
  • for setup.py
from setuptools import setup, find_namespace_packages

setup(
    name='mynamespace-subpackage-a',
    # ...
    packages=find_namespace_packages(include=['transformations.*'])
)

Example files structure:

❯ tree .
.
├── README.md
├── poetry.lock
├── pyproject.toml
├── tests
│  └── test_identity_transformer.py
└── transformations
    └── identity_transformer
        ├── __init__.py
        └── example_config.yaml

For a transformer to be picked up by pramen-py, the following conditions must be satisfied:

  • the Python package containing the transformers must be installed in the same Python environment as pramen-py
  • the Python package must define the namespace package transformations
  • transformers must extend the pramen_py.Transformation base class
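
The conditions above can be illustrated with a minimal, standard-library-only sketch of namespace-package discovery. It is not pramen-py's actual implementation: the sketch_base module, its stand-in Transformation class, and the discover_transformations helper are hypothetical names introduced only for this example. The sketch builds a throwaway project in a temporary directory, puts it on sys.path (standing in for "installed in the same environment"), and then collects every subclass of the base class found under the transformations namespace.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_transformations(base_cls, namespace="transformations"):
    """Import each module under the namespace package and collect subclasses of base_cls."""
    ns_pkg = importlib.import_module(namespace)
    found = {}
    # A namespace package has no __init__.py; its __path__ lists every
    # directory on sys.path that contributes to the namespace.
    for info in pkgutil.iter_modules(ns_pkg.__path__, prefix=namespace + "."):
        module = importlib.import_module(info.name)
        for obj in vars(module).values():
            if isinstance(obj, type) and issubclass(obj, base_cls) and obj is not base_cls:
                found[obj.__name__] = obj
    return found


def build_demo_project(root: Path) -> None:
    """Create a hypothetical base module and one transformer package on disk."""
    # Stand-in for pramen_py.Transformation (assumption: the real class has more to it).
    (root / "sketch_base.py").write_text("class Transformation:\n    pass\n")
    pkg = root / "transformations" / "identity_transformer"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(
        "from sketch_base import Transformation\n\n\n"
        "class IdentityTransformer(Transformation):\n"
        "    pass\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_demo_project(Path(tmp))
    sys.path.insert(0, tmp)  # stands in for installing the package
    importlib.invalidate_caches()
    try:
        base = importlib.import_module("sketch_base").Transformation
        discovered = discover_transformations(base)
    finally:
        sys.path.remove(tmp)

print(sorted(discovered))  # ['IdentityTransformer']
```

Note that the transformations directory itself has no __init__.py, which is what makes it a namespace package that several installed distributions can contribute to.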

Subclasses of the Transformation base class are registered as CLI commands (pramen-py transformations run TransformationSubclassName) with default options. Check:

pramen-py transformations run ExampleTransformation1 --help

for more details.

You can add your own CLI options to your transformations. See the example in ExampleTransformation2.

pramen-py pytest plugin

pramen-py also provides a pytest plugin with helpful fixtures for testing your transformers.

List of available fixtures:

# install pramen-py into the environment and activate it
pytest --fixtures
# check under --- fixtures defined from pramen_py.test_utils.fixtures ---

The pramen-py pytest plugin also loads environment variables from a .env file if one is present in the root of the repo.
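
To make the .env behaviour concrete, here is a generic sketch of what loading such a file usually involves. It is an illustration only: the plugin's actual loader may differ, the load_dotenv_sketch helper and the EXAMPLE_OPTION and ALREADY_SET variable names are hypothetical, and the choice not to override variables that are already set is an assumption rather than documented pramen-py behaviour.

```python
import tempfile
from pathlib import Path


def load_dotenv_sketch(path: Path, environ: dict) -> dict:
    """Copy KEY=VALUE pairs from a .env file into environ, skipping keys already set."""
    loaded = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        # Ignore blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop optional surrounding quotes
        if key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# hypothetical options\nEXAMPLE_OPTION=value\nALREADY_SET=override\n")
    # A plain dict is used here instead of os.environ so the demo has no side effects.
    env = {"ALREADY_SET": "keep"}
    loaded = load_dotenv_sketch(env_file, env)

print(loaded)  # {'EXAMPLE_OPTION': 'value'}
```

Passing os.environ instead of a plain dict would apply the values to the current process.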

Running and configuring transformations

Transformations can be run with the following command:

pramen-py transformations run \
  ExampleTransformation1 \
  --config config.yml \
  --info-date 2022-04-01

--config is a required option for any transformation. See config_example.yaml for more information.

To check available options and documentation for a particular transformation, run:

pramen-py transformations run TransformationClassName --help

where TransformationClassName is the name of the transformation.
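
When transformations are launched from a Python script (for example, by a scheduler), the same command line can be assembled programmatically. The sketch below uses only the subcommand and the --config and --info-date options shown above; the build_run_command helper is a hypothetical name, and the date format is assumed to be YYYY-MM-DD, as in the example.

```python
import datetime
import shlex


def build_run_command(transformation: str, config_path: str, info_date: datetime.date) -> list:
    """Build the argv list for `pramen-py transformations run`."""
    return [
        "pramen-py", "transformations", "run",
        transformation,
        "--config", config_path,
        "--info-date", info_date.isoformat(),  # renders as YYYY-MM-DD
    ]


cmd = build_run_command("ExampleTransformation1", "config.yml", datetime.date(2022, 4, 1))
print(shlex.join(cmd))
# pramen-py transformations run ExampleTransformation1 --config config.yml --info-date 2022-04-01
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where pramen-py is installed.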

Using as a Library

Reading metastore tables with the Pramen-Py API

import datetime
from pyspark.sql import SparkSession
from pramen_py import MetastoreReader
from pramen_py.utils.file_system import FileSystemUtils

spark = SparkSession.builder.getOrCreate()

hocon_config = FileSystemUtils(spark) \
    .load_hocon_config_from_hadoop("uri_or_path_to_file")

metastore = MetastoreReader(spark) \
    .from_config(hocon_config)

df_txn = metastore.get_table(
    "transactions",
    info_date_from=datetime.date(2022, 1, 1),
    info_date_to=datetime.date(2022, 6, 1)
)

df_customer = metastore.get_latest("customer")

df_txn.show(truncate=False)
df_customer.show(truncate=False)

Development

Prerequisites:

Setup steps:

git clone https://github.com/AbsaOSS/pramen
cd pramen/pramen-py
make install  # create virtualenv and install dependencies
make test
make pre-commit

# enable completions
# source <(pramen-py completions zsh)
# source <(pramen-py completions bash)

pramen-py --help

Load environment configuration

Before any development step, set up your development environment variables:

make install

Completions

# enable completions
source <(pramen-py completions zsh)
# or for bash
# source <(pramen-py completions bash)

Deployment

From the local development environment

# bump the version
vim pyproject.toml

# deploy to the dev environment (includes building and publishing
#   the artefacts)
cat .env.ci
make publish
