Pramen transformations written in Python

Pramen-py

CLI application for defining data transformations for Pramen.

See:

pramen-py --help

for more information.

Installation

App settings

The application is configured through environment variables (see .env.example).

Add pramen-py as a dependency to your project

In case of poetry:

# ensure we have valid poetry environment
ls pyproject.toml || poetry init

poetry add pramen-py

In case of pip:

pip install pramen-py

Usage

Application configuration

To configure pramen-py, set the corresponding environment variables. To see the list of available options, run:

pramen-py list-configuration-options
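If you want to inspect the options from a script rather than a shell, the same command can be invoked from Python. The sketch below is only an illustration: the helper name list_configuration_options and its overrides parameter are made up for this example, and it returns None when pramen-py is not on the PATH.

```python
import os
import shutil
import subprocess
from typing import Dict, Optional


def list_configuration_options(
    overrides: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Run `pramen-py list-configuration-options` and return its stdout.

    `overrides` is a hypothetical convenience: extra environment variables
    layered on top of the current environment for this one call.
    Returns None if the pramen-py executable cannot be found.
    """
    executable = shutil.which("pramen-py")
    if executable is None:
        return None
    env = {**os.environ, **(overrides or {})}
    completed = subprocess.run(
        [executable, "list-configuration-options"],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout


options_text = list_configuration_options()
print(options_text if options_text is not None else "pramen-py is not installed")
```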

Developing transformations

pramen-py uses Python's namespace packages to discover transformations.

This means that a new transformer must live inside a Python package that contains a transformations directory.

This directory should be declared as a package:

  • for poetry
[tool.poetry]
# ...
packages = [
    { include = "transformations" },
]
  • for setup.py
from setuptools import setup, find_namespace_packages

setup(
    name='mynamespace-subpackage-a',
    # ...
    packages=find_namespace_packages(include=['transformations.*'])
)

Example files structure:

❯ tree .
.
├── README.md
├── poetry.lock
├── pyproject.toml
├── tests
│  └── test_identity_transformer.py
└── transformations
    └── identity_transformer
        ├── __init__.py
        └── example_config.yaml

For a transformer to be picked up by pramen-py, the following conditions must be satisfied:

  • the Python package containing the transformers is installed in the same Python environment as pramen-py
  • the Python package defines the namespace package transformations
  • the transformers extend the pramen_py.Transformation base class
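To make the namespace package requirement more concrete, here is a minimal standalone sketch of how modules under a namespace package can be enumerated with the standard library. It illustrates the general mechanism only and is not taken from pramen-py's source; the discover_modules helper and the temporary project layout are assumptions made for the example.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path
from typing import List


def discover_modules(namespace: str) -> List[str]:
    """Return the module names found directly under a namespace package."""
    package = importlib.import_module(namespace)
    # A namespace package has no __init__.py, but it still exposes __path__,
    # which can span several installed distributions.
    return sorted(
        f"{namespace}.{info.name}"
        for info in pkgutil.iter_modules(package.__path__)
    )


# Build a throwaway layout that mirrors the example file structure.
project_root = Path(tempfile.mkdtemp())
transformer_dir = project_root / "transformations" / "identity_transformer"
transformer_dir.mkdir(parents=True)
(transformer_dir / "__init__.py").write_text("# transformer code goes here\n")

# Stand-in for installing the package into the environment.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

found = discover_modules("transformations")
print(found)  # ['transformations.identity_transformer']
```

Note that transformations itself has no __init__.py here, which is what lets several independently installed packages contribute to the same namespace.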

Each subclass of the Transformation base class is registered as a CLI command (pramen-py transformations run TransformationSubclassName) with default options. Run:

pramen-py transformations run ExampleTransformation1 --help

for more details.
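The idea of exposing each subclass under its class name can be sketched with a plain registry. This is a hypothetical, self-contained illustration of the pattern: the Transformation class below is a stand-in defined for the example, not pramen_py.Transformation, and the run signature is invented.

```python
from typing import Dict, Type


class Transformation:
    """Hypothetical stand-in base class (NOT pramen_py.Transformation)."""

    registry: Dict[str, Type["Transformation"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass becomes addressable by its class name,
        # similar in spirit to a per-class CLI command.
        Transformation.registry[cls.__name__] = cls

    def run(self, info_date: str) -> str:
        raise NotImplementedError


class ExampleTransformation1(Transformation):
    def run(self, info_date: str) -> str:
        return f"ExampleTransformation1 ran for {info_date}"


def run_by_name(name: str, info_date: str) -> str:
    """Look up a registered subclass by name and run it."""
    return Transformation.registry[name]().run(info_date)


result = run_by_name("ExampleTransformation1", "2022-04-01")
print(result)  # ExampleTransformation1 ran for 2022-04-01
```

Defining the subclass is enough to register it, which is why the package containing the transformers only needs to be installed and importable.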

You can add your own CLI options to your transformations; see ExampleTransformation2 for an example.

pramen-py pytest plugin

pramen-py also provides a pytest plugin with helpful fixtures for testing the transformers you create.

List of available fixtures:

# install pramen-py into the environment and activate it
pytest --fixtures
# check under --- fixtures defined from pramen_py.test_utils.fixtures ---

The pramen-py pytest plugin also loads environment variables from a .env file if one is present in the root of the repo.

Running and configuring transformations

Transformations can be run with the following command:

pramen-py transformations run \
  ExampleTransformation1 \
  --config config.yml \
  --info-date 2022-04-01

--config is a required option for every transformation. See config_example.yaml for more information.

To check available options and documentation for a particular transformation, run:

pramen-py transformations run TransformationClassName --help

where TransformationClassName is the name of the transformation.
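When the command is launched from a scheduler or another Python program, it can help to assemble the argument list in code. The sketch below uses only the options shown above (--config and --info-date); the build_run_command helper is a hypothetical name introduced for this example.

```python
import datetime
from typing import List


def build_run_command(
    transformation: str,
    config_path: str,
    info_date: datetime.date,
) -> List[str]:
    """Assemble the argument list for `pramen-py transformations run`."""
    return [
        "pramen-py", "transformations", "run", transformation,
        "--config", config_path,
        # isoformat() yields YYYY-MM-DD, matching the example above.
        "--info-date", info_date.isoformat(),
    ]


command = build_run_command(
    "ExampleTransformation1", "config.yml", datetime.date(2022, 4, 1)
)
print(" ".join(command))
# pramen-py transformations run ExampleTransformation1 --config config.yml --info-date 2022-04-01
```

The resulting list can be passed to subprocess.run(command, check=True) in an environment where pramen-py is installed.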

Using as a Library

Read metastore tables with the Pramen-Py API

import datetime
from pyspark.sql import SparkSession
from pramen_py import MetastoreReader
from pramen_py.utils.file_system import FileSystemUtils

# getOrCreate() is a method of the session builder, not of SparkSession itself
spark = SparkSession.builder.getOrCreate()

hocon_config = FileSystemUtils(spark) \
    .load_hocon_config_from_hadoop("uri_or_path_to_file")

metastore = MetastoreReader(spark) \
    .from_config(hocon_config)

df_txn = metastore.get_table(
    "transactions",
    info_date_from=datetime.date(2022, 1, 1),
    info_date_to=datetime.date(2022, 6, 1)
)

df_customer = metastore.get_latest("customer")

df_txn.show(truncate=False)
df_customer.show(truncate=False)

Development

Prerequisites:

Setup steps:

git clone https://github.com/AbsaOSS/pramen
cd pramen/pramen-py
make install  # create virtualenv and install dependencies
make test
make pre-commit

pramen-py --help

Load environment configuration

Before any development step, set your development environment variables:

make install

Completions

# enable completions
source <(pramen-py completions zsh)
# or for bash
# source <(pramen-py completions bash)

Deployment

From the local development environment

# bump the version
vim pyproject.toml

# deploy to the dev environment (included steps of building and publishing
#   artefacts)
cat .env.ci
make publish
