Pramen transformations written in Python
Pramen-py
CLI application for defining data transformations for Pramen.
See:
pramen-py --help
for more information.
Installation
App settings
Application configuration is handled through environment variables (see .env.example).
Add pramen-py as a dependency to your project
In case of poetry:
# ensure we have valid poetry environment
ls pyproject.toml || poetry init
poetry add pramen-py
In case of pip:
pip install pramen-py
Usage
Application configuration
To configure pramen-py, set the corresponding environment variables. To list the available options, run:
pramen-py list-configuration-options
Developing transformations
pramen-py uses Python's namespace packages to discover transformations.
This means that a new transformer must be located inside a Python package
that contains a transformations directory.
This directory should be declared as a package:
- for poetry
[tool.poetry]
# ...
packages = [
{ include = "transformations" },
]
- for setup.py
from setuptools import setup, find_namespace_packages
setup(
name='mynamespace-subpackage-a',
# ...
packages=find_namespace_packages(include=['transformations.*'])
)
Example files structure:
❯ tree .
.
├── README.md
├── poetry.lock
├── pyproject.toml
├── tests
│ └── test_identity_transformer.py
└── transformations
└── identity_transformer
├── __init__.py
└── example_config.yaml
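To illustrate why the namespace-package layout matters, here is a minimal sketch of how subpackages of a shared transformations namespace can be found across several installed projects. It uses only the standard library and is not pramen-py's actual discovery code; the project names and the helper functions make_distribution and discover_transformers are hypothetical.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, transformer_name: str) -> None:
    """Create <root>/transformations/<transformer_name>/__init__.py.

    There is deliberately no transformations/__init__.py: leaving it out is
    what makes `transformations` a namespace package that several installed
    projects can contribute to.
    """
    package_dir = root / "transformations" / transformer_name
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"NAME = {transformer_name!r}\n")


def discover_transformers() -> list:
    """Return the names of all subpackages found under `transformations`."""
    namespace = importlib.import_module("transformations")
    return sorted(info.name for info in pkgutil.iter_modules(namespace.__path__))


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical, separately installed projects (assumed names).
    project_a = Path(tmp) / "project_a"
    project_b = Path(tmp) / "project_b"
    make_distribution(project_a, "identity_transformer")
    make_distribution(project_b, "other_transformer")

    # Simulate "installed into the same environment" by extending sys.path.
    sys.path[:0] = [str(project_a), str(project_b)]
    importlib.invalidate_caches()
    found = discover_transformers()
    print(found)
```

Both subpackages are listed even though they live in different directories, which is the property that lets independently installed projects add transformers to one environment.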
For a transformer to be picked up by pramen-py, the following conditions must be satisfied:
- the Python package containing the transformers must be installed in the same Python environment as pramen-py
- the Python package must define the transformations namespace package
- transformers must extend the pramen_py.Transformation base class
Subclasses of the Transformation base class are registered as CLI commands (pramen-py transformations run TransformationSubclassName) with default options. Run:
pramen-py transformations run ExampleTransformation1 --help
for more details.
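The general idea of "subclass equals command" can be sketched with a small registry. This is only an illustration of the pattern under stated assumptions: the Transformation class below is a hypothetical stand-in, not pramen_py.Transformation, and its run signature and the run_by_name helper are invented for the example.

```python
from typing import Dict, Type


class Transformation:
    """Hypothetical stand-in for a base class that tracks its subclasses."""

    registry: Dict[str, Type["Transformation"]] = {}

    def __init_subclass__(cls, **kwargs):
        # Called automatically whenever a subclass is defined.
        super().__init_subclass__(**kwargs)
        Transformation.registry[cls.__name__] = cls

    def run(self, info_date: str) -> str:
        raise NotImplementedError


class ExampleTransformation1(Transformation):
    def run(self, info_date: str) -> str:
        return f"ExampleTransformation1 ran for {info_date}"


def run_by_name(name: str, info_date: str) -> str:
    """Look a transformation up by class name, as a CLI dispatcher might."""
    if name not in Transformation.registry:
        raise SystemExit(f"Unknown transformation: {name}")
    return Transformation.registry[name]().run(info_date)


print(run_by_name("ExampleTransformation1", "2022-04-01"))
```

Defining the subclass is enough to make it addressable by name; no separate registration call is needed.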
You can add your own CLI options to your transformations; see ExampleTransformation2 for an example.
pramen-py pytest plugin
pramen-py also provides a pytest plugin with helpful fixtures for testing transformers.
List of available fixtures:
# install pramen-py into the environment and activate it
pytest --fixtures
# check under --- fixtures defined from pramen_py.test_utils.fixtures ---
The pramen-py pytest plugin also loads environment variables from a .env file if one is present in the root of the repo.
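As a rough picture of what loading a .env file involves, the sketch below parses KEY=VALUE lines and copies them into os.environ. It is a simplified, standard-library-only illustration, not the plugin's implementation; the load_dotenv helper and the PRAMENPY_EXAMPLE_OPTION variable name are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def load_dotenv(path: Path, override: bool = False) -> dict:
    """Parse KEY=VALUE lines from `path` and copy them into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Existing variables are kept unless override=True.
    """
    loaded = {}
    if not path.is_file():
        return loaded
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    # Hypothetical option name, used only for this demonstration.
    env_file.write_text("# demo settings\nPRAMENPY_EXAMPLE_OPTION='demo'\n")
    loaded = load_dotenv(env_file, override=True)
    print(loaded)
```

Run pramen-py list-configuration-options to see the real option names before putting anything into your own .env file.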
Running and configuring transformations
Transformations can be run with the following command:
pramen-py transformations run \
ExampleTransformation1 \
--config config.yml \
--info-date 2022-04-01
--config is a required option for every transformation. See
config_example.yaml for more information.
To check available options and documentation for a particular transformation, run:
pramen-py transformations run TransformationClassName --help
where TransformationClassName is the name of the transformation.
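When transformations are launched from a scheduler or another Python script, it can help to build the command line programmatically. The sketch below assembles the same arguments shown above; the build_run_command helper is hypothetical, and only the --config and --info-date options from this README are used.

```python
import datetime
import shlex
from typing import List


def build_run_command(
    transformation: str, config_path: str, info_date: datetime.date
) -> List[str]:
    """Return the argv list for `pramen-py transformations run ...`."""
    return [
        "pramen-py", "transformations", "run", transformation,
        "--config", config_path,
        # isoformat() yields YYYY-MM-DD, matching the example above.
        "--info-date", info_date.isoformat(),
    ]


command = build_run_command(
    "ExampleTransformation1", "config.yml", datetime.date(2022, 4, 1)
)
# Print a shell-safe rendering of the command for logging or copy-paste.
print(" ".join(shlex.quote(part) for part in command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.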
Using as a Library
Read metastore tables with the Pramen-Py API
import datetime
from pyspark.sql import SparkSession
from pramen_py import MetastoreReader
from pramen_py.utils.file_system import FileSystemUtils
spark = SparkSession.builder.getOrCreate()
hocon_config = FileSystemUtils(spark) \
.load_hocon_config_from_hadoop("uri_or_path_to_file")
metastore = MetastoreReader(spark) \
.from_config(hocon_config)
df_txn = metastore.get_table(
"transactions",
info_date_from=datetime.date(2022, 1, 1),
info_date_to=datetime.date(2022, 6, 1)
)
df_customer = metastore.get_latest("customer")
df_txn.show(truncate=False)
df_customer.show(truncate=False)
Development
Prerequisites:
- https://python-poetry.org/docs/#installation
- python 3.6
Setup steps:
git clone https://github.com/AbsaOSS/pramen
cd pramen/pramen-py
make install # create virtualenv and install dependencies
make test
make pre-commit
# enable completions
# source <(pramen-py completions zsh)
# source <(pramen-py completions bash)
pramen-py --help
Load environment configuration
Before any development step, set your development environment variables:
make install
Completions
# enable completions
source <(pramen-py completions zsh)
# or for bash
# source <(pramen-py completions bash)
Deployment
From the local development environment
# bump the version
vim pyproject.toml
# deploy to the dev environment (includes building and publishing
# the artefacts)
cat .env.ci
make publish