A wrapper around Great Expectations for building validation components on the Ascend.io platform
Ascend.io / Great Expectations Docker image for Google Cloud Storage
This image is a wrapper around the official Ascend.io image that adds the Great Expectations validation tool.
Build the Docker image
The image is built by the GitHub Actions workflow defined in .github/workflows/docker-build.yaml.
For now, the image is pushed to Docker Hub at this address: fosk06/ascend-great-expectations-gcs:latest
The image is built on every push to the main branch and on every Git tag of the form "v{X}.{Y}.{Z}"
where:
- X = Major version
- Y = Minor version
- Z = Patch (correction) version
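As a quick local check before pushing a tag, the naming scheme above can be sketched in Python. This is only an illustration: the helper name `parse_release_tag` is hypothetical, and the actual trigger pattern used in docker-build.yaml is not shown here, so the regular expression is an assumption based on the "v{X}.{Y}.{Z}" description.

```python
import re

# Assumed shape of a release tag: "v", then three dot-separated integers.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag):
    """Return (major, minor, patch) as ints, or None if the tag does not match."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


print(parse_release_tag("v1.4.1"))
print(parse_release_tag("1.4.1"))
```

A tag without the leading "v", or with fewer than three components, is rejected by this check.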
Use it on the Ascend.io platform
This Docker image is built for PySpark transforms on the Ascend.io platform. First, you need a Google Cloud Storage bucket (named, for example, "great_expectations_store") and a service account with the role "storage.admin" on this bucket. Then upload this service account as a credential on your Ascend.io instance and name it, for example, "great_expectations_sa".
Now you can create your PySpark transform on Ascend.io. Under Advanced settings > Runtime settings > Container image URL, set the Docker Hub image URL: fosk06/ascend-great-expectations-gcs:latest
Then, under "Custom Spark Params", click "Require credentials" and choose the credential you uploaded previously, "great_expectations_sa".
Write the PySpark transform
from typing import List

from pyspark.sql import DataFrame, SparkSession

# import the custom package
from ascend_great_expectations_gcs.validator import Validator


# let's assume we are working on a "customer" table; write the expectations in a dedicated function
def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")


# Ascend.io transform callback
# NAME, PROJECT and BUCKET are constants you must define yourself
def transform(spark_session: SparkSession, inputs: List[DataFrame], credentials=None):
    df = inputs[0]
    # instantiate the validator
    validator = Validator(
        name=NAME,  # name of the validator
        gcp_project=PROJECT,  # your GCP project
        bucket=BUCKET,  # the name of your GCP bucket, for example "great_expectations_store"
        credentials=credentials,  # credentials from the transform callback
    )
    validator.add_expectations(expectations)
    validator.run(df)
    return df
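Because the expectations live in a plain function that receives the validator as an argument, that function can be smoke-tested locally without Spark, GCS or Ascend.io. The sketch below uses a hypothetical `RecordingValidator` stand-in, which is not part of this package; it assumes the expectations function only calls `expect_*` methods on the object it receives, and it only records those calls rather than validating any data.

```python
class RecordingValidator:
    """Hypothetical stand-in that records expectation calls instead of running them."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only intercept expectation-style methods; anything else is an error.
        if not name.startswith("expect_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))

        return record


def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")


recorder = RecordingValidator()
expectations(recorder)
for name, args, _kwargs in recorder.calls:
    print(name, args)
```

This catches typos in column names or a forgotten expectation early, but it says nothing about whether the real data passes; that still requires running the transform on the platform.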
Test the class
Create a virtual environment, then in ./venv/lib/python3.9/site-packages create these two files:
- ascend_great_expectations_gcs_test.pth => set your package folder path
- ascend_great_expectations_gcs.pth => set your package folder path here
https://webdevdesigner.com/q/how-do-you-set-your-pythonpath-in-an-already-created-virtualenv-55773/
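A .pth file placed in site-packages is a plain text file in which each line is a directory that Python appends to sys.path at startup. The following is a minimal sketch of creating such a file from Python; the helper name `write_pth` is hypothetical, and the demonstration writes into a temporary directory standing in for ./venv/lib/python3.9/site-packages rather than touching a real virtual environment.

```python
import tempfile
from pathlib import Path


def write_pth(site_packages, pth_name, package_dir):
    """Write a .pth file whose single line is the absolute path to add to sys.path."""
    pth_file = Path(site_packages) / pth_name
    pth_file.write_text(str(Path(package_dir).resolve()) + "\n")
    return pth_file


# Demonstration with throwaway directories instead of a real virtualenv.
with tempfile.TemporaryDirectory() as tmp:
    fake_site_packages = Path(tmp) / "site-packages"
    fake_site_packages.mkdir()
    package_dir = Path(tmp) / "src"
    package_dir.mkdir()
    created = write_pth(
        fake_site_packages, "ascend_great_expectations_gcs.pth", package_dir
    )
    print(created.read_text().strip())
```

To use it for real, pass the site-packages directory of your virtual environment and the folder that contains the package sources.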
To delete a tag locally and on the remote:
git tag -d v1.4.1
git push --delete origin v1.4.1