A wrapper around Great Expectations for building validation components on the Ascend.io platform
Ascend.io / Great Expectations Docker image for Google Cloud Storage
This image wraps the official Ascend.io image so that the Great Expectations validation tool can be used in it.
Build the docker image
The image is built by the GitHub Actions workflow defined in .github/workflows/docker-build.yaml.
For now, the image is pushed to Docker Hub at this address: fosk06/ascend-great-expectations-gcs:latest
The image is built on each push to the main branch and for each git tag of the form "v{X}.{Y}.{Z}"
With:
- X = Major version
- Y = Minor version
- Z = Patch version
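As a rough illustration of the tag convention above, the following sketch checks whether a tag name has the form "v{X}.{Y}.{Z}" and splits it into its three numbers. The function name `parse_version_tag` and the regular expression are assumptions made for this example; the actual tag filter lives in the workflow file and may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for tags of the form v{X}.{Y}.{Z} (illustration only)
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (X, Y, Z) as integers if the whole tag matches, else None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


print(parse_version_tag("v0.1.3"))  # (0, 1, 3)
print(parse_version_tag("0.1.3"))   # None: the leading "v" is missing
```

Using `fullmatch` rather than `match` rejects tags with trailing text such as "v0.1.3-rc1", which this sketch treats as outside the convention.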
Use it on the Ascend.io platform
This Docker image is built for PySpark transforms on the Ascend.io platform. First, you need a Google Cloud Storage bucket (named, for example, "great_expectations_store") and a service account with the "storage.admin" role on this bucket. Then upload this service account as a credential on your Ascend.io instance and name it, for example, "great_expectations_sa".
Now you can create your PySpark transform on Ascend.io. Under Advanced settings > Runtime settings > Container image URL, set the Docker Hub image URL: fosk06/ascend-great-expectations-gcs:latest
Then, under "Custom Spark Params", click "Require credentials" and choose the credential you uploaded earlier, "great_expectations_sa".
Write the PySpark transform
from typing import List

from pyspark.sql import DataFrame, SparkSession

# import the custom package
from ascend_great_expectations_gcs.validator import Validator

# NAME, PROJECT and BUCKET are placeholders: define them with your own values


# assuming we are working on a "customer" table, write the expectations in a dedicated function
def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")


# Ascend.io transform callback
def transform(spark_session: SparkSession, inputs: List[DataFrame], credentials=None):
    df = inputs[0]
    # instantiate the validator
    validator = Validator(
        name=NAME,  # name of the validator
        gcp_project=PROJECT,  # your GCP project
        bucket=BUCKET,  # the name of your GCS bucket, for example "great_expectations_store"
        credentials=credentials,  # credentials from the transform callback
    )
    validator.add_expectations(expectations)
    validator.run(df)
    return df
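Because the expectations live in a plain function that receives a validator, they can be exercised without a Spark session or GCS access. The sketch below uses a hypothetical stand-in class, `RecordingValidator`, which is not part of this package: it only records which `expect_*` methods are called and with which arguments. This can help spot a typo in a column name before the transform is deployed, but it does not run any real validation.

```python
from typing import Any, Callable, List, Tuple


class RecordingValidator:
    """Hypothetical stand-in that records expectation calls (not part of the package)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple, dict]] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Invoked only for attributes that do not exist; accept expect_* names.
        if not name.startswith("expect_"):
            raise AttributeError(name)

        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((name, args, kwargs))

        return record


# Same expectations function as in the transform above
def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")


recorder = RecordingValidator()
expectations(recorder)
for name, args, kwargs in recorder.calls:
    print(name, args, kwargs)
```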
Hashes for ascend-great-expectations-gcs-0.1.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 998c0177e1220d5b6506e4a37e32f2d9658dc1bcd8e512b90e79fe052f182c41
MD5 | 44dc40340e561c306206c3c34de2d2e4
BLAKE2b-256 | 0023a3d61e2263d5cf8079f984227343a46e4f0bd2a14ce4f7f02ad6da5d8def
Hashes for ascend_great_expectations_gcs-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3d5aeb0e128b3d583d178a92b406b8c772525b6255776c7a87f3e9bbe3312499
MD5 | 7fffb392c5f82ade524a5627ee977163
BLAKE2b-256 | 90b4f85ab0d3a41c59ed20271e1dcac0c16d6f316699000ba56287fe6874c840