
A wrapper around Great Expectations for building validation components on the Ascend.io platform

Project description

Ascend.io / Great Expectations Docker image for Google Cloud Storage

This image wraps the official Ascend.io image so that the Great Expectations validation tool can be used in it.

Build the docker image

The image is built by the GitHub Actions workflow defined in .github/workflows/docker-build.yaml.

For now, the image is pushed to Docker Hub at: fosk06/ascend-great-expectations-gcs:latest

The image is built on every push to the main branch and on every Git tag of the form "v{X}.{Y}.{Z}"

With:

  • X = Major version
  • Y = Minor version
  • Z = Patch version
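
As an illustration of the tag format above, here is a minimal sketch that checks whether a tag name matches "v{X}.{Y}.{Z}" and extracts the three numbers. The pattern and the function name parse_release_tag are assumptions made for this example; the actual filter used by the GitHub workflow may differ.

```python
import re

# Assumed pattern for the "v{X}.{Y}.{Z}" form; the real workflow filter may differ.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str):
    """Return (major, minor, patch) as ints, or None if the tag does not match."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

For example, parse_release_tag("v1.4.1") yields the major, minor and patch numbers, while a tag without the leading "v" is rejected.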

Use it on the Ascend.io platform

This Docker image is built for PySpark transforms on the Ascend.io platform. First, you need a Google Cloud Storage bucket (named, for example, "great_expectations_store") and a service account with the "storage.admin" role on this bucket. Then upload this service account as a credential on your Ascend.io instance and name it, for example, "great_expectations_sa".

Now you can create your PySpark transform on Ascend.io. Under Advanced settings > Runtime settings > Container image URL, set the Docker Hub image URL: fosk06/ascend-great-expectations-gcs:latest

Then, in "Custom Spark Params", click "require credentials" and choose the credential you uploaded earlier, "great_expectations_sa".

Write the PySpark transform

# names used in the transform signature
from typing import List
from pyspark.sql import DataFrame, SparkSession

# import the custom package
from ascend_great_expectations_gcs.validator import Validator

# placeholder values, replace them with your own
NAME = "customer"                   # name of the validator
PROJECT = "my-gcp-project"          # your GCP project
BUCKET = "great_expectations_store" # the name of your GCS bucket

# let's assume we are working on a "customer" table; write the expectations in a dedicated function
def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")

# Ascend.io transform callback
def transform(spark_session: SparkSession, inputs: List[DataFrame], credentials=None):
    df = inputs[0]
    # instantiate the validator
    validator = Validator(
        name=NAME,
        gcp_project=PROJECT,
        bucket=BUCKET,
        credentials=credentials,  # credentials from the transform callback
    )
    validator.add_expectations(expectations)
    validator.run(df)
    return df
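
Because the expectations live in a plain function that receives the validator, that function can be checked without Spark or GCS. Here is a minimal sketch of the idea, assuming a hypothetical RecordingValidator stub that is not part of this package. It only records which expect_* methods are called and does not validate any data.

```python
class RecordingValidator:
    """Hypothetical stand-in that records which expectation methods are called."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only attributes named expect_* are treated as expectation methods.
        if not name.startswith("expect_"):
            raise AttributeError(name)

        def record(*args, **kwargs):
            self.calls.append((name, args, kwargs))

        return record


def expectations(validator):
    validator.expect_column_to_exist("customer_id")
    validator.expect_column_values_to_not_be_null("customer_id")
    validator.expect_column_to_exist("created_at")


stub = RecordingValidator()
expectations(stub)
# Keep only the method name and positional arguments of each recorded call.
recorded = [(name, args) for name, args, _ in stub.calls]
```

The recorded list can then be compared with the expectations you intended to declare, which catches a misspelled column name or a forgotten check before the transform is deployed.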

Test the class

Create a virtual env, then in ./venv/lib/python3.9/site-packages create these two files:

  • ascend_great_expectations_gcs_test.pth => containing the path to your package folder
  • ascend_great_expectations_gcs.pth => containing the path to your package folder

https://webdevdesigner.com/q/how-do-you-set-your-pythonpath-in-an-already-created-virtualenv-55773/
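
A .pth file is a text file in site-packages in which each line is a directory that Python adds to sys.path at startup. Here is a minimal sketch of writing one from Python. The helper name write_pth_file is an assumption made for this example; the package itself does not provide it.

```python
from pathlib import Path


def write_pth_file(site_packages: Path, pth_name: str, package_dir: Path) -> Path:
    """Write a .pth file holding the absolute path of package_dir.

    Hypothetical helper: site_packages is the virtual env's site-packages
    directory and package_dir is the folder to make importable.
    """
    pth_file = Path(site_packages) / pth_name
    pth_file.write_text(str(Path(package_dir).resolve()) + "\n", encoding="utf-8")
    return pth_file


# Possible usage from inside the activated virtual env (the paths are examples):
#   import sysconfig
#   site_packages = Path(sysconfig.get_paths()["purelib"])
#   write_pth_file(site_packages, "ascend_great_expectations_gcs.pth", Path("path/to/package/folder"))
```

Using sysconfig to locate site-packages avoids hard-coding the Python version in the path.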

To delete a tag locally and on the remote:

git tag -d v1.4.1
git push --delete origin v1.4.1


Source Distribution

ascend-great-expectations-gcs-0.2.5.tar.gz (4.6 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file ascend-great-expectations-gcs-0.2.5.tar.gz.

File metadata

  • Download URL: ascend-great-expectations-gcs-0.2.5.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for ascend-great-expectations-gcs-0.2.5.tar.gz
Algorithm Hash digest
SHA256 11d1a7714268914a93493fc154a8d685ea40f3bc594cc755af8b6410b151a1c8
MD5 31d452b01b7a2b7e1d58f0bf2a503651
BLAKE2b-256 c44c14c51434fdd1c734d961883cb187c90304fac9b4fb646c9c2904bf1699b4


File details

Details for the file ascend_great_expectations_gcs-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: ascend_great_expectations_gcs-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for ascend_great_expectations_gcs-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 ec94df68344f648ae20609241ef025d286b03f48ad41a877bfa73acdb175b86a
MD5 fc28dc17bc647b7e6c4f3c56796d93d7
BLAKE2b-256 18f4fdc9a13463d0df2e76cfc474bd10ba24009328de063e473b30ef0fd2437f

