Skip to main content

A way to use Great Expectations in Google Collab and other notebook environments.

Project description

GXKent

A simple library that allows Great Expectations to run easily in python notebook and CLI environments

Idea

Kent was the city featured in the Charles Dickens classic, and is therefore the sensible name for a container of expectations The central issue that Kent resolves is to ensure that pandas dataframes are available and populated with data in both of our data contexts: CLI and Notebooks

Basic Usage

from gxkent import GXKent

Kent = GXKent() # Databricks usage
# Google Colab usage
# Kent = GXKent(password_worksheet='your_worksheet_name_here') 
# Command line usage
# Kent = GXKent(env_path='/where/you/put/your/.env') 

Kent.is_print_on_success = False

  sql_text =
SELECT count(DISTINCT a.npi) AS new_npi_cnt
FROM default_npi_setting_count.{table_name} a
WHERE a.npi NOT IN (
    SELECT DISTINCT b.npi
    FROM default_npi_setting_count.persetting_2021_12 b
    );

gxDF = Kent.gx_df_from_sql(sql_text)

Kent.capture_expectation(
    expectation_name='Between year comparision {this_year} {that_year}',
    expectation_result=gxDF.expect_column_max_to_be_between('new_npi_cnt',112671,253511)
)

Kent.capture_expectation(
    expectation_name='Between year comparision {this_year} {that_year}',
    expectation_result=gxDF.expect_column_min_to_be_between('new_npi_cnt',11671,23511)
)

Kent.capture_expectation(
    expectation_name='Between year comparision {this_year} {that_year}',
    expectation_result=gxDF.expect_column_avg_to_be_between('new_npi_cnt',50000,60000)
)

# Prints the results to the console! 
Kent.print_test_results()

CLI Usage

In order to work from the command line, there should be a .env file with database credentials in it. As typical .env files should be excluded in your .gitignore file.

here is the contents expected in the .env file:

GX_USERNAME=your_gx_mysql_username
GX_PASSWORD=your_gx_mysql_password
DB_DATABASE=starting_database
DB_PORT=3306
DB_HOST=localhost

Once your .env file has the right contents, you need to tell GXkent where it lives when you create your object with:

Kent = GXKent(env_path='/where/you/put/your/.env')

substitute your database connection details here. For now, GXKent only supports MySQL. patches to use sqlalchemy properly to support other databases are welcome.

Google Colab Usage

In order to safely use Google Collab notebooks, it is critical to not save your password credentials in the notebook itself. Instead you should store your credentials in a google spreadsheet and then connect your Collab notebook to that spreadsheet.

In order to use GXKent in this way, you need to pass in the credentials like so:

Kent = GXKent(password_worksheet='your_worksheet_name_here')

Your Google Drive Spreadsheet should contain the following structure:

username password server port database
your_username your_password your_server 3306 your_database

Authors

Fred Trotter and Jose Cortina

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gxkent-0.2.7.tar.gz (12.9 kB view hashes)

Uploaded Source

Built Distribution

gxkent-0.2.7-py3-none-any.whl (13.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page