spark_quality_rules_tools
spark_quality_rules_tools is a Python library for running data quality rules in a sandbox environment.
Installation
The package is published on PyPI and can be installed with pip:
pip install spark-quality-rules-tools --user --upgrade
Usage
Wrapper for running Hammurabi data quality jobs.
Sandbox
Installation
!pip uninstall -y spark-quality-rules-tools
!pip install spark-quality-rules-tools --user --upgrade
Imports
import os
import pyspark
from spark_quality_rules_tools import dq_path_workspace
from spark_quality_rules_tools import dq_download_jar
from spark_quality_rules_tools import dq_spark_session
from spark_quality_rules_tools import dq_extract_parameters
from spark_quality_rules_tools import dq_run_sandbox
from spark_quality_rules_tools import dq_validate_conf
from spark_quality_rules_tools import dq_validate_rules
from spark_quality_rules_tools import show_spark_df
# Expose show_spark_df as DataFrame.show2 so it can be called as a method
pyspark.sql.dataframe.DataFrame.show2 = show_spark_df
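The assignment above attaches `show_spark_df` to the PySpark `DataFrame` class under the name `show2`. The sketch below illustrates the same pattern on a stand-in class so that it runs without Spark. `FakeFrame` and `show_rows` are hypothetical names used only for illustration; what `show_spark_df` actually prints is not described here.

```python
class FakeFrame:
    """Stand-in for pyspark.sql.DataFrame (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = rows


def show_rows(self, n=20):
    """Return the first n rows as text lines; 'self' is the frame instance."""
    return [str(row) for row in self.rows[:n]]


# Assigning a plain function to a class attribute makes it a method on
# every instance, including ones created before the assignment.
frame = FakeFrame([("PE", 1), ("PE", 2), ("CO", 3)])
FakeFrame.show2 = show_rows

print(frame.show2(2))
```

Because the attribute is set on the class, every DataFrame in the session gains the method, which is why `df.show2(100)` works later on without wrapping `df`.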
Variables
project_sda="SDA_37036"
url_conf = "http://artifactory-gdt.central-02.nextgen.igrupobbva/artifactory/gl-datio-spark-libs-maven-local/com/datiobd/cdd-hammurabi/4.0.9/DQ_LOCAL_CONFS/KCOG/KCOG_branch_MRField.conf"
Creating Workspace
dq_path_workspace(project_sda=project_sda)
Download haas jar
dq_download_jar(haas_version="4.8.0", force=True)
Spark Session
spark, sc = dq_spark_session()
Validate Conf
dq_validate_conf(url_conf=url_conf)
Extract Params
dq_extract_parameters(url_conf=url_conf)
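To make the idea of parameter extraction concrete, here is a minimal sketch that lists the placeholders used in a configuration text. It assumes the `.conf` file uses HOCON-style substitutions such as `${NAME}` or `${?NAME}`; `find_placeholders` and the sample text are hypothetical and are not the implementation of `dq_extract_parameters`.

```python
import re

# Matches HOCON-style substitutions such as ${ODATE_DATE} or ${?COUNTRY_ID}.
PLACEHOLDER = re.compile(r"\$\{\??([A-Za-z_][A-Za-z0-9_]*)\}")


def find_placeholders(conf_text):
    """Return the distinct placeholder names in conf_text, in order of first use."""
    seen = []
    for name in PLACEHOLDER.findall(conf_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical configuration fragment, not taken from a real .conf file.
sample_conf = """
hammurabi {
  schema = ${ARTIFACTORY_UNIQUE_CACHE}"/"${SCHEMAS_REPOSITORY}${SCHEMA_PATH}
  country = ${?COUNTRY_ID}
  odate = ${ODATE_DATE}
  cutoff = ${CUTOFF_DATE}
  partition = ${CUTOFF_DATE}
}
"""

print(find_placeholders(sample_conf))
```

Each name found this way corresponds to a key that has to be supplied in the parameter list passed to the run step.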
JSON Params
parameter_conf_list = [
    {
        "ARTIFACTORY_UNIQUE_CACHE": "http://artifactory-gdt.central-02.nextgen.igrupobbva",
        "ODATE_DATE": "2022-11-11",
        "COUNTRY_ID": "PE",
        "SCHEMA_PATH": "t_kcog_branch.output.schema",
        "CUTOFF_DATE": "2022-11-11",
        "SCHEMAS_REPOSITORY": "gl-datio-da-generic-local/schemas/pe/kcog/master/t_kcog_branch/latest/"
    }
]
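A mistyped key or date in the parameter list is easier to catch before the Spark job starts. The following sketch checks one parameter dictionary; treating all six keys from the example as required, and the two date keys as `YYYY-MM-DD` strings, are assumptions drawn from the sample values, and `check_parameters` is a hypothetical helper that is not part of the library.

```python
from datetime import date

# Keys copied from the example; treating all of them as required is an assumption.
EXPECTED_KEYS = {
    "ARTIFACTORY_UNIQUE_CACHE",
    "ODATE_DATE",
    "COUNTRY_ID",
    "SCHEMA_PATH",
    "CUTOFF_DATE",
    "SCHEMAS_REPOSITORY",
}
DATE_KEYS = ("ODATE_DATE", "CUTOFF_DATE")


def check_parameters(params):
    """Return a list of problems found in one parameter dictionary."""
    problems = []
    for key in sorted(EXPECTED_KEYS - set(params)):
        problems.append(f"missing key: {key}")
    for key in DATE_KEYS:
        value = params.get(key)
        if value is None:
            continue
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{key} is not a YYYY-MM-DD date: {value!r}")
    return problems


example = {
    "ARTIFACTORY_UNIQUE_CACHE": "http://artifactory-gdt.central-02.nextgen.igrupobbva",
    "ODATE_DATE": "2022-11-11",
    "COUNTRY_ID": "PE",
    "SCHEMA_PATH": "t_kcog_branch.output.schema",
    "CUTOFF_DATE": "2022-11-11",
    "SCHEMAS_REPOSITORY": "gl-datio-da-generic-local/schemas/pe/kcog/master/t_kcog_branch/latest/",
}
print(check_parameters(example))
```

An empty list means the dictionary passed both checks.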
Run
dq_run_sandbox(spark=spark,
               sc=sc,
               parameter_conf_list=parameter_conf_list,
               url_conf=url_conf)
df = spark.read.csv("file:/var/sds/homes/P030772/workspace/data_quality_rules/data_reports/KCOG/KCOG_BRANCH_MRFIELD_202304120046_20221111.csv",
                    header=True)
df.show2(100)
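The report file name in the example appears to embed a run timestamp, so the exact name is likely to change from run to run. As a sketch, the helper below picks the most recently modified CSV in a report directory instead of hard-coding the name. It assumes each run writes a single `.csv` file into that directory; `latest_report` is a hypothetical helper, not part of the library.

```python
from pathlib import Path


def latest_report(report_dir):
    """Return the most recently modified .csv file in report_dir, or None."""
    candidates = sorted(
        Path(report_dir).glob("*.csv"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    return candidates[-1] if candidates else None
```

The returned path could then be passed to `spark.read.csv` with the same `file:` prefix and `header=True` option used above.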
Validate Rules
dq_validate_rules(url_conf=url_conf)
License
New features v1.0
BugFix
- choco install visualcpp-build-tools
Reference
- Jonathan Quiza github.
- Jonathan Quiza RumiMLSpark.