Testframework for PySpark DataFrames
pyspark-testframework
⏳ Work in progress
The goal of pyspark-testframework is to provide a simple way to create tests for PySpark DataFrames. The test results are returned as DataFrames as well.
Example
Input DataFrame:
primary_key | email | number |
---|---|---|
1 | info@woonstadrotterdam.nl | 123 |
2 | infowoonstadrotterdam.nl | 01 |
3 | @woonstadrotterdam.nl | -45 |
4 | dev@woonstadrotterdam.nl | 1.0 |
5 | Null | Null |
from testframework.tests import RegexTest, IsIntegerString

# df is the input DataFrame shown above

# test for valid email addresses
email_regex = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
mail_tester = RegexTest(
    name="ValidEmail",
    pattern=email_regex
)
test_result_email = mail_tester.test(
    df=df,
    col="email",
    primary_key="primary_key",
    nullable=False  # a Null email counts as a failure
)

# test for integer strings
integer_string_tester = IsIntegerString()
test_result_number = integer_string_tester.test(
    df=df,
    col="number",
    primary_key="primary_key",
    nullable=True  # a Null number counts as a pass
)

test_result_email.show()
test_result_number.show()
Output for ValidEmail:
primary_key | email | email__ValidEmail |
---|---|---|
1 | info@woonstadrotterdam.nl | True |
2 | infowoonstadrotterdam.nl | False |
3 | @woonstadrotterdam.nl | False |
4 | dev@woonstadrotterdam.nl | True |
5 | Null | False |
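To make the ValidEmail column easier to follow, here is a minimal plain-Python sketch of the same check. It is not the framework's implementation: `check_email` is a hypothetical helper, and it assumes a Null value passes only when `nullable` is true, which matches the table above.

```python
import re
from typing import Optional

# Same pattern as in the example above.
EMAIL_REGEX = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def check_email(value: Optional[str], nullable: bool = False) -> bool:
    """Hypothetical helper (not part of the framework): regex check with null handling."""
    if value is None:
        # Assumption: a missing value passes only when nulls are allowed.
        return nullable
    return re.match(EMAIL_REGEX, value) is not None


emails = [
    "info@woonstadrotterdam.nl",
    "infowoonstadrotterdam.nl",
    "@woonstadrotterdam.nl",
    "dev@woonstadrotterdam.nl",
    None,
]
email_results = [check_email(e, nullable=False) for e in emails]
print(email_results)
```

The second and third addresses fail because the pattern requires at least one character before the `@` and an `@` to be present at all.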
Output for IsIntegerString:
primary_key | number | number__IsIntegerString |
---|---|---|
1 | 123 | True |
2 | 01 | False |
3 | -45 | True |
4 | 1.0 | False |
5 | Null | True |
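The number results can be approximated in plain Python as well. The rule below is an assumption inferred from the rows for 123, 01, -45 and Null (optional minus sign, no leading zeros, Null allowed because `nullable=True`); the framework's actual rule may differ, and `is_integer_string` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed rule: optional minus sign, then "0" or digits without a leading zero.
INTEGER_STRING = re.compile(r"-?(0|[1-9][0-9]*)")


def is_integer_string(value: Optional[str], nullable: bool = True) -> bool:
    """Hypothetical helper (not part of the framework): integer-string check with null handling."""
    if value is None:
        # Assumption: a missing value passes only when nulls are allowed.
        return nullable
    # fullmatch requires the whole string to fit the pattern.
    return INTEGER_STRING.fullmatch(value) is not None


numbers = ["123", "01", "-45", None]
number_results = [is_integer_string(n, nullable=True) for n in numbers]
print(number_results)
```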