Testframework for PySpark DataFrames
pyspark-testframework
⏳ Work in progress
The goal of the pyspark-testframework is to provide a simple way to create tests for PySpark DataFrames. The test results are returned as DataFrames as well.
Example
Input DataFrame:
primary_key | email | number |
---|---|---|
1 | info@woonstadrotterdam.nl | 123 |
2 | infowoonstadrotterdam.nl | 01 |
3 | @woonstadrotterdam.nl | -45 |
4 | dev@woonstadrotterdam.nl | 1.0 |
5 | Null | Null |
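from pyspark.sql import SparkSession
from testframework.tests import RegexTest, IsIntegerString

# build the input DataFrame shown above (None represents Null)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [
        (1, "info@woonstadrotterdam.nl", "123"),
        (2, "infowoonstadrotterdam.nl", "01"),
        (3, "@woonstadrotterdam.nl", "-45"),
        (4, "dev@woonstadrotterdam.nl", "1.0"),
        (5, None, None),
    ],
    ["primary_key", "email", "number"],
)

# test for valid email addresses
email_regex = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
mail_tester = RegexTest(
    name="ValidEmail",
    pattern=email_regex
)
test_result_email = mail_tester.test(
    df=df,
    col="email",
    primary_key="primary_key",
    nullable=False  # a Null email fails the test
)

# test for integer strings
integer_string_tester = IsIntegerString()
test_result_number = integer_string_tester.test(
    df=df,
    col="number",
    primary_key="primary_key",
    nullable=True  # a Null number passes the test
)

test_result_email.show()
test_result_number.show()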
Output for ValidEmail:
primary_key | email | email__ValidEmail |
---|---|---|
1 | info@woonstadrotterdam.nl | True |
2 | infowoonstadrotterdam.nl | False |
3 | @woonstadrotterdam.nl | False |
4 | dev@woonstadrotterdam.nl | True |
5 | Null | False |
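To see why each row passes or fails, the pattern can be tried outside Spark. The sketch below is not part of the testframework: `is_valid_email` is a hypothetical helper that applies the same pattern with Python's `re` module and treats `None` the way `nullable=False` does in the example. It assumes Python's regex engine behaves like Spark's for this particular pattern, which holds for the sample values shown here.

```python
import re

# Pattern copied from the example above.
EMAIL_REGEX = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def is_valid_email(value, nullable=False):
    """Hypothetical helper (not the library's API): check one value."""
    if value is None:
        # Mirrors the nullable flag: a missing value passes only if nullable.
        return nullable
    return re.fullmatch(EMAIL_REGEX, value) is not None


# Sample values taken from the input DataFrame; None stands in for Null.
samples = [
    "info@woonstadrotterdam.nl",
    "infowoonstadrotterdam.nl",   # no "@"
    "@woonstadrotterdam.nl",      # empty local part
    "dev@woonstadrotterdam.nl",
    None,
]

results = [is_valid_email(value) for value in samples]
print(results)  # [True, False, False, True, False]
```

Row 2 fails because it has no `@`, row 3 because the part before `@` must contain at least one character, and row 5 because the value is missing and nulls are not allowed.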
Output for IsIntegerString:
primary_key | number | number__IsIntegerString |
---|---|---|
1 | 123 | True |
2 | 01 | False |
3 | -45 | True |
4 | 1.0 | False |
5 | Null | True |