Testframework for PySpark DataFrames
pyspark-testframework
⏳ Work in progress
The goal of the pyspark-testframework is to provide a simple way to create tests for PySpark DataFrames. The test results are returned in DataFrame format as well.
Example
Input DataFrame:
primary_key | email | number |
---|---|---|
1 | info@woonstadrotterdam.nl | 123 |
2 | infowoonstadrotterdam.nl | 01 |
3 | @woonstadrotterdam.nl | -45 |
4 | dev@woonstadrotterdam.nl | 1.0 |
5 | Null | Null |
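The example code below uses a DataFrame named `df` without showing how it is built. A minimal sketch of one way to construct it follows. The helper name `make_input_df` is hypothetical and not part of pyspark-testframework, and the schema is an assumption: `number` is kept as a string column because the example tests it as an integer string.

```python
# Rows mirroring the input table above; None stands for Null.
ROWS = [
    (1, "info@woonstadrotterdam.nl", "123"),
    (2, "infowoonstadrotterdam.nl", "01"),
    (3, "@woonstadrotterdam.nl", "-45"),
    (4, "dev@woonstadrotterdam.nl", "1.0"),
    (5, None, None),
]

# Assumed schema (DDL string): both tested columns are strings.
SCHEMA = "primary_key INT, email STRING, number STRING"


def make_input_df(spark):
    """Hypothetical helper: build the example DataFrame from a SparkSession."""
    return spark.createDataFrame(ROWS, schema=SCHEMA)


# Usage, assuming PySpark is installed and a session can be started:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = make_input_df(spark)
```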
from testframework.tests import RegexTest, IsIntegerString

# `df` is the input DataFrame shown above

# test for valid email addresses
email_regex = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
mail_tester = RegexTest(
    name="ValidEmail",
    pattern=email_regex
)
test_result_email = mail_tester.test(
    df=df,
    col="email",
    nullable=False
)

# test for integer strings
integer_string_tester = IsIntegerString()
test_result_number = integer_string_tester.test(
    df=df,
    col="number",
    nullable=True
)

test_result_email.show()
test_result_number.show()
Output for ValidEmail:
primary_key | email__ValidEmail |
---|---|
1 | True |
2 | False |
3 | False |
4 | True |
5 | False |
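The ValidEmail column above can be sanity-checked without Spark. The sketch below applies the same pattern with Python's `re` module to the five example values. It is not the library's implementation: the function `matches_email` is hypothetical, and treating a null as a failure when `nullable=False` is an assumption inferred from row 5 of the output.

```python
import re

# Same pattern as in the example above.
EMAIL_REGEX = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def matches_email(value, nullable=False):
    """Hypothetical stand-in for the regex test on a single value."""
    if value is None:
        # Assumption: a null passes only when nulls are allowed.
        return nullable
    return re.match(EMAIL_REGEX, value) is not None


# Email values from the input table, keyed by primary_key.
EMAILS = {
    1: "info@woonstadrotterdam.nl",
    2: "infowoonstadrotterdam.nl",
    3: "@woonstadrotterdam.nl",
    4: "dev@woonstadrotterdam.nl",
    5: None,
}

results = {pk: matches_email(value) for pk, value in EMAILS.items()}
print(results)
# {1: True, 2: False, 3: False, 4: True, 5: False}
```

Row 2 fails because it has no `@`, row 3 because the part before `@` is empty, and row 5 because it is null while nulls are not allowed.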
Output for IsIntegerString:
primary_key | number__IsIntegerString |
---|---|
1 | True |
2 | False |
3 | True |
4 | True |
5 | True |
Download files
Hashes for pyspark_testframework-1.0.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 6fb56d910c074c8c1866a41af8cd79a3e11beedc87e3fabd005ac8b5679c3f74 |
MD5 | be648f5561899d256b5b77975a0db4f4 |
BLAKE2b-256 | 1e8b22ccf5164de80f6abdfa1e3c4aef5f64d64caeed65e39550bb0b3e0fb891 |
Hashes for pyspark_testframework-1.0.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | f579a7969729906489e702490707ed5ea3ecd520f83b79376ce7bb4e4e51c893 |
MD5 | abbb336adceb6325e82a08fd88e225e5 |
BLAKE2b-256 | 65299bc8caf08c412f7f7d5de931525add726d2e331a5a43604c2ac8bb7e6461 |