Validation library for pandas DataFrames
pandas-validity
What is it?
pandas-validity is a Python library for validating pandas DataFrames. It provides a DataFrameValidator class that serves as a context manager. Within this context, you can run multiple validations and checks. Any errors encountered are collected rather than raised immediately; at the end of the block, the DataFrameValidator raises a single ValidationErrorsGroup exception that summarizes them.
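The collect-then-raise behaviour described above can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the library's actual implementation: the names CollectingValidator, CheckFailed and ChecksFailed are hypothetical, and the sketch validates a plain list of column names instead of a DataFrame.

```python
from __future__ import annotations


class CheckFailed(Exception):
    """A single failed check (hypothetical name)."""


class ChecksFailed(Exception):
    """Summary of all failed checks (hypothetical name)."""

    def __init__(self, errors: list[CheckFailed]) -> None:
        super().__init__(f"Validation errors found: {len(errors)}.")
        self.errors = errors


class CollectingValidator:
    """Records failures during the block and raises them together on exit."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.errors: list[CheckFailed] = []

    def __enter__(self) -> CollectingValidator:
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Summarize only if the block finished without an unrelated exception.
        if exc_type is None and self.errors:
            raise ChecksFailed(self.errors)
        return False  # never swallow exceptions raised inside the block

    def has_required_columns(self, expected: list[str]) -> None:
        missing = [c for c in expected if c not in self.columns]
        if missing:
            self.errors.append(CheckFailed(f"missing columns: {missing}"))

    def has_no_redundant_columns(self, expected: list[str]) -> None:
        redundant = [c for c in self.columns if c not in expected]
        if redundant:
            self.errors.append(CheckFailed(f"redundant columns: {redundant}"))


expected = ["A", "B", "C", "E"]
try:
    with CollectingValidator(["A", "B", "C", "D"]) as validator:
        validator.has_required_columns(expected)      # recorded, not raised
        validator.has_no_redundant_columns(expected)  # still runs
except ChecksFailed as group:
    messages = [str(error) for error in group.errors]
    print(group, messages)
```

Both checks run even though the first one fails, so a single pass reports every problem instead of stopping at the first.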
Installation
You can easily install the latest released version using binary installers from the Python Package Index (PyPI):
pip install pandas-validity
Development Installation
Prerequisites: poetry for environment management
The source code is currently hosted on GitHub at ohmycoffe/pandas-validity. To get the development version:
git clone git@github.com:ohmycoffe/pandas-validity.git
To install the project and development dependencies:
make install
To run tests:
make test
To view all possible commands, use:
make help
Usage
import datetime

import pandas as pd
from pandas_validity import DataFrameValidator

# Create a sample DataFrame
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", None, "c"],
        "C": [2.3, 4.5, 9.2],
        "D": [
            datetime.datetime(2023, 1, 1, 1),
            datetime.datetime(2023, 1, 1, 2),
            datetime.datetime(2023, 1, 1, 3),
        ],
    }
)

# Define your expectations and data type mappings
expected_columns = ["A", "B", "C", "E"]
data_types_mapping = {
    "A": "float",
    "D": "datetime",
}

# Use DataFrameValidator for validation
with DataFrameValidator(df) as validator:
    validator.is_empty()
    validator.has_required_columns(expected_columns)
    validator.has_no_redundant_columns(expected_columns)
    validator.has_valid_data_types(data_types_mapping)
    validator.has_no_missing_data()
Output:
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has missing columns: ['E']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has redundant columns: ['D']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Column 'A' has an invalid data type: 'int64'
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
  + Exception Group Traceback (most recent call last):
  ...
  | pandas_validity.exceptions.ValidationErrorsGroup: Validation errors found: 4. (4 sub-exceptions)
  +-+---------------- 1 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has missing columns: ['E']
    +---------------- 2 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has redundant columns: ['D']
    +---------------- 3 ----------------
    | pandas_validity.exceptions.ValidationError: Column 'A' has an invalid data type: 'int64'
    +---------------- 4 ----------------
    | pandas_validity.exceptions.ValidationError: Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
    +------------------------------------
The library supports the following data types for validation:
- predefined: "str", "int", "float", "datetime", "bool"
- or any Callable that accepts a data type (dtype) object and returns a boolean value indicating the validation status, for example pd.api.types.is_string_dtype
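To make the Callable option concrete, the sketch below shows what such dtype predicates return when applied to column dtypes directly. It uses only pandas; the helper is_numeric_not_bool and the manual loop over the mapping are illustrative assumptions, not part of pandas-validity. Per the description above, callables like these could presumably be used as values in the data type mapping in place of a predefined name.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def is_numeric_not_bool(dtype) -> bool:
    """Hypothetical custom predicate: numeric dtypes, excluding booleans.

    pandas treats bool as numeric in is_numeric_dtype, so it is excluded here.
    """
    return is_numeric_dtype(dtype) and not is_bool_dtype(dtype)


df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "C": [2.3, 4.5, 9.2],
        "D": pd.to_datetime(["2023-01-01 01:00", "2023-01-01 02:00", "2023-01-01 03:00"]),
        "F": [True, False, True],
    }
)

# Column name -> predicate that takes a dtype and returns a bool.
dtype_checks = {
    "A": is_numeric_not_bool,
    "C": is_numeric_not_bool,
    "D": is_datetime64_any_dtype,
    "F": is_numeric_not_bool,
}

# Evaluate each predicate by hand to show what a validator would see.
results = {column: check(df[column].dtype) for column, check in dtype_checks.items()}
print(results)
```

Only column "F" fails its predicate, because its dtype is boolean.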
License
This project is licensed under the terms of the MIT license.