A Python package that validates datasets against a metadata schema
data_linter
A Python package that validates datasets against the metadata schema defined here.
It performs the following checks:
- Are the columns of the correct data types (or, for untyped data formats like csv, can they be converted without error using `pd.Series.astype`)?
- Column names:
  - Are the columns named correctly?
  - Are they in the same order as specified in the metadata?
  - Are there any missing columns?
- Where a regex `pattern` is provided in the metadata, does the actual data always fit the pattern?
- Where an `enum` is provided in the metadata, does the actual data contain only values in the `enum`?
- Where `nullable` is set to false in the metadata, are there really no nulls in the data?
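As a rough illustration of the column-name and value checks listed above, here is a plain-Python sketch. It is not data_linter's implementation: it assumes a hypothetical metadata dict with a `"columns"` list whose entries may carry `"pattern"`, `"enum"` and `"nullable"` keys, represents each column as a list with `None` for nulls, and treats a pattern as having to match the whole value. The helpers `check_column_names` and `check_column_values` are made up for this example.

```python
import re

# Hypothetical metadata, assumed for illustration only. Each column may
# carry optional "pattern", "enum" and "nullable" keys.
META = {
    "columns": [
        {"name": "id", "type": "int", "nullable": False},
        {"name": "colour", "type": "character", "enum": ["red", "green"]},
        {"name": "code", "type": "character", "pattern": r"^[A-Z]{2}\d$"},
    ]
}


def check_column_names(actual_names, meta):
    """Return a list of problems with the column names and their order."""
    expected = [col["name"] for col in meta["columns"]]
    problems = []
    missing = [name for name in expected if name not in actual_names]
    unexpected = [name for name in actual_names if name not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")
    # Only compare order when exactly the expected names are present.
    if not missing and not unexpected and list(actual_names) != expected:
        problems.append("columns are not in the order given in the metadata")
    return problems


def check_column_values(name, values, col_meta):
    """Apply the nullable, pattern and enum checks to one column's values."""
    problems = []
    non_null = [v for v in values if v is not None]
    # A column is only required to be null-free if nullable is explicitly False.
    if col_meta.get("nullable") is False and len(non_null) < len(values):
        problems.append(f"{name}: contains nulls but nullable is false")
    pattern = col_meta.get("pattern")
    if pattern is not None:
        bad = [v for v in non_null if re.fullmatch(pattern, str(v)) is None]
        if bad:
            problems.append(f"{name}: values not matching pattern: {bad}")
    enum = col_meta.get("enum")
    if enum is not None:
        bad = [v for v in non_null if v not in enum]
        if bad:
            problems.append(f"{name}: values not in enum: {bad}")
    return problems
```

Order is only compared when no columns are missing or unexpected, so one absent column is not also reported as an ordering problem. Nulls are skipped by the pattern and enum checks and left to the nullable check.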
The package also provides `impose_metadata_types_on_pd_df`, which lets the user safely convert a pandas DataFrame to the data types specified in the metadata. This is useful when you have an untyped data file such as a csv and want to ensure it conforms to the metadata.
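"Safe" conversion here means refusing a value that does not fit the declared type, rather than silently coercing or keeping it. The sketch below illustrates that idea in plain Python on rows such as those produced by `csv.DictReader`. It is not the package's implementation (which works on pandas DataFrames); the type names in `CONVERTERS` and the `impose_types` and `parse_bool` helpers are hypothetical.

```python
from datetime import date


def parse_bool(text):
    """Parse common textual booleans, rejecting anything else."""
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical mapping from metadata type names to converter functions.
# The type names used by the real schema may differ.
CONVERTERS = {
    "character": str,
    "int": int,
    "float": float,
    "boolean": parse_bool,
    "date": date.fromisoformat,  # expects YYYY-MM-DD (Python 3.7+)
}


def impose_types(rows, meta):
    """Convert the string values in each row to the types given in meta.

    rows is any iterable of dicts mapping column name to text, such as a
    csv.DictReader. Empty strings become None. A value that cannot be
    converted raises ValueError instead of being kept or coerced.
    """
    types = {col["name"]: col["type"] for col in meta["columns"]}
    converted = []
    for row in rows:
        new_row = {}
        for name, text in row.items():
            if text == "":
                new_row[name] = None
                continue
            # A KeyError here means the column or its type is not in meta.
            converter = CONVERTERS[types[name]]
            try:
                new_row[name] = converter(text)
            except ValueError as err:
                raise ValueError(
                    f"column {name!r}: cannot convert {text!r} to {types[name]}"
                ) from err
        converted.append(new_row)
    return converted
```

Because `int("2.5")` raises `ValueError`, a float-looking value in an `int` column is reported instead of being truncated.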
Usage
For detailed information about how to use the package, please see the demo repo. This includes an interactive tutorial that you can run in your web browser.
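Here's a very basic example:

```python
import json

import pandas as pd

from data_linter.lint import Linter


def read_json_from_path(path):
    """Load a JSON file (here, the table's metadata) into a dict."""
    with open(path) as f:
        return json.load(f)


# Metadata describing the expected columns, and the data to check against it
meta = read_json_from_path("tests/meta/test_meta_cols_valid.json")
df = pd.read_parquet("tests/data/test_parquet_data_valid.parquet")

linter = Linter(df, meta)
linter.check_all()  # run every check against df
linter.markdown_report()  # summarise the results as a markdown report
```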
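The example above reads its metadata from `tests/meta/test_meta_cols_valid.json`, whose contents aren't shown here. As a hedged sketch only, the snippet below writes a hypothetical metadata file and reads it back with the same `read_json_from_path` helper. The keys (`name`, `columns`, `type`, `nullable`, `enum`) are assumptions based on the checks listed earlier; consult the schema itself for the real field names.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata; the field names are assumptions, not the schema.
meta = {
    "name": "example_table",
    "columns": [
        {"name": "id", "type": "int", "nullable": False},
        {"name": "colour", "type": "character", "enum": ["red", "green"]},
    ],
}


def read_json_from_path(path):
    """Load a JSON file into a dict."""
    with open(path) as f:
        return json.load(f)


# Write the metadata to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_meta.json"
    path.write_text(json.dumps(meta, indent=2))
    loaded = read_json_from_path(path)
```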
Download files
- Source distribution: data_linter-0.1.0.tar.gz
- Built distribution: data_linter-0.1.0-py3-none-any.whl
File details
Details for the file data_linter-0.1.0.tar.gz
File metadata
- Download URL: data_linter-0.1.0.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.17 CPython/3.6.9 Darwin/18.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9a4ea28e01c8459189051a778c9779fed85256f44c318f429f634a93d01110fb
MD5 | 3c6f620617f15365d59c9a6a71eccac3
BLAKE2b-256 | c31e5a0a2c964d2fa07b7a7d855c406eef9b36a03fefc85d01ff3b880b03a725
File details
Details for the file data_linter-0.1.0-py3-none-any.whl
File metadata
- Download URL: data_linter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.17 CPython/3.6.9 Darwin/18.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea4920005c62f89a3daaa4c3357dfe64e0373ec321c8f14b39257fdf0a0d2139
MD5 | 1f29ac40c43eac4b8dd74ffaec860e65
BLAKE2b-256 | 49a5a0393051dfb52b00fd3df67717788ae0a4892a285270b4c56f39a986cbca
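To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch like the following works; `file_digest` and `verify_download` are illustrative names, not part of data_linter. The file is read in chunks so large files need not fit in memory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published value."""
    return file_digest(path) == expected_sha256.lower()
```

For example, with the source distribution saved in the current directory, `verify_download("data_linter-0.1.0.tar.gz", "9a4ea28e01c8459189051a778c9779fed85256f44c318f429f634a93d01110fb")` should return `True`. pip can enforce the same check at install time via `--hash=sha256:<digest>` entries in a requirements file.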