
A lightweight data contracts framework


Wimsey 🔍


A lightweight and flexible data contract library.

Wimsey is designed to be a very lightweight data contracts library, similar to great-expectations or soda-core, built on top of Narwhals. It is designed to have minimal import times and dependencies.

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you've probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data, such as that it should 'only have columns x and y' or 'the values of column a should never exceed 1'. Wimsey is a library built to run these contracts on a dataframe at Python runtime.
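
To make the idea concrete, here is a minimal, library-free sketch of those two kinds of rule, written as plain Python checks over a dictionary of columns. This is not how Wimsey is implemented; the function names `has_only_columns` and `never_exceeds` and the sample data are hypothetical and exist only for illustration.

```python
# Illustrative only: hand-rolled checks, not Wimsey's API.
# The "dataframe" here is just a dict mapping column names to lists of values.

def has_only_columns(data, expected):
    """True when the data has exactly the expected columns, no more, no fewer."""
    return set(data) == set(expected)


def never_exceeds(data, column, limit):
    """True when every value in the column is less than or equal to the limit."""
    return all(value <= limit for value in data[column])


good = {"x": [1, 2, 3], "y": [0.1, 0.5, 0.9]}
bad = {"x": [1, 2, 3], "y": [0.1, 1.5, 0.9], "z": ["surprise"]}

print(has_only_columns(good, ["x", "y"]), never_exceeds(good, "y", 1))
print(has_only_columns(bad, ["x", "y"]), never_exceeds(bad, "y", 1))
```

The first line reports that `good` satisfies both rules; the second reports that `bad` breaks both, because it has an extra column and a value above the limit.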

Quick Demo

Let's start by taking a look at an example data contract. Wimsey supports reading JSON or YAML files, or just plain old Python dictionaries. Here's an example of a YAML contract:

- column: awesome_column
  test: mean_should
  be_greater_than: -10
  be_less_than: 100
- column: another_great_column
  test: null_count_should
  be_exactly: 0
- test: row_count_should
  be_less_than_or_equal_to: 50000
- column: neato_column
  test: type_should
  be_one_of:
    - int64
    - float64

Note that you'll need pyyaml installed to read this. The same data can be stored as JSON without any extra install, if you're trying to keep things lightweight.
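
If you'd rather skip the pyyaml dependency, the contract above could be produced as JSON with nothing but the standard library. The sketch below assumes the JSON layout mirrors the YAML one-to-one (a list of objects with the same keys); that mapping is an assumption rather than something this README spells out.

```python
import json

# The same contract as the YAML above, as plain Python data.
# Assumption: the JSON file uses the same keys and nesting as the YAML.
contract = [
    {
        "column": "awesome_column",
        "test": "mean_should",
        "be_greater_than": -10,
        "be_less_than": 100,
    },
    {"column": "another_great_column", "test": "null_count_should", "be_exactly": 0},
    {"test": "row_count_should", "be_less_than_or_equal_to": 50000},
    {
        "column": "neato_column",
        "test": "type_should",
        "be_one_of": ["int64", "float64"],
    },
]

contract_json = json.dumps(contract, indent=2)
print(contract_json)

# To save it next to your code, something like this would do:
# pathlib.Path("tests.json").write_text(contract_json)
```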

Here we have four tests: first, we check that the mean of "awesome_column" is between -10 and 100; next, that "another_great_column" has no null entries; then that the data has at most 50,000 rows; and finally that "neato_column" is of type int64 or float64.
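
To show what these tests mean in practice, the following sketch evaluates the mean, null-count and row-count tests from the contract above by hand against a tiny in-memory table. It illustrates the semantics only and is not Wimsey's implementation: the `check` helper is hypothetical, and treating `be_greater_than` and `be_less_than` as strict bounds is an assumption.

```python
from statistics import mean

# A toy table: column name -> list of values (None stands in for null).
data = {
    "awesome_column": [5, 20, 35],
    "another_great_column": ["a", None, "c"],
}


def check(data, entry):
    """Evaluate one contract entry by hand; returns True when it passes."""
    test = entry["test"]
    if test == "row_count_should":
        # Row count is the length of any column (all are the same length here).
        observed = len(next(iter(data.values())))
    elif test == "mean_should":
        observed = mean(data[entry["column"]])
    elif test == "null_count_should":
        observed = sum(value is None for value in data[entry["column"]])
    else:
        raise ValueError(f"test not covered by this sketch: {test}")

    # Assumption: "greater than" and "less than" are strict comparisons.
    passed = True
    if "be_greater_than" in entry:
        passed = passed and observed > entry["be_greater_than"]
    if "be_less_than" in entry:
        passed = passed and observed < entry["be_less_than"]
    if "be_less_than_or_equal_to" in entry:
        passed = passed and observed <= entry["be_less_than_or_equal_to"]
    if "be_exactly" in entry:
        passed = passed and observed == entry["be_exactly"]
    return passed


contract = [
    {"column": "awesome_column", "test": "mean_should",
     "be_greater_than": -10, "be_less_than": 100},
    {"column": "another_great_column", "test": "null_count_should", "be_exactly": 0},
    {"test": "row_count_should", "be_less_than_or_equal_to": 50000},
]

results = [check(data, entry) for entry in contract]
print(results)
```

With this toy data the mean and row-count tests pass, while the null-count test fails because "another_great_column" contains one null.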

In terms of using the Wimsey library, there are essentially only two functions you'll need: validate and/or test.

Because Wimsey uses Narwhals under the hood, you can run these tests directly on your dataframe library of choice (pandas, polars, dask, etc.) as long as it's supported via Narwhals. Here's an example of using "validate" with pandas, which will throw an exception if tests fail, and otherwise pass back your dataframe so you can continue happily:

import pandas as pd
import wimsey

df = (
  pd.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .groupby(["name", "type"]).sum()
)

Similarly, here's an example with polars, this time using test, which returns a final_results object with a success boolean.

import polars as pl
import wimsey


df = pl.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")
if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something up! 😭")
  print(results)
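
The difference between the two entry points can be sketched without any dataframe library. In the hypothetical code below, `sketch_validate` raises on failure and otherwise returns the data so it can sit in a pipeline, while `sketch_test` returns a results object with a `success` flag. All names here (`Results`, `DataValidationError`, `run_checks`) are made up for illustration and are not Wimsey's own classes or exceptions.

```python
from dataclasses import dataclass, field


@dataclass
class Results:
    """Hypothetical stand-in for the object returned by a test call."""
    success: bool
    failures: list = field(default_factory=list)


class DataValidationError(Exception):
    """Hypothetical exception raised when a validation fails."""


def run_checks(data, checks):
    """Run named predicate functions and collect the names of those that fail."""
    failures = [name for name, predicate in checks.items() if not predicate(data)]
    return Results(success=not failures, failures=failures)


def sketch_test(data, checks):
    """Report-style: never raises, always returns a Results object."""
    return run_checks(data, checks)


def sketch_validate(data, checks):
    """Gate-style: raises on failure, otherwise hands the data straight back."""
    results = run_checks(data, checks)
    if not results.success:
        raise DataValidationError(f"failed checks: {results.failures}")
    return data


checks = {
    "has_rows": lambda rows: len(rows) > 0,
    "all_positive": lambda rows: all(value > 0 for value in rows),
}

print(sketch_test([1, 2, 3], checks).success)
print(sketch_test([1, -2, 3], checks).failures)
print(sketch_validate([1, 2, 3], checks))
```

The gate style suits pipelines where bad data should stop everything; the report style suits cases where you want to log or branch on the outcome.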

Project Status

Wimsey is veeeery, veeerrrry early: there's a very small number of supported tests, and even less documentation. Feedback, contributions and requests are all welcome!

Comparison

Tool               | Import Time  | PyPI Size | Dependencies | Has a GUI Framework
Great Expectations | 2.7 seconds  | 5367KB    | 25           | Yes
Soda Core          | 0.4 seconds  | 145KB     | 11           | Yes (non open source)
Wimsey             | 0.02 seconds | 6KB       | 2            | No
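
This README doesn't say how the import times above were measured. One way to get a comparable number yourself is to time a fresh interpreter importing the module, alongside a baseline interpreter that imports nothing. The helper names `time_command` and `time_import` are hypothetical, and `json` is used as the example module only because it ships with Python; swap in `wimsey` or another installed package to compare.

```python
import subprocess
import sys
import time


def time_command(code):
    """Seconds taken by a fresh Python interpreter to run the given code string."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


def time_import(module):
    """Return (baseline, with_import) timings, each from a fresh interpreter."""
    baseline = time_command("pass")
    with_import = time_command(f"import {module}")
    return baseline, with_import


baseline, with_import = time_import("json")
print(f"interpreter startup: {baseline:.3f}s, with import: {with_import:.3f}s")
```

The difference between the two numbers approximates the import cost, though it will vary from run to run. Python's built-in `-X importtime` flag gives a per-module breakdown if you want more detail.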
