A lightweight data contracts framework

Project description

Wimsey 🔍

A lightweight and flexible data contract library.

Wimsey is designed to be a very lightweight data contracts library, similar to great-expectations or soda-core, built on top of Narwhals. It's designed to have minimal import times and dependencies.

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you've probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data, such as 'it should only have columns x and y' or 'the values of column a should never exceed 1'. Wimsey is a library for running these contracts against a dataframe at Python runtime.
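To make the idea concrete, here's a minimal, library-free sketch of those two example expectations written as plain Python predicates over a table stored as a dict of columns. The `check_contract` helper, the `Expectation` type and the dict-of-lists table are hypothetical illustrations of the concept, not part of Wimsey's API.

```python
from typing import Callable

# Hypothetical table representation for this sketch: column name -> values.
Table = dict[str, list]
# An expectation is a human-readable description plus a predicate on the table.
Expectation = tuple[str, Callable[[Table], bool]]


def check_contract(table: Table, contract: list[Expectation]) -> list[str]:
    """Return the descriptions of every expectation the table fails."""
    return [description for description, holds in contract if not holds(table)]


# 'It should only have columns x and y'
schema_contract: list[Expectation] = [
    ("only has columns x and y", lambda t: set(t) == {"x", "y"}),
]

# 'The values of column a should never exceed 1'
range_contract: list[Expectation] = [
    ("values of column a never exceed 1", lambda t: all(v <= 1 for v in t["a"])),
]

print(check_contract({"x": [1, 2], "y": [3, 4]}, schema_contract))  # []
print(check_contract({"a": [0.2, 0.5, 1.3]}, range_contract))
# ['values of column a never exceed 1']
```

Returning the list of failed descriptions (rather than a single boolean) keeps enough information around to report *which* expectation broke.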

Quick Demo

Let's start by looking at an example data contract. Wimsey supports reading JSON or YAML files, or just plain old Python dictionaries. Here's an example of a YAML contract (note that you'll need pyyaml installed to read it):

- column: awesome_column
  test: mean_should_be
  greater_than: -10
  less_than: 100
- column: another_great_column
  test: null_count_should_be
  exactly: 0

Here we have two tests: first, we check that the mean of "awesome_column" is between -10 and 100, and then we check that "another_great_column" has no null entries.
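Once parsed, the YAML contract is just a list of dictionaries. The sketch below loads the same structure from an equivalent JSON string using only the standard library, then evaluates it with a hypothetical, dependency-free `run_contract` function to show what the two tests compute. This is not Wimsey's implementation; it assumes `greater_than`/`less_than` are strict bounds and that nulls are represented as `None`.

```python
import json
from statistics import mean

# The YAML contract above, expressed as equivalent JSON.
CONTRACT_JSON = """
[
  {"column": "awesome_column", "test": "mean_should_be",
   "greater_than": -10, "less_than": 100},
  {"column": "another_great_column", "test": "null_count_should_be",
   "exactly": 0}
]
"""


def _compute(test: str, values: list) -> float:
    """Compute the statistic a test name refers to (this sketch knows two)."""
    if test == "mean_should_be":
        return mean([v for v in values if v is not None])
    if test == "null_count_should_be":
        return sum(v is None for v in values)
    raise ValueError(f"unsupported test in this sketch: {test}")


def run_contract(table: dict[str, list], contract: list[dict]) -> list[bool]:
    """Return one pass/fail flag per test (assumes strict bounds)."""
    outcomes = []
    for spec in contract:
        value = _compute(spec["test"], table[spec["column"]])
        passed = True
        if "greater_than" in spec:
            passed = passed and value > spec["greater_than"]
        if "less_than" in spec:
            passed = passed and value < spec["less_than"]
        if "exactly" in spec:
            passed = passed and value == spec["exactly"]
        outcomes.append(passed)
    return outcomes


contract = json.loads(CONTRACT_JSON)
table = {
    "awesome_column": [1.5, 20.0, 42.0],
    "another_great_column": ["a", None, "c"],
}
print(run_contract(table, contract))  # [True, False]
```

Here the mean of "awesome_column" (about 21.2) sits inside the bounds, while "another_great_column" contains one null, so the second test fails.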

In terms of using the Wimsey library, there are essentially only two functions you'll need: validate and/or test.

Because Wimsey uses Narwhals under the hood, you can run these tests directly on your dataframe library of choice (pandas, polars, dask, etc.), as long as Narwhals supports it. Here's an example of using "validate" with pandas, which raises an exception if any test fails and otherwise passes your dataframe back so you can carry on happily:

import pandas as pd
import wimsey

df = (
  pd.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .groupby(["name", "type"]).sum()
)
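The validate pattern boils down to "check, raise on failure, otherwise return the input unchanged", which is what lets it sit inside a `.pipe` chain. Here's a hypothetical, dependency-free sketch of that pattern; `validate_sketch`, `DataValidationError` and the predicate-based checks are illustrative names, not Wimsey's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DataValidationError(Exception):
    """Hypothetical error raised when one or more checks fail."""


def validate_sketch(data: T, checks: dict[str, Callable[[T], bool]]) -> T:
    """Raise if any named check fails; otherwise return `data` unchanged."""
    failed = [name for name, check in checks.items() if not check(data)]
    if failed:
        raise DataValidationError(f"failed checks: {', '.join(failed)}")
    return data


rows = [{"name": "a", "value": 3}, {"name": "b", "value": 5}]
checks = {"no negative values": lambda rs: all(r["value"] >= 0 for r in rs)}

# Passing data comes straight back, so the call can be chained.
total = sum(r["value"] for r in validate_sketch(rows, checks))
print(total)  # 8

try:
    validate_sketch([{"name": "c", "value": -1}], checks)
except DataValidationError as err:
    print(err)  # failed checks: no negative values
```

Returning the exact same object (not a copy) keeps the check free of side effects on the data flowing through the pipeline.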

Similarly, here's an example with polars, this time using "test", which returns a final_results object with a success boolean instead of raising an exception.

import polars as pl
import wimsey


df = pl.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")
if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something's up! 😭")
  print(results)
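For the test route, a results object only needs an overall success flag plus the individual outcomes. The dataclasses below are a hypothetical sketch of that shape; the class and field names are assumptions, not Wimsey's own final_results definition. Here `success` is true only when every individual test passes.

```python
from dataclasses import dataclass, field


@dataclass
class CheckOutcome:
    """Hypothetical record of one contract test."""
    name: str
    success: bool


@dataclass
class ResultsSketch:
    """Hypothetical aggregate: overall success plus per-test outcomes."""
    outcomes: list[CheckOutcome] = field(default_factory=list)

    @property
    def success(self) -> bool:
        # all() of an empty list is True, so an empty contract passes.
        return all(o.success for o in self.outcomes)

    def failures(self) -> list[str]:
        """Names of the tests that did not pass."""
        return [o.name for o in self.outcomes if not o.success]


results = ResultsSketch([
    CheckOutcome("awesome_column mean_should_be", True),
    CheckOutcome("another_great_column null_count_should_be", False),
])
if results.success:
    print("Yay we have good data! 🥳")
else:
    print("Oh nooo, something's up! 😭")
    print(results.failures())
```

Deriving `success` from the outcomes, rather than storing it separately, means the flag can never disagree with the individual results.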

Project Status

Wimsey is veeeery, veeerrrry early: there's only a small number of supported tests, and even less documentation. Feedback, contributions and requests are all welcome!


Download files

Source Distribution

wimsey-0.1.2.tar.gz (6.5 kB)

Uploaded Source

Built Distribution

wimsey-0.1.2-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file wimsey-0.1.2.tar.gz.

File metadata

  • Download URL: wimsey-0.1.2.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for wimsey-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        eec11bda24db617f8774234be0a53e69427739ea2283171f4aad3b23f09f4e95
MD5           2e087a226856be993d6947ffe092667d
BLAKE2b-256   80ba532b12bd0a9a062e918c897efc7b3bff3b6da1d0285de86b885122089e58
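To check a downloaded file against these digests, hash it locally and compare with the published SHA256 value. A minimal standard-library sketch, assuming the sdist has already been downloaded to the current directory under its original filename:

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for wimsey-0.1.2.tar.gz (from the table above).
EXPECTED_SHA256 = "eec11bda24db617f8774234be0a53e69427739ea2283171f4aad3b23f09f4e95"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads needn't fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes the sdist was downloaded to the current directory.
    archive = Path("wimsey-0.1.2.tar.gz")
    if not archive.exists():
        print(f"{archive} not found")
    elif sha256_of(archive) == EXPECTED_SHA256:
        print("Hash matches")
    else:
        print("Hash MISMATCH: do not install this file")
```

The same function works for the wheel by swapping in its filename and digest.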

File details

Details for the file wimsey-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: wimsey-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for wimsey-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        5ebe8ff07f8c99d8e094a32e744d3a80eb751cd2f6a431a7609f92ae1f347567
MD5           848810c17f1b68cf4be67b42eb04e4c2
BLAKE2b-256   9d45d0424915378aa060de3168a9a698870c533b043a6063e732416b24dedf7d
