
A lightweight data contracts framework

Reason this release was yanked:

bug causing import failure

Project description

Wimsey 🔍

A lightweight and flexible data contract library.

Wimsey is a very lightweight data contracts library, similar to great-expectations or soda-core, built on top of Narwhals. It is designed to have minimal import times and dependencies.

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you have probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data, such as that it should 'only have columns x and y' or 'the values of column a should never exceed 1'. Wimsey is a library built to run these contracts on a dataframe during python runtime.
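To make the idea concrete, here is a minimal plain-Python sketch of those two kinds of rule, checked by hand on a list of rows. It is only an illustration of the concept, not how Wimsey works internally: `check_contract` and the sample rows are hypothetical, and the second rule is applied to column x (rather than a) so that both rules fit the same data.

```python
# Illustrative only: hand-written versions of the two example rules.
# `check_contract` and the sample rows are hypothetical, not part of Wimsey.

def check_contract(rows):
    """Return a list of human-readable failures (empty if the data passes)."""
    failures = []
    for index, row in enumerate(rows):
        # Rule 1: the data should only have columns x and y.
        if set(row) != {"x", "y"}:
            failures.append(f"row {index}: expected columns x and y, got {sorted(row)}")
        # Rule 2: the values of column x should never exceed 1.
        elif row["x"] > 1:
            failures.append(f"row {index}: x is {row['x']}, which exceeds 1")
    return failures


good_rows = [{"x": 0.5, "y": "a"}, {"x": 1.0, "y": "b"}]
bad_rows = [{"x": 0.5, "y": "a"}, {"x": 3.0, "y": "b"}, {"x": 0.1, "z": "c"}]

print(check_contract(good_rows))
print(check_contract(bad_rows))
```

A contract library saves you from writing this kind of check by hand for every dataset.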

Quick Demo

Let's start by taking a look at an example data contract. Wimsey supports reading json or yaml files, or just plain old python dictionaries. Here's an example of a yaml contract (note you'll need pyyaml installed to support reading this):

- column: awesome_column
  test: mean_should_be
  greater_than: -10
  less_than: 100
- column: another_great_column
  test: null_count_should_be
  exactly: 0

Here we have two tests: first, we're checking that the mean of "awesome_column" is between -10 and 100, and then we're checking that "another_great_column" has no null entries.
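For comparison, this is roughly what the same contract looks like as plain Python data, and as JSON. The list-of-dicts shape is simply what the yaml above parses to; whether Wimsey accepts exactly this shape when given Python objects is an assumption here, so treat it as a sketch rather than a reference.

```python
import json

# The yaml contract above, written as plain Python data.
# Assumption: this list-of-dicts shape is what Wimsey expects when handed
# Python objects directly; the README only says dictionaries are supported.
contract = [
    {
        "column": "awesome_column",
        "test": "mean_should_be",
        "greater_than": -10,
        "less_than": 100,
    },
    {
        "column": "another_great_column",
        "test": "null_count_should_be",
        "exactly": 0,
    },
]

# The same structure serialised as JSON, e.g. for a file such as tests.json.
contract_json = json.dumps(contract, indent=2)
print(contract_json)
```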

In terms of using the Wimsey library, there are essentially only two functions you'll need: validate and/or test.

Because Wimsey uses Narwhals under the hood, you can run these tests directly on your dataframe library of choice (pandas, polars, dask, etc.) as long as it's supported via Narwhals. Here's an example of using "validate" with pandas, which will throw an exception if tests fail, and otherwise pass back your dataframe so you can continue happily:

import pandas as pd
import wimsey

df = (
  pd.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .groupby(["name", "type"]).sum()
)

Similarly, here's an example with polars, but instead using test, which will return a final_results object with a success boolean.

import polars as pl
import wimsey


df = pl.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")
if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something up! 😭")
  print(results)
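The difference between the two styles (raise on failure versus return a result to inspect) can be sketched in plain Python. Everything below is hypothetical and is not Wimsey's code: `Results`, `run_test` and `run_validate` are made-up names, and the check is a hand-rolled version of a mean-between-bounds test on a plain list.

```python
from dataclasses import dataclass


@dataclass
class Results:
    """Hypothetical result object with a success flag, for illustration."""
    success: bool
    failures: list


def run_test(values, greater_than, less_than):
    """Report style: check the mean and return a result object."""
    mean = sum(values) / len(values)
    failures = []
    if not greater_than < mean < less_than:
        failures.append(f"mean {mean} is not between {greater_than} and {less_than}")
    return Results(success=not failures, failures=failures)


def run_validate(values, greater_than, less_than):
    """Gate style: raise on failure, otherwise hand the data back unchanged."""
    results = run_test(values, greater_than, less_than)
    if not results.success:
        raise ValueError("; ".join(results.failures))
    return values


print(run_test([1, 2, 3], -10, 100).success)
print(run_validate([1, 2, 3], -10, 100))
```

The gate style suits pipelines where bad data should stop everything; the report style suits cases where you want to log or branch on the outcome.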

Project Status

Wimsey is veeeery, veeerrrry early: there's a very small number of supported tests, and even less documentation. Feedback, contributions and requests are all welcome!

Download files


Source Distribution

wimsey-0.1.0.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

wimsey-0.1.0-py3-none-any.whl (6.3 kB)

Uploaded Python 3

