A lightweight data contracts framework

Reason this release was yanked:

more-bugs!

Project description

Wimsey 🔍

A lightweight and flexible data contract library.

Wimsey is designed to be a very lightweight data contracts library, similar to great-expectations or soda-core, and is built on top of Narwhals. It is designed to have minimal import times and dependencies.

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you have probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data, such as that it should 'only have columns x and y' or 'the values of column a should never exceed 1'. Wimsey is a library built to run these contracts on a dataframe at Python runtime.
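
To make the idea concrete, here is a minimal sketch in plain Python of the two kinds of rule mentioned above, with no Wimsey involved. The function names and the dict-of-lists table layout are assumptions made purely for illustration and are not part of Wimsey's API; the 'never exceed 1' rule is applied to column x so that both rules can share one table.

```python
# Illustrative only: a table is modelled as a dict of column name -> list of values.

def has_only_columns(table, expected):
    """True when the table's columns are exactly the expected set."""
    return set(table) == set(expected)


def never_exceeds(table, column, limit):
    """True when every value in the column is at or below the limit."""
    return all(value <= limit for value in table[column])


good = {"x": [0.2, 0.9], "y": [1.0, 0.5]}
bad = {"x": [0.2, 1.7], "y": [1.0, 0.5], "z": [0, 0]}

# A contract holds only when every rule in it holds.
contract_holds_for_good = has_only_columns(good, ["x", "y"]) and never_exceeds(good, "x", 1)
contract_holds_for_bad = has_only_columns(bad, ["x", "y"]) and never_exceeds(bad, "x", 1)
```

A contract library automates exactly this kind of check, so the rules live in a file rather than being scattered through pipeline code.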

Quick Demo

Let's start by taking a look at an example data contract. Wimsey supports reading JSON or YAML files, or just plain old Python dictionaries. Here's an example of a YAML contract (note you'll need pyyaml installed to support reading this):

- column: awesome_column
  test: mean_should_be
  greater_than: -10
  less_than: 100
- column: another_great_column
  test: null_count_should_be
  exactly: 0

Here we have two tests: first, we're checking that the mean of "awesome_column" is between -10 and 100, and then we're checking that "another_great_column" has no null entries.
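
Since contracts can also be JSON files or plain Python dictionaries, the same two tests might be written as below. This is a sketch that assumes the JSON and dictionary forms mirror the YAML keys one-to-one, which is not confirmed here. It uses only the standard library and writes to a temporary directory; the file name tests.json matches the one used in the pandas example in this README.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: the same keys as the YAML contract, one dict per test.
contract = [
    {
        "column": "awesome_column",
        "test": "mean_should_be",
        "greater_than": -10,
        "less_than": 100,
    },
    {
        "column": "another_great_column",
        "test": "null_count_should_be",
        "exactly": 0,
    },
]

# Write the contract out as JSON so it can be passed around as a file.
contract_path = Path(tempfile.mkdtemp()) / "tests.json"
contract_path.write_text(json.dumps(contract, indent=2))

# Reading it back gives the same list of dictionaries.
loaded = json.loads(contract_path.read_text())
```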

In terms of using the Wimsey library, there are essentially only two functions you'll need: validate and/or test.

Because Wimsey uses Narwhals under the hood, you can run these tests directly on your dataframe library of choice (pandas, polars, dask, etc.) as long as it's supported via Narwhals. Here's an example of using "validate" with pandas, which will throw an exception if tests fail, and otherwise pass back your dataframe so you can continue happily:

import pandas as pd
import wimsey

df = (
  pd.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .groupby(["name", "type"]).sum()
)

Similarly, here's an example with polars, but instead using test, which will return a final_results object with a success boolean.

import polars as pl
import wimsey


df = pl.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")
if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something up! 😭")
  print(results)
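
The difference between the two functions can be sketched without Wimsey at all. The snippet below is a toy stand-in, not Wimsey's implementation: `Results`, `toy_test` and `toy_validate` are hypothetical names, the table is a plain dict of lists, and the checks are hard-coded to the two tests from the YAML contract above. Treating the bounds as exclusive is also an assumption.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Results:
    """Hypothetical stand-in for the object returned by a test run."""
    success: bool
    failures: list = field(default_factory=list)


def toy_test(table):
    """Run both checks and report the outcome without raising."""
    failures = []
    # Assumption: greater_than / less_than are exclusive bounds.
    if not -10 < mean(table["awesome_column"]) < 100:
        failures.append("mean_should_be")
    if sum(v is None for v in table["another_great_column"]) != 0:
        failures.append("null_count_should_be")
    return Results(success=not failures, failures=failures)


def toy_validate(table):
    """Raise if any check fails, otherwise hand the data back unchanged."""
    results = toy_test(table)
    if not results.success:
        raise ValueError(f"contract failed: {results.failures}")
    return table


good = {"awesome_column": [1, 2, 3], "another_great_column": ["a", "b", "c"]}
bad = {"awesome_column": [500, 700], "another_great_column": ["a", None]}

good_results = toy_test(good)
bad_results = toy_test(bad)
```

The validate style suits pipelines where bad data should stop everything, while the test style suits cases where you want to inspect or log the outcome and decide for yourself.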

Project Status

Wimsey is veeeery, veeerrrry early: there's a very small number of supported tests, and even less documentation. Feedback, contributions and requests are all welcome!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wimsey-0.1.1.tar.gz (6.3 kB view details)

Uploaded Source

Built Distribution

wimsey-0.1.1-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file wimsey-0.1.1.tar.gz.

File metadata

  • Download URL: wimsey-0.1.1.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for wimsey-0.1.1.tar.gz
Algorithm Hash digest
SHA256 44475453bd03b49a0a5565146f3527ef34af214d707e4fb060eaaf2defff4069
MD5 e60d581fa3faf8725e37aa5a024c2169
BLAKE2b-256 34f23fd4dd95d75f71baac14cf9ead52bb96b4e7f38ec7a0ad8fce935e3dc787

File details

Details for the file wimsey-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: wimsey-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for wimsey-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3deef26f66c85276a2f0e4b252dc6d45ef870c7ca00d16b332639e3dec253ba5
MD5 36a1fcec472efd7f58b3eeba6d746ac1
BLAKE2b-256 1d4c0c630b5ed02a759bb4b5e35ce34c7d02cad073919ae41b43b103d098ac21
