
A lightweight data contracts framework

Project description

Wimsey 🔍


A lightweight, flexible and fully open-source data contract library.

  • 🐋 Bring your own dataframe library: Built on top of Narwhals so your tests are carried out natively in your own dataframe library (including Pandas, Polars, Dask, CuDF, Rapids, Arrow and Modin)
  • 🎍 Bring your own contract format: Write contracts in yaml, json or python - whichever you prefer!
  • 🪶 Ultra Lightweight: Built for fast imports and minimal overhead with only two dependencies (Narwhals and FSSpec)
  • 🥔 Simple, easy API: Low mental overheads with two simple functions for testing dataframes, and a simple dataclass for results.

Check out the handy test catalogue and quick start guide

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you've probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data - we might want to check that the only columns that exist are first_name, last_name and rating, or we might want to check that rating is a number less than 10.

Wimsey lets you write contracts in json, yaml or python. Here's how the above checks would look in yaml:

- test: columns_should
  be:
    - first_name
    - last_name
    - rating
- column: rating
  test: max_should
  be_less_than_or_equal_to: 10

Wimsey can then execute tests for you in a couple of ways: validate, which will throw an error if tests fail and otherwise pass back your dataframe, and test, which will give you a detailed rundown of individual test successes and failures.

Validate is designed to work nicely with polars or pandas pipe methods as a handy guard:

import polars as pl
import wimsey

df = (
  pl.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .group_by("name").agg(pl.col("value").sum())
)

Test is a single function call, returning a FinalResult data-type:

import pandas as pd
import wimsey

df = pd.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")

if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something's up! 😭")
  print([i for i in results.results if not i.success])
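The failure filter above can be wrapped in a small helper. The sketch below runs without Wimsey installed by using stand-in dataclasses: only the `success` and `results` attributes come from the example above, while the class names, the `name` field and the `failed_results` helper are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StandInResult:
    """Hypothetical stand-in for one individual test result."""
    name: str  # illustrative field, not confirmed as part of Wimsey's result type
    success: bool


@dataclass
class StandInFinalResult:
    """Hypothetical stand-in for the object returned by wimsey.test."""
    success: bool
    results: list = field(default_factory=list)


def failed_results(final_result):
    """Return only the individual results whose success flag is False."""
    return [r for r in final_result.results if not r.success]


example = StandInFinalResult(
    success=False,
    results=[
        StandInResult("columns_should", True),
        StandInResult("max_should", False),
    ],
)
failures = failed_results(example)
print([f.name for f in failures])
```

Because the helper only relies on `success` and `results`, it should work unchanged on the value returned by `wimsey.test`, assuming those attributes behave as shown in the example above.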

Roadmap, Contributing & Feedback

Wimsey is very new! There's a lot more to come soon in the form of additional available data tests, better test coverage, performance improvements and friendly error messages. Once the fundamentals are polished, next up is developing a handy API for "data profiling" (generate minimal tests from a sample of data).

Wimsey is ready to mingle! If you have ideas or feedback, including additional tests you'd want to see, please feel free to raise an issue or submit a pull request.
