
A lightweight data contracts library


🔍 Wimsey

License: MIT

Wimsey is a lightweight, flexible and fully open-source data contract library.

  • 🐋 Bring your own dataframe library: Built on top of Narwhals so your tests are carried out natively in your own dataframe library (including Pandas, Polars, Pyspark, Dask, DuckDB, CuDF, Rapids, Arrow and Modin)
  • 🎍 Bring your own contract format: Write contracts in yaml, json or python - whichever you prefer!
  • 🪶 Ultra Lightweight: Built for fast imports and minimal overhead with only two dependencies (Narwhals and FSSpec)
  • 🥔 Simple, easy API: Low mental overhead with two simple functions for testing dataframes, and a simple dataclass for results.

Check out the handy test catalogue and quick start guide

What is a data contract?

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you receive it, but you have probably already figured out that's not always the case.

A data contract is an expression of what should be true of some data. We might want to check that the only columns that exist are first_name, last_name and rating, or we might want to check that rating is a number no greater than 10.

Wimsey lets you write contracts in json, yaml or python. Here's how the above checks would look in yaml:

- test: columns_should
  be:
    - first_name
    - last_name
    - rating
- column: rating
  test: max_should
  be_less_than_or_equal_to: 10
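Since contracts can also be written in json, the same two checks can be expressed as a JSON array of objects. The sketch below builds that structure with Python's standard json module. It assumes the JSON form mirrors the YAML keys one-to-one, which is how YAML lists of mappings normally translate; check the test catalogue if in doubt.

```python
import json

# The same two checks as the yaml contract above, as plain Python data.
# Assumption: the json contract uses exactly the same keys as the yaml one.
contract = [
    {"test": "columns_should", "be": ["first_name", "last_name", "rating"]},
    {"column": "rating", "test": "max_should", "be_less_than_or_equal_to": 10},
]

# Serialise to a JSON string; writing it to a file such as "tests.json"
# would give a contract file equivalent to the yaml version.
contract_json = json.dumps(contract, indent=2)
print(contract_json)
```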

Wimsey can then execute tests for you in a couple of ways: validate, which will throw an error if tests fail and otherwise pass back your dataframe, and test, which will give you a detailed rundown of individual test successes and failures.

Validate is designed to work nicely with polars or pandas pipe methods as a handy guard:

import polars as pl
import wimsey

df = (
  pl.read_csv("hopefully_nice_data.csv")
  .pipe(wimsey.validate, "tests.json")
  .group_by("name").agg(pl.col("value").sum())
)
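The guard pattern works because a pipe method simply calls the given function with the dataframe as its first argument, so a function that either raises or returns its input can sit in the middle of a chain. The plain-Python sketch below illustrates that shape, with a list of dicts standing in for a dataframe. The names `guard` and `ContractError` are hypothetical and are not part of Wimsey's API.

```python
class ContractError(Exception):
    """Hypothetical error for a failed check (not Wimsey's own exception)."""


def guard(rows, max_rating):
    """Return rows unchanged if every rating is within bounds, else raise."""
    worst = max(row["rating"] for row in rows)
    if worst > max_rating:
        raise ContractError(f"max rating {worst} exceeds {max_rating}")
    return rows  # passing the data back lets the caller keep chaining


good = [{"name": "a", "rating": 7}, {"name": "b", "rating": 10}]
bad = [{"name": "c", "rating": 11}]

# Valid data flows straight through to the next step.
total = sum(row["rating"] for row in guard(good, max_rating=10))

# Invalid data stops the pipeline before any downstream work happens.
try:
    guard(bad, max_rating=10)
    stopped = False
except ContractError:
    stopped = True
```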

Test is a single function call, returning a FinalResult data-type:

import pandas as pd
import wimsey

df = pd.read_csv("hopefully_nice_data.csv")
results = wimsey.test(df, "tests.yaml")

if results.success:
  print("Yay we have good data! 🥳")
else:
  print("Oh nooo, something's up! 😭")
  print([i for i in results.results if not i.success])
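In a CI job or scheduled pipeline, it can be handy to turn the result into a short report and an exit code. The sketch below uses stand-in dataclasses that mirror only the `success` and `results` attributes used above. The `name` field and the `summarise` helper are assumptions for illustration, so check the Wimsey docs for the real result fields.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Stand-in for a single test result (not Wimsey's own class)."""
    name: str  # hypothetical field, used here only for reporting
    success: bool


@dataclass
class FinalResult:
    """Stand-in exposing the two attributes used in the example above."""
    success: bool
    results: list[Result] = field(default_factory=list)


def summarise(final):
    """Return (exit_code, report_lines) for a finished test run."""
    failures = [r for r in final.results if not r.success]
    lines = [f"FAILED: {r.name}" for r in failures]
    passed = len(final.results) - len(failures)
    lines.append(f"{passed}/{len(final.results)} tests passed")
    return (0 if final.success else 1), lines


demo = FinalResult(
    success=False,
    results=[Result("columns_should", True), Result("max_should", False)],
)
exit_code, report = summarise(demo)
print("\n".join(report))
```

Passing `exit_code` to `sys.exit` would make a failing contract fail the job.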

Roadmap, Contributing & Feedback

Wimsey is mirrored on GitHub, but hosted and developed on Codeberg. Issues and pull requests are accepted on both.

The focus at the moment is on refining profiling and test generation. If there are tests or features that would be helpful to you, feel free to reach out!

