"improve how we work with evolving data"

Project description

datapact - pytest, but for dataframes

datapact is a Python library for verifying your data.

import datapact

dp = datapact.test(df)  # df: an existing Pandas or Dask DataFrame

dp.age.must.be_positive()
dp.name.should.not_be_empty()

It works with Pandas + Dask DataFrames, and has special support for Jupyter Notebooks.

Here are some features:

  • dozens of existing assertions, easy to add your own
  • great in-editor documentation via docstrings + types
  • two severity levels (.should for warnings, .must for failures)
  • failure notifications via E-Mail, MS Teams, Slack or PagerDuty (via Datapact Track)
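
To make the difference between the two severity levels concrete, here is a minimal, self-contained sketch. It is not datapact's implementation: the SeverityCheck class and its constructor are hypothetical and exist only to illustrate the idea that a .should violation produces a warning, while a .must violation fails.

```python
import warnings


class SeverityCheck:
    """Hypothetical stand-in for a column check; not datapact's real class."""

    def __init__(self, values, severity):
        if severity not in ("should", "must"):
            raise ValueError("severity must be 'should' or 'must'")
        self.values = list(values)
        self.severity = severity

    def be_positive(self):
        """Return True if every value is > 0; otherwise warn or fail."""
        offending = [v for v in self.values if v <= 0]
        if not offending:
            return True
        message = f"{len(offending)} non-positive value(s) found: {offending}"
        if self.severity == "must":
            # "must" level: a violation is a hard failure.
            raise AssertionError(message)
        # "should" level: a violation is reported, but execution continues.
        warnings.warn(message)
        return False
```

With this sketch, SeverityCheck([30, -1], "should").be_positive() emits a warning and returns False, while the same call with "must" raises an AssertionError.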

Get Started here: https://datapact.dev

Datapact Track

Datapact Track is an optional, browser-based data tracking service.

It's fully self-hostable via Docker and Postgres, and there's a hosted version available at track.datapact.dev.

Connecting your test suite is one line of code:

dp.connect(
  server="track.datapact.dev",
  token="..." # get this from the UI
)
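
If you'd rather not hard-code the token, one option is to read it from the environment. The helper below is a sketch, not part of datapact: the function name and the DATAPACT_TRACK_TOKEN / DATAPACT_TRACK_SERVER variable names are assumptions made for this example. It only builds the keyword arguments that dp.connect is shown taking above.

```python
import os


def track_connection_settings(env=None):
    """Build keyword arguments for dp.connect() from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    token = env.get("DATAPACT_TRACK_TOKEN")
    if not token:
        raise RuntimeError("DATAPACT_TRACK_TOKEN is not set")
    return {
        # Fall back to the hosted version if no server is configured.
        "server": env.get("DATAPACT_TRACK_SERVER", "track.datapact.dev"),
        "token": token,
    }


# Usage, with dp created via datapact.test(df):
# dp.connect(**track_connection_settings())
```

This keeps the token out of version control, which matters once your test suite lives next to the rest of your code.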

Datapact Track gives you:

  • notifications via E-Mail, Slack, MS Teams and PagerDuty
  • central documentation of your datasets
  • history of data expectations + reality
  • data quality tracking

Try out Datapact Track at track.datapact.dev, or follow the self-hosting guide to deploy your own instance.

datapact vs Great Expectations

Both datapact and Great Expectations help you improve data quality, but they take different approaches.

Great Expectations has its own JSON-based storage format for expectation suites, and it gives you a custom UI to edit them. It's much bigger than datapact in project size and scope, but also in complexity.

datapact is a lot younger, community-run, and more of a library than a framework. The main differentiator is that it lets you express your test suites in Python code, right alongside your other code. This works in Python scripts, Jupyter Notebooks, pipeline tests: everywhere that Python runs. And because your tests are code, you can co-locate them with the rest of your code, and version-control and review them just like everything else.

If you already know how to use Great Expectations, you should use it. If you found its learning curve too steep, maybe look at datapact. It's designed to be easy to get started with and intuitive to use.

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Simon Knott

💻 📖 🤔 🚧

st-sch

🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

Download files

Source Distribution

datapact-0.9.0.tar.gz (220.0 kB)

Built Distribution

datapact-0.9.0-py3-none-any.whl (218.4 kB, Python 3)

File details

Details for the file datapact-0.9.0.tar.gz.

File metadata

  • Download URL: datapact-0.9.0.tar.gz
  • Size: 220.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for datapact-0.9.0.tar.gz:

  • SHA256: 95a45d98cd6a9c9d387871c67aeb0e315fe453055f908abd1476a845d50c6375
  • MD5: f8ab9e68149c9838d66d0f0f35c87925
  • BLAKE2b-256: 7d8320bf6a6bbea895cf6aac2a142e2f05960f7e13bc094383f4e7d6b11faf99

File details

Details for the file datapact-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: datapact-0.9.0-py3-none-any.whl
  • Size: 218.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for datapact-0.9.0-py3-none-any.whl:

  • SHA256: 6a735ac68ffc40a3d3d58decac81f1480563e5e7929033d9d8b30efe5d3b43ad
  • MD5: 5503832ccb9c9ad2c0eb8dc9df545834
  • BLAKE2b-256: 4afa565be0b6c6b7ead62426ed60a94717b107b7bea5de0e8b5dabda6bcdd80d
