"improve how we work with evolving data"

Development Status
- 2 - Pre-Alpha
Intended Audience
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Information Analysis

Project description

`datapact` - pytest, but for dataframes

datapact is a Python library for verifying your data.

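import datapact
import pandas as pd

# sample data for illustration; any Pandas (or Dask) DataFrame works
df = pd.DataFrame({"age": [31, 45], "name": ["Ada", "Grace"]})

dp = datapact.test(df)

dp.age.must.be_positive()       # .must: a violation counts as a failure
dp.name.should.not_be_empty()   # .should: a violation is reported as a warning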

It works with Pandas + Dask DataFrames, and has special support for Jupyter Notebooks.

[Screenshot: datapact in a Jupyter Notebook]

Here are some features:

- dozens of existing assertions, easy to add your own
- great in-editor documentation via docstrings + types
- two severity levels (.should for warnings, .must for failures)
- failure notifications via E-Mail, MS Teams, Slack or PagerDuty (via Datapact Track)
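To make the two severity levels concrete, here is a minimal sketch of how a `.should` / `.must` split could behave. It is not datapact's implementation: the `Column` and `ColumnCheck` classes are hypothetical, they work on plain Python lists instead of DataFrames, and the meaning given to `be_positive` and `not_be_empty` is an assumption made for illustration.

```python
import warnings


class ColumnCheck:
    """Hypothetical per-column assertion helper (illustration only, not datapact code)."""

    def __init__(self, name, values, critical):
        self.name = name
        self.values = list(values)
        self.critical = critical  # True: violations raise; False: violations warn

    def _report(self, ok, message):
        if ok:
            return True
        if self.critical:
            # "must" level: stop with an error
            raise AssertionError(f"{self.name}: {message}")
        # "should" level: warn and keep going
        warnings.warn(f"{self.name}: {message}", stacklevel=3)
        return False

    def be_positive(self):
        # assumed meaning: every value is greater than zero
        return self._report(all(v > 0 for v in self.values),
                            "expected only positive values")

    def not_be_empty(self):
        # assumed meaning: no value is an empty string
        return self._report(all(v != "" for v in self.values),
                            "expected no empty values")


class Column:
    """Exposes the same checks at two severity levels."""

    def __init__(self, name, values):
        self.must = ColumnCheck(name, values, critical=True)
        self.should = ColumnCheck(name, values, critical=False)


age = Column("age", [31, 45, 27])
name = Column("name", ["Ada", "", "Grace"])

age.must.be_positive()       # passes silently
name.should.not_be_empty()   # emits a warning, execution continues
```

The point of the split is that the same assertion can either stop a pipeline or merely flag a problem, depending on how strongly you care about it.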

Get Started here: https://datapact.dev

`Datapact` Track

Datapact Track is an optional, browser-based data tracking service.

[Screenshot: Datapact Track dataset overview, with a code snippet showing how to connect a test suite to the service]

It's fully self-hostable via Docker and Postgres, and there's a hosted version available at track.datapact.dev.

Connecting your test suite is one line of code:

dp.connect(
  server="track.datapact.dev",
  token="..." # get this from the UI
)
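Since the token is a secret, you may prefer not to hard-code it. The sketch below builds the `server` and `token` arguments from environment variables. The helper `track_settings` and the variable names `DATAPACT_TRACK_SERVER` and `DATAPACT_TRACK_TOKEN` are hypothetical and not defined by datapact; only the `server` and `token` keyword names come from the snippet above.

```python
import os


def track_settings(env=os.environ):
    """Build keyword arguments for dp.connect(...) from environment variables.

    DATAPACT_TRACK_SERVER and DATAPACT_TRACK_TOKEN are names made up for this
    sketch; datapact itself does not read them.
    """
    token = env.get("DATAPACT_TRACK_TOKEN")
    if not token:
        raise RuntimeError("DATAPACT_TRACK_TOKEN is not set")
    return {
        # fall back to the hosted service when no server is configured
        "server": env.get("DATAPACT_TRACK_SERVER", "track.datapact.dev"),
        "token": token,
    }


# Usage with a datapact test object `dp` created via datapact.test(df):
# dp.connect(**track_settings())
```

This keeps the token out of version control, which matters more once the test suite lives next to the rest of your code.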

Datapact Track gives you:

- notifications via E-Mail, Slack, MS Teams and PagerDuty
- a central documentation of your datasets
- history of data expectations + reality
- data quality tracking

Try out Datapact Track at track.datapact.dev, or follow the self-hosting guide to deploy your own instance.

`datapact` vs Great Expectations

Both datapact and Great Expectations help you improve data quality, but they take different approaches.

Great Expectations has its own JSON-based storage format for expectation suites and gives you a custom UI to edit them. It's much bigger than datapact in project size, scope, and complexity.

datapact is a lot younger, community-run, and more of a library than a framework. The main differentiator is that it lets you express your test suites in Python code, right alongside your other code. This works in Python scripts, Jupyter Notebooks, pipeline tests: everywhere that Python runs. And because your tests are code, you can co-locate them with the rest of your code, and version-control and review them like everything else.

If you already know how to use Great Expectations, you should use it. If you found its learning curve too steep, maybe look at datapact: it's designed to be easy to get started with and intuitive to use.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

- Simon Knott: 💻 📖 🤔 🚧
- st-sch: 🐛

This project follows the all-contributors specification. Contributions of any kind welcome!

Release history

- 0.9.0 (this version): Jun 9, 2022
- 0.8.0: May 27, 2022
- 0.7.1: May 26, 2022
- 0.7.0: May 26, 2022
- 0.6.0: May 20, 2022
- 0.5.0: May 20, 2022
- 0.3.5: May 17, 2022
- 0.3.4: May 17, 2022
- 0.3.2: May 12, 2022
- 0.3.1: May 12, 2022
- 0.3.0: May 12, 2022
- 0.2.1: May 11, 2022

Download files

Source Distribution

datapact-0.9.0.tar.gz (220.0 kB)

Uploaded Jun 9, 2022 Source

Built Distribution

datapact-0.9.0-py3-none-any.whl (218.4 kB)

Uploaded Jun 9, 2022 Python 3

Hashes for datapact-0.9.0.tar.gz

Algorithm	Hash digest
SHA256	`95a45d98cd6a9c9d387871c67aeb0e315fe453055f908abd1476a845d50c6375`
MD5	`f8ab9e68149c9838d66d0f0f35c87925`
BLAKE2b-256	`7d8320bf6a6bbea895cf6aac2a142e2f05960f7e13bc094383f4e7d6b11faf99`

Hashes for datapact-0.9.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`6a735ac68ffc40a3d3d58decac81f1480563e5e7929033d9d8b30efe5d3b43ad`
MD5	`5503832ccb9c9ad2c0eb8dc9df545834`
BLAKE2b-256	`4afa565be0b6c6b7ead62426ed60a94717b107b7bea5de0e8b5dabda6bcdd80d`
