Pure Data Framework

Pure Data

Developed by students of the Simulator ML (Karpov.Courses)

Pure Data is a tool designed to help organize data quality checks in your projects. You simply define the data you want to test, the list of test metrics and success criteria, run the test, and get a report with the results.

Pure Data includes:

  • a set of metrics that you can use to check the accuracy of your data;
  • a Report class, which runs through a list of metrics and summarizes which of them pass, fail, or raise errors.

How to install

pip install pure-data

Key Functionality

There are plenty of metrics that you can use to control your data's accuracy and reliability.
You can either apply the metrics you need directly to your data, or use the Report class to run a checklist of metrics and get summary information about the results.

[Figure: Pure application diagram]

Usage

Below is a brief example of how you can use Pure to verify your data.

Import the Report class and the metrics module, from which you can pick any metrics you need.

from pure.report import Report
import pure.metrics as m

First, define the tables you want to work with (a mapping from table name to data), and create a checklist of metrics.

Each metric returns a dict with several meta fields. In the checklist, you can specify which fields of the metric result must stay within certain limits. In this example, we set limits for the "total" field in the first case and for the "delta" field in the second.

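# `data` is the dataset you want to check, loaded beforehand
# (presumably a pandas DataFrame, since engine='pandas' is used below)
tables = {"simple_table": data}

# Each checklist entry: (table name, metric, {result field: (lower limit, upper limit)})
checklist = [
    ("simple_table", m.CountTotal(), {"total": (1, 1e6)}),
    ("simple_table", m.CountZeros("column_1"), {"delta": (0, 0.3)})
]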
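To make the meaning of a limit such as {"delta": (0, 0.3)} more concrete, here is a minimal sketch in plain Python. It is not Pure Data's implementation: the helper within_limits, the inclusive bounds, and the exact fields of the sample result dicts are assumptions made for illustration only.

```python
def within_limits(result, limits):
    """Return True if every limited field of `result` lies inside its bounds.

    Assumption: bounds are inclusive. The library may treat them differently.
    """
    return all(low <= result[field] <= high
               for field, (low, high) in limits.items())


# Hypothetical metric results; the real metrics may return other fields.
count_total_result = {"total": 1000}
count_zeros_result = {"total": 1000, "count": 450, "delta": 0.45}

total_ok = within_limits(count_total_result, {"total": (1, 1e6)})
delta_ok = within_limits(count_zeros_result, {"delta": (0, 0.3)})

print(total_ok)  # True: 1000 rows is between 1 and 1e6
print(delta_ok)  # False: a zero share of 0.45 exceeds the upper limit 0.3
```

Only the fields named in the limits are checked; the other fields of a metric result are left unconstrained.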

Then create the report as follows:

report = Report(tables=tables, checklist=checklist, engine='pandas')

Example of the resulting report dataframe:

[Figure: Report dataframe]

A more detailed example covering the key functionality of the package is available here:
https://github.com/uberkinder/Pure-Data/blob/usage_example/examples/simple_example.ipynb
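For intuition about how a checklist can end up with metrics that pass, fail, or raise errors, the following toy sketch walks through a checklist in plain Python. Everything in it is hypothetical: count_total, count_zeros, and run_checklist are stand-ins written for this illustration and are not part of the pure package, and the table is a simple list of dicts rather than a dataframe.

```python
def count_total(rows):
    """Toy metric: number of rows in the table."""
    return {"total": len(rows)}


def count_zeros(column):
    """Toy metric factory: count and share of zeros in `column`."""
    def metric(rows):
        n = len(rows)
        k = sum(1 for row in rows if row[column] == 0)
        return {"total": n, "count": k, "delta": k / n}
    return metric


def run_checklist(tables, checklist):
    """Evaluate each (table name, metric, limits) entry and record its status."""
    results = []
    for table_name, metric, limits in checklist:
        try:
            values = metric(tables[table_name])
            ok = all(low <= values[field] <= high
                     for field, (low, high) in limits.items())
            status = "pass" if ok else "fail"
        except Exception as exc:  # a metric that cannot be computed
            values, status = {"error": repr(exc)}, "error"
        results.append({"table": table_name, "status": status, "values": values})
    return results


toy_tables = {
    "simple_table": [
        {"column_1": 0}, {"column_1": 5}, {"column_1": 0}, {"column_1": 7},
    ]
}
toy_checklist = [
    ("simple_table", count_total, {"total": (1, 1e6)}),
    ("simple_table", count_zeros("column_1"), {"delta": (0, 0.3)}),
    ("simple_table", count_zeros("missing_column"), {"delta": (0, 0.3)}),
]

results = run_checklist(toy_tables, toy_checklist)
statuses = [r["status"] for r in results]
print(statuses)  # ['pass', 'fail', 'error']
```

The third entry refers to a column that does not exist, so the metric raises an exception and is recorded as an error instead of stopping the whole run.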
