A package for automating data quality and integrity checks with optional GPU acceleration using cuDF

Project description

DataWhiz

DataWhiz is a Python package that automates data quality and integrity checks on your datasets. It checks for missing values, duplicate rows, and outliers, and it validates data types and value ranges. When a compatible GPU is available, the package uses cuDF for GPU acceleration; otherwise, it falls back to Dask for parallel processing.
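The five checks listed above can be illustrated with a minimal plain-Python sketch. This is not DataWhiz's API, and every helper name below is hypothetical. It assumes that rows are a list of dicts, that `None` marks a missing value, and that outliers are flagged with the common 1.5 * IQR rule, which may not be the method the package actually uses.

```python
from statistics import quantiles

# Hypothetical helpers illustrating the checks; NOT DataWhiz's actual API.


def _is_number(value):
    """True for int/float values, excluding bools (which subclass int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def missing_values(rows):
    """Count missing (None) entries per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (value is None)
    return counts


def duplicate_rows(rows):
    """Return indices of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # column order does not matter
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def iqr_outliers(rows, col, k=1.5):
    """Return indices whose numeric value in `col` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[col] for r in rows if _is_number(r.get(col))]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(rows)
            if _is_number(r.get(col)) and not low <= r[col] <= high]


def type_mismatches(rows, schema):
    """Return (row, column) pairs whose non-missing value is not of the expected type."""
    return [(i, col) for i, r in enumerate(rows) for col, typ in schema.items()
            if r.get(col) is not None and not isinstance(r[col], typ)]


def out_of_range(rows, bounds):
    """Return (row, column) pairs whose numeric value falls outside inclusive bounds."""
    return [(i, col) for i, r in enumerate(rows) for col, (low, high) in bounds.items()
            if _is_number(r.get(col)) and not low <= r[col] <= high]


# Small illustrative dataset (made up for this example).
rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 29, "income": 51000},
    {"id": 3, "age": 29, "income": 51000},
    {"id": 4, "age": 41, "income": 950000},
    {"id": 5, "age": "n/a", "income": 50000},
]

print(missing_values(rows))                               # {'id': 0, 'age': 1, 'income': 0}
print(duplicate_rows(rows))                               # [3]
print(iqr_outliers(rows, "income"))                       # [4]
print(type_mismatches(rows, {"age": int, "income": int})) # [(5, 'age')]
print(out_of_range(rows, {"age": (0, 120), "income": (0, 500_000)}))  # [(4, 'income')]
```

A dataframe library such as pandas, cuDF, or Dask would express the same checks as vectorized column operations. The plain-Python version only makes the logic of each check explicit.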

Installation

Basic Installation

You can install the package via pip:

pip install datawhiz

Installation with GPU Support

To use GPU acceleration with cuDF, you need to set up a compatible environment. Follow these steps:

Create a conda environment with RAPIDS:

conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia \
    rapids=24.06 python=3.11 cuda-version=12.2

Activate the conda environment:

conda activate rapids-24.06

Install DataWhiz in the conda environment:

pip install datawhiz

See the RAPIDS installation guide (https://docs.rapids.ai/install) for details on installing cuDF.

Download files

Download the file for your platform.

Source Distribution

data_pilot_checker-1.tar.gz (3.9 kB)

Built Distribution

data_pilot_checker-1-py3-none-any.whl (3.8 kB)

File details

Details for the file data_pilot_checker-1.tar.gz.

File metadata

  • Download URL: data_pilot_checker-1.tar.gz
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.7

File hashes

Hashes for data_pilot_checker-1.tar.gz
Algorithm Hash digest
SHA256 bdb4b06006cb45c8c3b5e3287705777c66876cea55451641a377ad0c340b7aab
MD5 0f591e1556b0b1356565e8c1e640ea1b
BLAKE2b-256 96b026549c042c345f699e0a6c24f746e8a4fe3ca7995773e3310e6c651897c7
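The digests above let you confirm that a downloaded file was not corrupted or altered in transit. The following is a minimal standard-library sketch. It assumes the source distribution has been saved as `data_pilot_checker-1.tar.gz`, and the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib

# Published SHA256 digest for data_pilot_checker-1.tar.gz (from the table above).
EXPECTED_SHA256 = "bdb4b06006cb45c8c3b5e3287705777c66876cea55451641a377ad0c340b7aab"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 hex digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Calling `verify("data_pilot_checker-1.tar.gz")` should return `True` for an intact download. pip provides the same protection through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).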

File details

Details for the file data_pilot_checker-1-py3-none-any.whl.

File hashes

Hashes for data_pilot_checker-1-py3-none-any.whl
Algorithm Hash digest
SHA256 cb9842cb3b1865f5f04aeea962b594068d371d21d84fe331b28165df79224fd9
MD5 fc5cfcfb5306f36f1d6e453e9dad77aa
BLAKE2b-256 d421ac02c0a8983660c6e55afcb7d1f2b7b027c783130f9f330e768a0a23608a
