A package for automating data quality and integrity checks with optional GPU acceleration using cuDF

Project description

DataPilotChecker

DataPilotChecker is a Python package that automates data quality and integrity checks for your datasets. It checks for missing values, duplicate rows, and outliers, and performs data type validation and range validation. The package uses cuDF for GPU acceleration when a compatible GPU is available and falls back to Dask for parallel processing otherwise.
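To make the five checks concrete, here is a minimal sketch of what each one could look like in plain pandas. It does not use this package's API, which is not shown on this page. The function name basic_quality_report, its parameters, and the 1.5 * IQR outlier rule are all assumptions chosen for illustration.

```python
import pandas as pd


def basic_quality_report(df, expected_dtypes=None, ranges=None):
    """Hypothetical helper (not part of this package) covering the five checks."""
    expected_dtypes = expected_dtypes or {}
    ranges = ranges or {}

    report = {
        # Missing values: count of NaN/None per column.
        "missing_values": {col: int(n) for col, n in df.isna().sum().items()},
        # Duplicate rows: rows identical to an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
    }

    # Outliers: assumed rule, values beyond 1.5 * IQR from the quartiles.
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())
    report["outliers"] = outliers

    # Data type validation: columns whose dtype differs from the expected one.
    report["dtype_mismatches"] = {
        col: str(df[col].dtype)
        for col, expected in expected_dtypes.items()
        if str(df[col].dtype) != expected
    }

    # Range validation: count of values outside the inclusive (low, high) bounds.
    report["out_of_range"] = {
        col: int(((df[col] < low) | (df[col] > high)).sum())
        for col, (low, high) in ranges.items()
    }
    return report


sample = pd.DataFrame({"age": [25, 30, 30, None, 200], "name": ["a", "b", "b", "c", "d"]})
print(basic_quality_report(sample, expected_dtypes={"age": "int64"}, ranges={"age": (0, 120)}))
```

Missing values are ignored by the comparison-based checks here, so they are reported only under missing_values.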

Installation

Basic Installation

You can install the package via pip:

pip install data_pilot_checker

Installation with GPU Support

To use GPU acceleration with cuDF, you need to set up a compatible environment. Follow these steps:

Create a conda environment with RAPIDS:

conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia \
    rapids=24.06 python=3.11 cuda-version=12.2

Activate the conda environment:

conda activate rapids-24.06

Install the package in the conda environment:

pip install data_pilot_checker

For details on installing cuDF, see the RAPIDS installation guide (https://docs.rapids.ai/install).
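After installation, one quick way to see which backend is visible in the active environment is to ask Python whether the modules can be found. The sketch below uses only the standard library. The helper name first_available is hypothetical, and this is not necessarily how the package itself selects a backend. Note that finding the cudf module only shows that it is installed, not that a compatible GPU is present.

```python
import importlib.util


def first_available(candidates=("cudf", "dask")):
    """Return the first module name that is installed, or None if none are."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


print(first_available())
```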

Project details

This version

1.7

Download files

Source Distribution

data_quality_checker-1.7.tar.gz (7.4 kB)

Uploaded Source

Built Distribution

data_quality_checker-1.7-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file data_quality_checker-1.7.tar.gz.

File metadata

  • Download URL: data_quality_checker-1.7.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for data_quality_checker-1.7.tar.gz
Algorithm Hash digest
SHA256 bd25dfdf0c311c01b75f30ce41f36df60ea4059ebdfe95e1b8dc80701cbba6b5
MD5 161885ed879fb6f859978d2b7e786932
BLAKE2b-256 f6fdd0b221aa3db4caaeea9bef94175c297091882e1a9568c58e54d7ff933c49
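A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch. The helper name sha256_of is hypothetical, and the commented usage assumes the archive has been saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage, assuming the source distribution was downloaded to the working directory:
# expected = "bd25dfdf0c311c01b75f30ce41f36df60ea4059ebdfe95e1b8dc80701cbba6b5"
# print(sha256_of("data_quality_checker-1.7.tar.gz") == expected)
```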

File details

Details for the file data_quality_checker-1.7-py3-none-any.whl.

File hashes

Hashes for data_quality_checker-1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 686412c990a23fbde3166f7f7449934d1963113f09173a95e02c9282df73f369
MD5 d8038dc0613fef2d5009dcc9691b7a22
BLAKE2b-256 4f985fd0f4374b67db35a4ca2a0e1f8608e3bd2ed52bd931146fe89d12384355
