A package for automating data quality and integrity checks with optional GPU acceleration using cuDF

Project description

DataPilotChecker

DataPilotChecker is a Python package that automates data quality and integrity checks on a dataset. It checks for missing values, duplicate rows, and outliers, and validates data types and value ranges. The package uses cuDF for GPU acceleration when a compatible GPU is available, and falls back to Dask for parallel processing otherwise.
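The package's own API is not documented on this page, so the following is only a minimal sketch of what the five checks listed above compute. It uses plain Python on a list of dictionaries rather than cuDF or Dask, and every function name, the sample rows, and the IQR outlier rule are assumptions made for illustration, not the package's implementation.

```python
from statistics import quantiles

# Hypothetical helpers illustrating the five checks; not the package's API.

def missing_counts(rows, columns):
    """Count None values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

def duplicate_count(rows, columns):
    """Count rows whose values repeat an earlier row exactly."""
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def type_violations(rows, column, expected_type):
    """Return non-missing values that are not of the expected type."""
    return [r[column] for r in rows
            if r.get(column) is not None
            and not isinstance(r[column], expected_type)]

def out_of_range(rows, column, low, high):
    """Return numeric values that fall outside [low, high]."""
    return [r[column] for r in rows
            if isinstance(r.get(column), (int, float))
            and not low <= r[column] <= high]

# Made-up sample data for the sketch.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 29},
    {"id": 2, "age": 29},     # exact duplicate of the previous row
    {"id": 3, "age": None},   # missing value
    {"id": 4, "age": "n/a"},  # wrong type
    {"id": 5, "age": 240},    # outside a plausible range
]
readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]

if __name__ == "__main__":
    print("missing:", missing_counts(rows, ["id", "age"]))
    print("duplicates:", duplicate_count(rows, ["id", "age"]))
    print("type violations:", type_violations(rows, "age", (int, float)))
    print("out of range:", out_of_range(rows, "age", 0, 120))
    print("outliers:", iqr_outliers(readings))
```

On a real dataset the same logic would normally be expressed as dataframe operations, which is where a cuDF or Dask backend would take over.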

Installation

Basic Installation

You can install the package via pip:

pip install data_pilot_checker

Installation with GPU Support

To use GPU acceleration with cuDF, you need to set up a compatible environment. Follow these steps:

Create a conda environment with RAPIDS:

conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia \
    rapids=24.06 python=3.11 cuda-version=12.2

Activate the conda environment:

conda activate rapids-24.06

Install DataPilotChecker in the conda environment:

pip install data_pilot_checker

For cuDF installation details, see the RAPIDS installation guide: https://docs.rapids.ai/install
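After setting up the environment, it can be useful to confirm which backend is importable. The sketch below is an assumption about how such a check might look: `pick_backend` is a hypothetical helper, not part of the package. It only tests whether the `cudf` or `dask` modules can be found, not whether a working GPU is actually present.

```python
import importlib.util

def pick_backend(candidates=("cudf", "dask")):
    """Return the first importable module name from candidates, or None.

    Hypothetical helper: it checks importability only, so a result of
    "cudf" does not by itself prove that a compatible GPU is available.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None

if __name__ == "__main__":
    backend = pick_backend()
    if backend is None:
        print("Neither cudf nor dask is installed in this environment.")
    else:
        print(f"First available backend: {backend}")
```

Run this inside the activated conda environment; if it reports `dask` or nothing at all, the RAPIDS installation likely did not complete as expected.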

Download files

Source Distribution

data_pilot_checker-1.6.tar.gz (7.4 kB view details)

Uploaded Source

Built Distribution

data_pilot_checker-1.6-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file data_pilot_checker-1.6.tar.gz.

File metadata

  • Download URL: data_pilot_checker-1.6.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for data_pilot_checker-1.6.tar.gz
Algorithm Hash digest
SHA256 6954e9c8c2b39138c320c0185442540a10acb73a1f0f8788795b3e16de899ad3
MD5 01edcd327e0b1e32948c4d52d169e5dd
BLAKE2b-256 09f1835c0e3965b176452e65167d11450642c7750ef954896ce68a8530bedc45
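The SHA256 digest listed above can be compared against a locally downloaded copy of the file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path are assumptions for illustration.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals expected_hex."""
    return sha256_of(path) == expected_hex.strip().lower()

# SHA256 digest copied from the listing for data_pilot_checker-1.6.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "6954e9c8c2b39138c320c0185442540a10acb73a1f0f8788795b3e16de899ad3"
)

# Example call (the path is an assumption about where the file was saved):
# matches_digest("data_pilot_checker-1.6.tar.gz", EXPECTED_SDIST_SHA256)
```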

File details

Details for the file data_pilot_checker-1.6-py3-none-any.whl.

File hashes

Hashes for data_pilot_checker-1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 b073e8c245c956d04469dd00c50c1f9202b3fcb60adc5c67f886972e53f4ab57
MD5 301f4954e05c44ace8fb57dbe82d933f
BLAKE2b-256 67ee1d226807a0b29d3bb27a2678b713436f08045b621ce64f2d0fe7550a962e
