A package for automating data quality and integrity checks with optional GPU acceleration using cuDF
DataPilotChecker
DataPilotChecker is a Python package that automates data quality and integrity checks on your datasets. It checks for missing values, duplicate rows, and outliers, and validates data types and value ranges. If a compatible GPU is available, the package uses cuDF for GPU acceleration; otherwise, it falls back to Dask for parallel processing.
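This page does not document DataPilotChecker's own functions, so the following is only a minimal sketch, written with plain pandas, of what the five checks listed above typically compute. The function name `summarize_quality`, its `expected_types` and `ranges` arguments, and the 1.5 × IQR outlier rule are all assumptions for illustration, not the package's API.

```python
import pandas as pd


def summarize_quality(df, expected_types=None, ranges=None):
    """Hypothetical helper: run five basic data quality checks on a DataFrame.

    expected_types maps column -> dtype name (e.g. "int64");
    ranges maps column -> (low, high) inclusive bounds.
    """
    expected_types = expected_types or {}
    ranges = ranges or {}
    report = {}

    # 1. Missing values: null count for each column that has any nulls.
    nulls = df.isna().sum()
    report["missing"] = {col: int(n) for col, n in nulls.items() if n > 0}

    # 2. Duplicate rows: rows that repeat an earlier row exactly.
    report["duplicates"] = int(df.duplicated().sum())

    # 3. Outliers: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] per numeric column
    #    (an assumed rule; NaNs are ignored by quantile and by the comparisons).
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        if mask.any():
            outliers[col] = int(mask.sum())
    report["outliers"] = outliers

    # 4. Data type validation: columns whose dtype differs from the expected one.
    report["type_mismatches"] = {
        col: str(df[col].dtype)
        for col, expected in expected_types.items()
        if str(df[col].dtype) != expected
    }

    # 5. Range validation: non-null values outside the [low, high] bounds.
    out_of_range = {}
    for col, (low, high) in ranges.items():
        bad = ~df[col].between(low, high) & df[col].notna()
        if bad.any():
            out_of_range[col] = int(bad.sum())
    report["out_of_range"] = out_of_range

    return report


df = pd.DataFrame({
    "age": [25, 31, None, 31, 240],
    "city": ["Oslo", "Lima", "Pune", "Lima", "Kyiv"],
})
print(summarize_quality(df, expected_types={"age": "int64"}, ranges={"age": (0, 120)}))
```

On this small frame the sketch reports one missing `age`, one duplicated row, one outlier (240), and one out-of-range value, and it flags `age` as `float64` because the missing entry makes pandas store the column as floats.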
Installation
Basic Installation
You can install the package via pip:
pip install data_pilot_checker
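This page does not state the package's import name, so one way to confirm the install without guessing it is to query the installed distribution's metadata with the standard library's `importlib.metadata`. The distribution name `data_pilot_checker` comes from the file names below; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="data_pilot_checker"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed.
print(installed_version())
```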
Installation with GPU Support
To use GPU acceleration with cuDF, you need a compatible NVIDIA GPU and a RAPIDS environment that provides cuDF. Follow these steps:
Create a conda environment with RAPIDS:
conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia \
rapids=24.06 python=3.11 cuda-version=12.2
Activate the conda environment:
conda activate rapids-24.06
Install DataPilotChecker in the conda environment:
pip install data_pilot_checker
See the RAPIDS installation guide (https://docs.rapids.ai/install) for other ways to install cuDF and for the CUDA and Python versions each release supports.
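Once the RAPIDS environment is active, you can check whether cuDF is actually usable, and therefore which of the two backends described earlier would apply, by importing it and allocating a tiny GPU object. This is a sketch under assumptions: `pick_backend` is a hypothetical name, and DataPilotChecker's own detection logic may differ.

```python
import importlib


def pick_backend():
    """Hypothetical helper: report which backend this environment can use.

    Returns "cudf" if cuDF imports and can allocate on a GPU, "dask" if
    dask.dataframe is importable, and None if neither is available.
    """
    try:
        cudf = importlib.import_module("cudf")
        # A tiny Series forces a real GPU allocation, which fails when no
        # usable CUDA device or driver is present.
        cudf.Series([1])
        return "cudf"
    except Exception:
        pass  # cuDF missing or unusable: fall through to the CPU option
    try:
        importlib.import_module("dask.dataframe")
        return "dask"
    except Exception:
        return None


print(pick_backend())
```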
Download files
Source distribution: data_pilot_checker-1.6.tar.gz
Built distribution: data_pilot_checker-1.6-py3-none-any.whl
File details
Details for the file data_pilot_checker-1.6.tar.gz
File metadata
- Download URL: data_pilot_checker-1.6.tar.gz
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6954e9c8c2b39138c320c0185442540a10acb73a1f0f8788795b3e16de899ad3
MD5 | 01edcd327e0b1e32948c4d52d169e5dd
BLAKE2b-256 | 09f1835c0e3965b176452e65167d11450642c7750ef954896ce68a8530bedc45
File details
Details for the file data_pilot_checker-1.6-py3-none-any.whl
File metadata
- Download URL: data_pilot_checker-1.6-py3-none-any.whl
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | b073e8c245c956d04469dd00c50c1f9202b3fcb60adc5c67f886972e53f4ab57
MD5 | 301f4954e05c44ace8fb57dbe82d933f
BLAKE2b-256 | 67ee1d226807a0b29d3bb27a2678b713436f08045b621ce64f2d0fe7550a962e
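The published digests let you check a downloaded file before installing it. A minimal sketch using the standard library's `hashlib`, with the SHA256 value for the source distribution copied from its hash table; the download location is an assumption, and for the wheel you would substitute its file name and SHA256 digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for data_pilot_checker-1.6.tar.gz
EXPECTED_SHA256 = "6954e9c8c2b39138c320c0185442540a10acb73a1f0f8788795b3e16de899ad3"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("data_pilot_checker-1.6.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce this check itself through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.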