Python functions to facilitate the pre-processing of data for ML tasks in a clinical context.

# CleanDat

Python functions that facilitate the pre-processing of data for ML tasks, particularly data from a clinical context.

Major functionalities include heuristic-based data cleaning and feature engineering, such as:

- Automatic detection of encoding strings (e.g. 1=m) and application of the corresponding encoding to un-encoded data in the same column
- Automatic detection of date strings in different formats (e.g. 2019-01-01, 01/01/2019, January 2022) and conversion to a unified format
- Encoding of date strings into decomposed date features (e.g. year, month, day, weekday, etc.)
- Heuristics for unifying different number formats, e.g. 1,000.00 vs. 1.000,00, or exponential notations like 1e3 vs. 10x10^2
- Detection and replacement of inconsistent data values
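To make the heuristics listed above more concrete, here is a minimal standard-library sketch of the kind of logic they describe. It does not use or reproduce the CleanDat API: the function names (`apply_detected_encoding`, `normalize_number`, `parse_date`, `decompose_date`) are hypothetical, and the rules are assumptions chosen for illustration, such as reading `01/01/2019` as day-first and treating a single lone comma as a decimal mark.

```python
import re
from datetime import date, datetime

# All names below are hypothetical; they are NOT part of the cleandat package.
_ENCODING = re.compile(r"^\s*(\d+)\s*=\s*(\S.*?)\s*$")
_POWER = re.compile(
    r"^\s*([-+]?\d+(?:[.,]\d+)?)\s*[x×*]\s*10\^([-+]?\d+)\s*$", re.IGNORECASE
)
# Assumption: slash dates are day-first; %B relies on English month names.
_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %Y")


def apply_detected_encoding(values):
    """Learn 'code=label' pairs (e.g. '1=m') and encode bare labels with them."""
    mapping = {}
    for v in values:
        m = _ENCODING.match(v)
        if m:
            mapping[m.group(2).lower()] = int(m.group(1))
    encoded = []
    for v in values:
        key = v.strip().lower()
        m = _ENCODING.match(v)
        if m:
            encoded.append(int(m.group(1)))
        elif key in mapping:
            encoded.append(mapping[key])
        elif key.isdigit():
            encoded.append(int(key))
        else:
            encoded.append(v)  # unknown label: left untouched
    return encoded


def normalize_number(text: str) -> float:
    """Convert '1,000.00', '1.000,00', '1e3' or '10x10^2' to a float."""
    s = text.strip()
    m = _POWER.match(s)
    if m:  # mantissa x 10^exponent
        return normalize_number(m.group(1)) * 10 ** int(m.group(2))
    if "," in s and "." in s:
        # The separator that occurs last is taken to be the decimal mark.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif s.count(",") > 1:
        s = s.replace(",", "")   # repeated commas: thousands separators
    elif s.count(".") > 1:
        s = s.replace(".", "")   # repeated dots: thousands separators
    else:
        s = s.replace(",", ".")  # assumption: a single comma is a decimal mark
    return float(s)


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first successful parse."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date string: {text!r}")


def decompose_date(d: date) -> dict:
    """Split a date into simple numeric features (weekday: Monday is 0)."""
    return {"year": d.year, "month": d.month, "day": d.day, "weekday": d.weekday()}
```

With these definitions, all four example numbers from the list normalize to `1000.0`, `parse_date("January 2022").isoformat()` gives `"2022-01-01"`, and `apply_detected_encoding(["1=m", "2=f", "m", "f"])` returns `[1, 2, 1, 2]`. Real clinical data will contain ambiguous cases (for instance `1,000` or `02/03/2019`) that a fixed rule cannot settle, so any such heuristic should be checked against the column it is applied to.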

# Setup

Install via pip:

pip install cleandat
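
After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and the check relies only on the distribution name `cleandat` used in the pip command above, not on anything from the package's own API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if cleandat is not installed.
print(installed_version("cleandat"))
```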

Download files

Source Distribution

cleandat-0.0.3.tar.gz (10.5 kB)

Uploaded Source

Built Distribution

cleandat-0.0.3-py3-none-any.whl (13.9 kB)

Uploaded Python 3
