datacleanbot
Automated Data Cleaning Tool.
The main goal is to develop a Python tool, datacleanbot, with the following behavior:
Given an arbitrary parsed raw dataset representing a supervised learning problem, the tool automatically identifies potential issues and reports the results and recommendations to the end user in an effective way.
Install
$ pip install datacleanbot
QuickStart
Install OpenML (version 0.9.0):
OpenML is used to import datasets and to share models and experiments.
$ pip install openml==0.9.0
On Windows, this requires a C++ compiler to be installed.
Acquire data from OpenML:
>>> import numpy as np
>>> import openml as oml
>>> data = oml.datasets.get_dataset(id) # id: OpenML dataset ID
>>> X, y, categorical_indicator, features = data.get_data(target=data.default_target_attribute, dataset_format='array')
>>> # Append the target y as the last column of the feature matrix X
>>> Xy = np.concatenate((X,y.reshape((y.shape[0],1))), axis=1)
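The concatenation step appends the target vector as the last column of the feature matrix. The following is a minimal sketch of that step alone, using a small toy array in place of an OpenML dataset; the values are invented for illustration and do not come from any real dataset.

```python
import numpy as np

# Toy stand-ins for the X and y arrays returned by data.get_data(...).
# These values are made up for illustration only.
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9]])
y = np.array([0, 0, 1])

# y has shape (n,); reshape it to (n, 1) so it can be appended as a column.
Xy = np.concatenate((X, y.reshape((y.shape[0], 1))), axis=1)

print(Xy.shape)   # one extra column compared with X
print(Xy[:, -1])  # the last column holds the target values
```

Because NumPy arrays hold a single dtype, the integer labels are stored as floats in the combined array.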
Autoclean data with datacleanbot:
>>> import datacleanbot.dataclean as dc
>>> Xy = dc.autoclean(Xy, data.name, features)
Description
datacleanbot is equipped with the following capabilities:
- Present an overview report of the given dataset
  - The most important features
  - Statistical information (e.g., mean, max, min)
  - Data types of features
- Clean common data problems in the raw dataset
  - Duplicated records
  - Inconsistent column names
  - Missing values
  - Outliers
The two aspects datacleanbot meaningfully automates are marked in bold.
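To make the kinds of problems listed above concrete, here is a rough NumPy sketch that counts duplicated records, missing values, and outliers in a small numeric array and collects basic per-column statistics. It is not datacleanbot's implementation: the function name `summarize_issues` is hypothetical, the 1.5 x IQR rule is a simple stand-in for outlier detection, and the toy data is invented.

```python
import numpy as np

def summarize_issues(Xy):
    """Illustrative report on a numeric array (hypothetical helper, not part of datacleanbot)."""
    Xy = np.asarray(Xy, dtype=float)
    missing_mask = np.isnan(Xy)

    # Duplicated records: compare only rows without missing values,
    # counting every repeat after the first occurrence.
    complete = Xy[~missing_mask.any(axis=1)]
    n_duplicates = len(complete) - len({tuple(row) for row in complete})

    # Outliers: simple 1.5 * IQR rule per column, ignoring NaN entries.
    q1, q3 = np.nanpercentile(Xy, [25, 75], axis=0)
    iqr = q3 - q1
    outlier_mask = (Xy < q1 - 1.5 * iqr) | (Xy > q3 + 1.5 * iqr)

    return {
        "duplicates": n_duplicates,
        "missing": int(missing_mask.sum()),
        "outliers_per_column": outlier_mask.sum(axis=0).tolist(),
        "mean": np.nanmean(Xy, axis=0).tolist(),
        "min": np.nanmin(Xy, axis=0).tolist(),
        "max": np.nanmax(Xy, axis=0).tolist(),
    }

# Toy data (invented): the last column plays the role of the target.
toy = np.array([
    [1.0,   10.0,   0.0],
    [2.0,   11.0,   0.0],
    [2.0,   11.0,   0.0],   # exact duplicate of the previous row
    [3.0,   np.nan, 1.0],   # one missing value
    [2.5,   11.5,   1.0],
    [100.0, 10.5,   1.0],   # extreme value in the first column
])

report = summarize_issues(toy)
print(report)
```

A fixed rule like this only illustrates what "outlier" means in the list; it says nothing about which detection method datacleanbot selects.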
User's Guide
The user's guide can be found at datacleanbot.