automated data cleaning tool

Project description

License: MIT

datacleanbot

datacleanbot is an automated data cleaning tool for Python. Its main goal: given an arbitrary parsed raw dataset representing a supervised learning problem, automatically identify potential issues and report the results and recommendations to the end user in an effective way.

Install

$ pip install datacleanbot

QuickStart

Acquire data from OpenML:

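>>> import openml as oml
>>> data = oml.datasets.get_dataset(id)  # id: OpenML dataset id
>>> # X: feature values, y: target values, features: attribute names
>>> X, y, features = data.get_data(target=data.default_target_attribute, return_attribute_names=True)
>>> # Xy: the full dataset, features and target together
>>> Xy = data.get_data()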

Autoclean data with datacleanbot:

>>> import datacleanbot.dataclean as dc
>>> Xy = dc.autoclean(Xy, data.name, features)

Description

datacleanbot is equipped with the following capabilities:

  • Present an overview report of the given dataset
    • The most important features
    • Statistical information (e.g., mean, max, min)
    • Data types of features
  • Clean common data problems in the raw dataset
    • Duplicated records
    • Inconsistent column names
    • Missing values
    • Outliers

The three aspects datacleanbot meaningfully automates are marked in bold.
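
To make the items in the list above concrete, here is a minimal sketch of what each step could look like on a tiny table. It is an illustration only, written with the Python standard library, and does not reflect how datacleanbot implements these steps. All function names (normalize_name, drop_duplicates, impute_median, iqr_outliers, summarize) and the sample values are hypothetical, and the median imputation and 1.5 * IQR outlier rule are common defaults assumed here, not choices confirmed by the project.

```python
import statistics

# Hypothetical helpers for illustration; these are NOT datacleanbot functions.


def normalize_name(name):
    """Trim, lower-case, and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def drop_duplicates(rows):
    """Remove repeated rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, col):
    """Fill missing cells (None) in one numeric column with the column median."""
    observed = [row[col] for row in rows if row[col] is not None]
    median = statistics.median(observed)
    filled = []
    for row in rows:
        new_row = list(row)
        if new_row[col] is None:
            new_row[col] = median
        filled.append(new_row)
    return filled


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def summarize(values):
    """Basic statistics of the kind an overview report might show."""
    return {"min": min(values), "max": max(values), "mean": statistics.fmean(values)}


# Made-up sample data: one duplicated record, one missing value, one extreme value.
raw_columns = ["Sepal Length", "petal-width "]
raw_rows = [
    [4.9, 0.25],
    [5.0, 0.25],
    [5.0, 0.25],   # duplicated record
    [5.1, None],   # missing value
    [5.4, 0.5],
    [5.8, 1.25],
    [6.0, 1.5],
    [6.3, 1.75],
    [50.0, 1.5],   # extreme value in the first column
]

columns = [normalize_name(c) for c in raw_columns]
unique_rows = drop_duplicates(raw_rows)
filled = impute_median(unique_rows, col=1)
first_column = [row[0] for row in filled]
outliers = iqr_outliers(first_column)
report = summarize(first_column)

print(columns)   # ['sepal_length', 'petal_width']
print(outliers)  # [50.0]
print(report)
```

In this sketch each step returns a new list rather than modifying its input, so the raw rows stay available for comparison with the cleaned result.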

User's Guide

The user's guide can be found at datacleanbot.

Download files

Source Distribution

datacleanbot-0.4.tar.gz (151.3 kB)

Uploaded Source

Built Distribution

datacleanbot-0.4-py3-none-any.whl (197.9 kB)

Uploaded Python 3

File details

Details for the file datacleanbot-0.4.tar.gz.

File metadata

  • Download URL: datacleanbot-0.4.tar.gz
  • Upload date:
  • Size: 151.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.6.3

File hashes

Hashes for datacleanbot-0.4.tar.gz

Algorithm    Hash digest
SHA256       c03636c24795d5b609059abb73d0be99650c93e504f569661614e41e92a1ef88
MD5          194695949cc5d7b613cdea686acaea0e
BLAKE2b-256  8458fd158c4bc83ab6435c507f8935fd10f4f83e59901a5424637765b0b6fa0d

File details

Details for the file datacleanbot-0.4-py3-none-any.whl.

File metadata

  • Download URL: datacleanbot-0.4-py3-none-any.whl
  • Upload date:
  • Size: 197.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.6.3

File hashes

Hashes for datacleanbot-0.4-py3-none-any.whl

Algorithm    Hash digest
SHA256       e2007c2591fe9c6eb0b0bcf1cf9533b79480974e92a76303096c2288c17984da
MD5          342e5ff05d893761a2c76b2bee8fad06
BLAKE2b-256  3f1e0ce3bdd6b7889cd0a7884bab83c8e13709f0794999afab08754eb69b33ff
