Python package for creating messy data.

Project description

Untidy

A Python library for uncleaning your dataset.

Overview

Have you ever wondered how to introduce specific problems to your clean data? Now you can apply our out-of-the-box solution to untidy your data according to your needs.

It is intended primarily for educational use, where clean example data needs to be made more realistic.

Real-world data is often plagued by missing values, datetime issues, data type mismatches, and string encoding problems.

You can introduce the following problems to your data:

  • Missing values
  • Outliers
  • Changed string encodings
  • Numeric columns converted to strings
  • Duplicate rows
  • Duplicate columns
  • Extra characters in strings
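
To make the list above concrete, here is a minimal hand-written sketch of what each kind of problem can look like in a pandas DataFrame. It does not use untidy and is not how the package implements these corruptions; the example data and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical clean data, invented for this illustration.
clean_df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Zoë"],
    "age": [31, 45, 27, 38],
    "score": [4.5, 3.9, 4.8, 4.1],
})

messy = clean_df.copy()

# Missing value: blank out one cell.
messy.loc[1, "score"] = np.nan

# Outlier: replace one value with something far outside the normal range.
messy.loc[2, "score"] = messy["score"].max() * 100

# Numeric column stored as strings.
messy["age"] = messy["age"].astype(str)

# Encoding problem: UTF-8 bytes read back as Latin-1 (classic mojibake).
messy["name"] = messy["name"].map(lambda s: s.encode("utf-8").decode("latin-1"))

# Extra characters appended to strings.
messy["name"] = messy["name"] + " ##"

# Duplicate row: append a copy of the first row.
messy = pd.concat([messy, messy.iloc[[0]]], ignore_index=True)

# Duplicate column: same contents under a second name.
messy["name_copy"] = messy["name"]

print(messy)
```

Each step touches the data in only one way, so the effect of any single problem type is easy to see by commenting the others out.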

The package is designed to work with pandas DataFrames.

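import pandas as pd
from untidy import untidyfy

# Small example DataFrame standing in for your own clean data
clean_df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [4.5, 3.9]})

messy_df = untidyfy(clean_df,
                    corruption_level=4,  # how much mess you want (0-10)
                    nans=True,
                    outliers=True,
                    text_noise=True,
                    mess_with_numbers=True,
                    mess_with_string_encodings=True,
                    duplicate_rows=True,
                    duplicate_columns=True)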
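
After corrupting a DataFrame it can help to check how much mess was actually introduced. The helper below is a rough sketch using only pandas; `mess_report` is a hypothetical name and is not part of untidy. It counts missing cells, duplicate rows, and duplicate columns, and lists text columns whose values all parse as numbers.

```python
import pandas as pd


def mess_report(df: pd.DataFrame) -> dict:
    """Count a few common data-quality problems in a DataFrame."""
    numeric_as_text = []
    # Iterate by position so that repeated column names do not cause trouble.
    for i in range(df.shape[1]):
        col = df.iloc[:, i]
        is_text = pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col)
        # A text column whose every value parses as a number is suspicious.
        if is_text and pd.to_numeric(col, errors="coerce").notna().all():
            numeric_as_text.append(df.columns[i])

    return {
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        # Transpose so that columns become rows, then reuse duplicated().
        "duplicate_columns": int(df.T.duplicated().sum()),
        "numeric_stored_as_text": numeric_as_text,
    }
```

Calling `mess_report` on both the clean and the corrupted DataFrame and comparing the two dictionaries gives a quick before-and-after view.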

Installation

The package can be installed directly via pip, or by downloading the untidy-{release-version}.tar.gz file from the release section and running:

pip install untidy-{release-version}.tar.gz
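
To confirm that the installation worked, one option is to ask Python which version of the distribution it can see. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up here, and it assumes the distribution is registered under the name "untidy".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if untidy is installed, otherwise None.
print(installed_version("untidy"))
```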

Download files

Source Distribution

untidy-0.0.1a4.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

untidy-0.0.1a4-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file untidy-0.0.1a4.tar.gz.

File metadata

  • Download URL: untidy-0.0.1a4.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for untidy-0.0.1a4.tar.gz
Algorithm Hash digest
SHA256 cbdda541e37d0531112dddd8203ef0b6c7c8e9160cdbd8d362bbd552f51937c1
MD5 bcbe4ba06167795627fbabf619ffca60
BLAKE2b-256 e6aafc080081f87729accea6ca26c3f51cf443b8943a33d5b3150ab35dae107b

File details

Details for the file untidy-0.0.1a4-py3-none-any.whl.

File metadata

  • Download URL: untidy-0.0.1a4-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for untidy-0.0.1a4-py3-none-any.whl
Algorithm Hash digest
SHA256 7e94267d07c9ada68434d146eb6a946d0678f0fb77d56792c185a0ec04fba734
MD5 33fedfcb8168737f2221c85762476ec8
BLAKE2b-256 478eeaab2e871251069ead3d82f63dab4624dd9f155dd2cc479dc2a831b7dff9
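
The published digests can be used to check that a downloaded file was not corrupted or altered. Below is a minimal sketch using Python's standard `hashlib`; the function names are illustrative, and the file is assumed to sit in the current working directory.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_hex(path) == expected_hex.strip().lower()


# Example call for the source distribution listed above (file must exist locally):
# matches_sha256(
#     "untidy-0.0.1a4.tar.gz",
#     "cbdda541e37d0531112dddd8203ef0b6c7c8e9160cdbd8d362bbd552f51937c1",
# )
```

Reading in chunks keeps memory use constant, which matters little for a 6 kB archive but makes the same helper usable for larger downloads.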
