a library for automated table normalization

AutoNormalize

AutoNormalize is a Python library for automated data table normalization. It lets you build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.

Getting Started

Install

pip install featuretools[autonormalize]

Uninstall

pip uninstall autonormalize

Demos

API Reference

auto_entityset

auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)

Creates a normalized entityset from a dataframe.

Arguments:

  • df (pd.DataFrame) : the dataframe containing the data

  • accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (e.g. with accuracy = 0.98, at least 98% of the rows must satisfy the dependency LHS --> RHS)

  • index (str, optional) : name of the column intended as the index of df

  • name (str, optional) : the name of created EntitySet

  • time_index (str, optional) : name of time column in the dataframe.

Returns:

  • entityset (ft.EntitySet) : created EntitySet
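
To make the accuracy argument concrete, the sketch below shows one common way to score how well a dependency LHS --> RHS holds: the fraction of rows that agree with the most frequent RHS value for their LHS value. The helper name dependency_accuracy and the toy rows are assumptions for illustration only; the library's internal computation may differ.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Fraction of rows consistent with lhs -> rhs (hypothetical helper)."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Count the RHS values seen for each distinct combination of LHS values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    # In each group, the rows carrying the most common RHS value agree
    # with the dependency; all other rows in the group violate it.
    agreeing = sum(max(counts.values()) for counts in groups.values())
    return agreeing / len(rows)


# Toy data: one misspelled city breaks zip -> city for a single row.
rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},
    {"zip": "10001", "city": "New York"},
]
score = dependency_accuracy(rows, ["zip"], "city")
print(score)          # 0.75
print(score >= 0.98)  # False: rejected at the default threshold
```

Under this measure, lowering the threshold makes the search more tolerant of dirty rows, at the cost of accepting dependencies that do not hold everywhere.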

find_dependencies

find_dependencies(df, accuracy=0.98, index=None)

Finds dependencies within dataframe with the DFD search algorithm.

Returns:

  • dependencies (Dependencies) : the dependencies found in the data within the constraints provided
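
As an illustration of what a dependency is, the sketch below exhaustively checks every pair of columns for an exact dependency. This is deliberately not the DFD algorithm: it only considers single-column left-hand sides and ignores the accuracy threshold. The helper names and the toy rows are assumptions, not part of the library.

```python
from itertools import permutations


def holds_exactly(rows, lhs, rhs):
    """True when every lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        # and returns the stored one on later encounters.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def single_column_dependencies(rows):
    """Naive search over all ordered column pairs (hypothetical helper)."""
    columns = list(rows[0])
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if holds_exactly(rows, lhs, rhs)
    ]


rows = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 3, "customer_id": 20, "customer_name": "Lin"},
]
deps = single_column_dependencies(rows)
for lhs, rhs in deps:
    print(f"{lhs} --> {rhs}")
```

In this toy table customer_id determines customer_name, but not order_id, because customer 10 appears on two different orders.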

normalize_dataframe

normalize_dataframe(df, dependencies)

Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:

  1. has the shortest length
  2. has "id" in some form in the name of an attribute
  3. has the attribute furthest to the left in the table

Returns:

  • new_dfs (list[pd.DataFrame]) : list of new dataframes
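
The key priority described above can be sketched as a sort key. This is a minimal illustration, assuming that "length" means the number of attributes in a key and that "id in some form" can be approximated with a case-insensitive substring check. The function choose_key is hypothetical and is not the library's implementation.

```python
def choose_key(candidate_keys, columns):
    """Pick one key using the three documented priorities (hypothetical helper)."""

    def priority(key):
        has_id = any("id" in attr.lower() for attr in key)
        leftmost = min(columns.index(attr) for attr in key)
        # Tuples compare element by element: fewer attributes first,
        # then keys with "id" in a name (False sorts before True),
        # then the key whose attribute sits furthest to the left.
        return (len(key), not has_id, leftmost)

    return min(candidate_keys, key=priority)


columns = ["name", "customer_id", "email", "city"]
candidates = [("email",), ("customer_id",), ("name", "city")]
print(choose_key(candidates, columns))  # ('customer_id',)
```

A plain substring check also matches names such as "width", so a stricter rule may be needed on real column names.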

make_entityset

make_entityset(df, dependencies, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe based on the dependencies given. Keys are chosen in the same fashion as for normalize_dataframe, and a new index will be created if any key has more than a single attribute.

Returns:

  • entityset (ft.EntitySet) : created EntitySet
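
To show what creating a new index for a multi-attribute key could look like, the sketch below assigns one integer per distinct combination of key values. The helper add_surrogate_index, the column name team_season_id, and the toy rows are assumptions; how the library names and builds its index is not specified here.

```python
def add_surrogate_index(rows, key, index_name):
    """Return copies of rows with an integer id per distinct composite key."""
    ids = {}
    indexed = []
    for row in rows:
        composite = tuple(row[attr] for attr in key)
        # The first time a combination appears it receives the next integer;
        # later occurrences reuse the id that was already stored.
        new_id = ids.setdefault(composite, len(ids))
        indexed.append({index_name: new_id, **row})
    return indexed


# Toy data: (team, season) together identify the coach.
rows = [
    {"team": "red", "season": 2020, "coach": "Kim"},
    {"team": "red", "season": 2021, "coach": "Lee"},
    {"team": "red", "season": 2020, "coach": "Kim"},
]
indexed = add_surrogate_index(rows, ("team", "season"), "team_season_id")
print([row["team_season_id"] for row in indexed])  # [0, 1, 0]
```

A single-column index like this lets other tables reference the composite key through one column.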

normalize_entityset

normalize_entityset(es, accuracy=0.98)

Returns a new normalized EntitySet from an EntitySet with a single entity.

Arguments:

  • es (ft.EntitySet) : EntitySet with a single entity to normalize

Returns:

  • new_es (ft.EntitySet) : new normalized EntitySet

Built at Alteryx Innovation Labs
