
a library for automated table normalization

Project description



AutoNormalize is a Python library for automated datatable normalization. It allows you to build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.

Getting Started


To install AutoNormalize as a Featuretools add-on:

pip install "featuretools[autonormalize]"

To uninstall it:

pip uninstall autonormalize


API Reference


auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)

Creates a normalized entityset from a dataframe.


  • df (pd.DataFrame) : the dataframe containing the data

  • accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (e.g. with accuracy = 0.98, the dependency LHS --> RHS must hold for at least 98% of the rows)

  • index (str, optional) : name of the column intended as the index of df

  • name (str, optional) : the name of created EntitySet

  • time_index (str, optional) : name of time column in the dataframe.


  • entityset (ft.EntitySet) : created EntitySet
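
The accuracy threshold can be illustrated with a small pure-Python sketch. It assumes one common way of scoring an approximate dependency: within each group of rows sharing the same LHS value, the most frequent RHS value counts as conforming. The helper name dependency_accuracy is hypothetical and is not part of AutoNormalize, whose internal scoring may differ.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Return the fraction of rows consistent with lhs --> rhs.

    Hypothetical helper for illustration only: rows carrying the most
    frequent rhs value for their lhs value are counted as conforming.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    conforming = sum(max(counts.values()) for counts in groups.values())
    return conforming / len(rows)


rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},  # misspelled entry breaks the dependency
    {"zip": "10001", "city": "New York"},
]

score = dependency_accuracy(rows, ["zip"], "city")
print(score)  # 0.75
```

Under this scoring, zip --> city would be accepted with accuracy = 0.75 but rejected with the default of 0.98.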


find_dependencies(df, accuracy=0.98, index=None)

Finds dependencies within a dataframe using the DFD search algorithm.


  • dependencies (Dependencies) : the dependencies found in the data within the constraints provided
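
To make the notion of a dependency concrete, here is a deliberately naive sketch that checks every ordered pair of columns for an exact dependency. It is not the DFD algorithm, which searches combinations of columns far more efficiently, and the function names holds and single_column_dependencies are hypothetical.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """True if every lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs seen for this lhs and returns it after
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def single_column_dependencies(rows):
    """Brute-force list of exact dependencies with a one-column LHS."""
    columns = list(rows[0])
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if holds(rows, lhs, rhs)
    ]


rows = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ana"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ana"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bo"},
    {"order_id": 4, "customer_id": 12, "customer_name": "Ana"},
]

found = single_column_dependencies(rows)
for lhs, rhs in found:
    print(f"{lhs} --> {rhs}")
```

In this sample, customer_name does not determine customer_id because two different customers share the name "Ana".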


normalize_dataframe(df, dependencies)

Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:

  1. shortest length
  2. has "id" in some form in the name of an attribute
  3. has the attribute furthest to the left in the table
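
The priority order above can be sketched as a sort key. This is an illustration under stated assumptions, not the library's implementation: choose_key is a hypothetical name, and "id in some form" is interpreted here as a case-insensitive substring match.

```python
def choose_key(candidate_keys, columns):
    """Pick one key from candidate_keys using the three priorities.

    candidate_keys: list of tuples of column names
    columns: column names in table order, left to right
    """
    def priority(key):
        # Assumption: any case-insensitive "id" substring counts as an id column
        has_id = any("id" in col.lower() for col in key)
        leftmost = min(columns.index(col) for col in key)
        # Tuples compare element by element: length, then id, then position
        return (len(key), not has_id, leftmost)

    return min(candidate_keys, key=priority)


columns = ["name", "customer_id", "city", "zip"]

first = choose_key([("city", "zip"), ("name",), ("customer_id",)], columns)
second = choose_key([("zip",), ("name",)], columns)
print(first)   # ('customer_id',)
print(second)  # ('name',)
```

In the first call the two single-column keys tie on length, so the one containing "id" wins. In the second call neither name contains "id", so the leftmost column wins.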


  • new_dfs (list[pd.DataFrame]) : list of new dataframes
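
As a rough picture of what normalizing on one dependency produces, the sketch below splits a list of rows into two tables. It uses plain dictionaries rather than DataFrames, and split_on_dependency is a hypothetical helper, not a function provided by AutoNormalize.

```python
def split_on_dependency(rows, lhs, rhs_columns):
    """Split rows on the dependency lhs --> rhs_columns.

    Returns (child, parent): child keeps lhs as a foreign key and drops
    the dependent columns; parent holds one row per distinct lhs value.
    """
    parent = {}
    child = []
    for row in rows:
        # Keep only the first parent row seen for each lhs value
        parent.setdefault(
            row[lhs],
            {lhs: row[lhs], **{col: row[col] for col in rhs_columns}},
        )
        child.append({k: v for k, v in row.items() if k not in rhs_columns})
    return child, list(parent.values())


orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ana"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ana"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bo"},
]

child, parent = split_on_dependency(orders, "customer_id", ["customer_name"])
print(child)
print(parent)
```

The repeated customer name is stored once in the parent table, and the child table refers to it through customer_id.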


make_entityset(df, dependencies, name=None, time_index=None)

Creates a normalized EntitySet from a dataframe based on the dependencies given. Keys are chosen in the same fashion as for normalize_dataframe, and a new index will be created if any key has more than a single attribute.


  • entityset (ft.EntitySet) : created EntitySet
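
The idea of creating a new index for a multi-attribute key can be sketched as follows. This is only an illustration of the concept: add_surrogate_index and the index name place_id are hypothetical, and the way AutoNormalize names or numbers its generated index is not shown here.

```python
def add_surrogate_index(rows, key_columns, index_name):
    """Give each distinct combination of key_columns one integer id."""
    ids = {}
    indexed = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # len(ids) is evaluated before insertion, so ids start at 0
        new_id = ids.setdefault(key, len(ids))
        indexed.append({index_name: new_id, **row})
    return indexed


places = [
    {"city": "Springfield", "state": "IL"},
    {"city": "Springfield", "state": "MO"},
    {"city": "Springfield", "state": "IL"},
]

indexed = add_surrogate_index(places, ["city", "state"], "place_id")
print([row["place_id"] for row in indexed])  # [0, 1, 0]
```

A single-column index like this lets other tables reference the composite key (city, state) through one value.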


normalize_entityset(es, accuracy=0.98)

Returns a new normalized EntitySet from an EntitySet with a single entity.


  • es (ft.EntitySet) : EntitySet with a single entity to normalize


  • new_es (ft.EntitySet) : new normalized EntitySet

Built at Alteryx Innovation Labs
