a library for automated table normalization
AutoNormalize
AutoNormalize is a Python library for automated data table normalization. It lets you build an EntitySet from a single denormalized table and generate features for machine learning using Featuretools.
Getting Started
Install
pip install featuretools[autonormalize]
Uninstall
pip uninstall autonormalize
Demos
- Blog Post
- Machine Learning Demo with Featuretools
- Kaggle Liquor Sales Dataset Demo
- Demo with Editing Dependencies
- Kaggle Food Production Dataset Demo
API Reference
auto_entityset
auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)
Creates a normalized entityset from a dataframe.
Arguments:
- df (pd.DataFrame) : the dataframe containing the data
- accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required in order to conclude a dependency (i.e. with accuracy = 0.98, the dependency LHS --> RHS must hold for at least 98% of the rows)
- index (str, optional) : name of the column that is the intended index of df
- name (str, optional) : the name of the created EntitySet
- time_index (str, optional) : name of the time column in the dataframe
Returns:
- entityset (ft.EntitySet) : created EntitySet
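The accuracy threshold described above can be illustrated with a small pure-Python sketch. It is not AutoNormalize's implementation: the helper `dependency_accuracy` and the sample rows are hypothetical, and the measure used here (keep the most common RHS value per LHS value and count every other row as a violation) is one common way to score an approximate dependency, assumed for illustration.

```python
from collections import Counter, defaultdict


def dependency_accuracy(rows, lhs, rhs):
    """Fraction of rows consistent with the dependency lhs --> rhs.

    Hypothetical helper: for each distinct LHS value, the most common RHS
    value is treated as correct and all other rows count as violations.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key][row[rhs]] += 1
    consistent = sum(c.most_common(1)[0][1] for c in groups.values())
    return consistent / len(rows)


rows = [
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambridge"},
    {"zip": "02139", "city": "Cambrige"},  # misspelling violates zip --> city
    {"zip": "10001", "city": "New York"},
]

score = dependency_accuracy(rows, lhs=["zip"], rhs="city")
print(score)          # 0.75
print(score >= 0.98)  # False: too many violations at accuracy = 0.98
```

Under this assumed measure, lowering the accuracy threshold tolerates more noisy rows, at the cost of accepting dependencies that may not really hold.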
find_dependencies
find_dependencies(df, accuracy=0.98, index=None)
Finds dependencies within dataframe with the DFD search algorithm.
Returns:
- dependencies (Dependencies) : the dependencies found in the data within the constraints provided
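To make the idea of a dependency concrete, here is a naive brute-force check over pairs of single columns. It is explicitly not the DFD algorithm that find_dependencies uses, it ignores the accuracy threshold, and the names `holds` and `naive_dependencies` are hypothetical.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """True if every value of column lhs maps to exactly one value of rhs."""
    seen = {}
    for row in rows:
        # Remember the first RHS value seen for this LHS value and
        # fail as soon as a different one shows up.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def naive_dependencies(rows, columns):
    """Exhaustively test every ordered pair of distinct columns."""
    return [
        (lhs, rhs)
        for lhs, rhs in permutations(columns, 2)
        if holds(rows, lhs, rhs)
    ]


rows = [
    {"order_id": 1, "customer": "ann", "city": "Boston"},
    {"order_id": 2, "customer": "bob", "city": "Denver"},
    {"order_id": 3, "customer": "ann", "city": "Boston"},
    {"order_id": 4, "customer": "cy", "city": "Boston"},
]

for lhs, rhs in naive_dependencies(rows, ["order_id", "customer", "city"]):
    print(f"{lhs} --> {rhs}")
# order_id --> customer
# order_id --> city
# customer --> city
```

A real search also has to consider multi-column left-hand sides, which is where an algorithm such as DFD avoids testing every combination.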
normalize_dataframe
normalize_dataframe(df, dependencies)
Normalizes a dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the following priority:
- shortest length
- has "id" in some form in the name of an attribute
- has the attribute furthest to the left in the table
Returns:
- new_dfs (list[pd.DataFrame]) : list of new dataframes
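The key priority listed above can be sketched as a sort key. This is an illustration under stated assumptions, not the library's code: `choose_key` is a hypothetical helper, "length" is taken to mean the number of attributes in a key, the "id" test is a plain case-insensitive substring match, and "furthest to the left" is taken to mean the smallest column position among the key's attributes.

```python
def choose_key(candidate_keys, columns):
    """Pick one key from candidate_keys using the documented priority.

    candidate_keys: list of tuples of column names
    columns: column names in table order (left to right)
    """
    def priority(key):
        has_id = any("id" in attr.lower() for attr in key)
        leftmost = min(columns.index(attr) for attr in key)
        # Tuples compare element by element: fewer attributes first,
        # then keys containing "id", then the leftmost attribute.
        return (len(key), not has_id, leftmost)

    return min(candidate_keys, key=priority)


columns = ["order_id", "customer", "city", "date"]

print(choose_key([("customer", "date"), ("city",), ("order_id",)], columns))
# ('order_id',)

print(choose_key([("city",), ("customer",)], columns))
# ('customer',)
```

In the first call the two single-attribute keys beat the composite key, and order_id wins because its name contains "id". In the second call neither name contains "id", so the column further to the left is chosen.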
make_entityset
make_entityset(df, dependencies, name=None, time_index=None)
Creates a normalized EntitySet from a dataframe based on the dependencies given. Keys are chosen in the same fashion as for normalize_dataframe, and a new index is created for any key that has more than one attribute.
Returns:
- entityset (ft.EntitySet) : created EntitySet
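The statement that a new index is created for multi-attribute keys can be pictured as assigning one surrogate id per distinct combination of key values. The sketch below is a hypothetical illustration in plain Python; `add_surrogate_index` and the column name `store_product_id` are assumptions, not names AutoNormalize is known to produce.

```python
def add_surrogate_index(rows, key, index_name):
    """Return copies of rows with a single-column index for a composite key.

    Each distinct combination of key values gets the next integer id,
    in order of first appearance.
    """
    ids = {}
    indexed = []
    for row in rows:
        value = tuple(row[attr] for attr in key)
        new_id = ids.setdefault(value, len(ids))
        indexed.append({index_name: new_id, **row})
    return indexed


rows = [
    {"store": "A", "product": "x", "price": 3},
    {"store": "A", "product": "y", "price": 5},
    {"store": "A", "product": "x", "price": 3},
]

indexed = add_surrogate_index(rows, ("store", "product"), "store_product_id")
print([row["store_product_id"] for row in indexed])  # [0, 1, 0]
```

A single-column index of this kind is what lets a table with a composite key be referenced from another table through one column.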
normalize_entityset
normalize_entityset(es, accuracy=0.98)
Returns a new normalized EntitySet from an EntitySet with a single entity.
Arguments:
- es (ft.EntitySet) : EntitySet with a single entity to normalize
Returns:
- new_es (ft.EntitySet) : new normalized EntitySet
Built at Alteryx Innovation Labs