A library for automated table normalization
AutoNormalize
AutoNormalize is a Python library for automated data table normalization, intended for use with Featuretools. AutoNormalize lets you build an EntitySet from a single denormalized table and generate features for machine learning.
Before AutoNormalize:
After AutoNormalize:
Install
pip install autonormalize
Uninstall
pip uninstall autonormalize
API Reference
auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None)
Creates a normalized EntitySet from a dataframe.
Arguments:
df (pd.DataFrame) : the dataframe containing the data
accuracy (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude that a dependency holds (e.g. with accuracy = 0.98, the dependency LHS --> RHS must hold in at least 98% of the rows)
index (str, optional) : name of the column intended as the index of df
name (str, optional) : the name of the created EntitySet
time_index (str, optional) : name of the time column in the dataframe
Returns:
entityset (ft.EntitySet) : the created EntitySet
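One way to read the accuracy argument is as the share of rows that are consistent with a dependency. The sketch below is not AutoNormalize's implementation. It assumes that, for each LHS value, the most frequent RHS value is the "correct" one and every other row counts as a violation. The helper name dependency_accuracy and the sample data are hypothetical.

```python
import pandas as pd


def dependency_accuracy(df, lhs, rhs):
    """Fraction of rows consistent with LHS --> RHS (illustrative only).

    lhs is a list of column names, rhs is a single column name.
    """
    # Count rows for every (LHS..., RHS) combination.
    counts = df.groupby(lhs + [rhs]).size()
    # Within each LHS group, keep the count of the most frequent RHS value.
    agreeing = counts.groupby(level=lhs).max().sum()
    return agreeing / len(df)


# Hypothetical data: one misspelled city breaks zip --> city in 1 of 5 rows.
addresses = pd.DataFrame({
    "zip": ["02139", "02139", "02139", "10001", "10001"],
    "city": ["Cambridge", "Cambridge", "Cambrige", "New York", "New York"],
})

score = dependency_accuracy(addresses, ["zip"], "city")
print(score >= 0.98)  # would zip --> city pass the default threshold?
print(score >= 0.80)  # would it pass a looser threshold?
```

Under this reading, lowering accuracy lets a dependency survive a few dirty rows, while accuracy = 1.00 requires it to hold in every row.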
find_dependencies(df, accuracy=0.98, index=None)
Finds dependencies within a dataframe using the DFD search algorithm.
Returns:
dependencies (Dependencies) : the dependencies found in the data within the constraints provided
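To make the idea of a dependency concrete, here is a brute-force sketch in plain pandas. It is not the DFD algorithm and not the library's code: it only checks exact dependencies with a single column on the left-hand side, and the function name and sample data are hypothetical.

```python
from itertools import permutations

import pandas as pd


def exact_single_column_dependencies(df):
    """Return (lhs, rhs) pairs where each lhs value maps to one rhs value."""
    found = []
    for lhs, rhs in permutations(df.columns, 2):
        # LHS --> RHS holds exactly if no LHS group contains two RHS values.
        if (df.groupby(lhs)[rhs].nunique() <= 1).all():
            found.append((lhs, rhs))
    return found


# Hypothetical denormalized table: each customer lives in exactly one city.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ann", "bob", "ann", "cat"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})

for lhs, rhs in exact_single_column_dependencies(orders):
    print(f"{lhs} --> {rhs}")
```

A real search also has to consider multi-column left-hand sides and approximate dependencies, which is where a dedicated algorithm such as DFD becomes necessary.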
normalize_dataframe(df, dependencies)
Normalizes a dataframe based on the given dependencies.
Returns:
new_dfs (list[pd.DataFrame]) : list of new dataframes
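The effect of normalizing on a single dependency can be sketched in plain pandas. This is an illustration of the general technique, not how normalize_dataframe is implemented. The helper split_on_dependency, its arguments, and the sample data are hypothetical.

```python
import pandas as pd


def split_on_dependency(df, lhs, rhs_columns):
    """Split df into a child table and a parent table keyed by lhs."""
    # Parent table: one row per distinct lhs value with its dependent columns.
    parent = df[[lhs] + rhs_columns].drop_duplicates().reset_index(drop=True)
    # Child table: the original rows minus the columns moved to the parent.
    child = df.drop(columns=rhs_columns)
    return child, parent


# Hypothetical denormalized table in which customer --> city holds.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ann", "bob", "ann", "cat"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})

child, parent = split_on_dependency(orders, "customer", ["city"])

# Joining the two tables on the key recovers the original rows.
rebuilt = child.merge(parent, on="customer")
```

Because the split follows a dependency that holds exactly, no information is lost: the repeated city values are stored once per customer instead of once per order.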
make_entityset(df, dependencies, name=None, time_index=None)
Creates a normalized EntitySet from a dataframe based on the given dependencies.
Returns:
entityset (ft.EntitySet) : the created EntitySet
Feature Labs
AutoNormalize is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.