Easy and rapid deep learning - updated for tensorflow 2.0

tf2-keras-pandas

tl;dr: keras-pandas allows users to rapidly build and iterate on deep learning models. Updated for TensorFlow 2.0.

Getting data formatted and into Keras can be tedious and time consuming, and can require domain expertise, whether you're a veteran or new to deep learning. keras-pandas overcomes these issues by (automatically) providing:

Data transformations: A cleaned, transformed and correctly formatted X and y (good for keras, sklearn or any other ML platform)
Data piping: Correctly formatted keras input, hidden and output layers to quickly start iterating on

These approaches are built on best practices from practitioners, Kaggle grandmasters, papers, blog posts, and coffee chats, to provide a simple entry point into the world of deep learning and a strong foundation for deep learning experts.

Quick Start

Let's build a model with the lending club data set. This data set is particularly fun because it contains a mix of text, categorical and numerical data types, and features a lot of null values.

pip install --upgrade tf2-keras-pandas

from tensorflow.keras import Model
from keras_pandas import lib
from keras_pandas.Automater import Automater
from sklearn.model_selection import train_test_split

# Load data
observations = lib.load_lending_club()

# Train / test split
train_observations, test_observations = train_test_split(observations)
train_observations = train_observations.copy()
test_observations = test_observations.copy()

# List out variable types

data_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',
                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',
                                'revol_util',
                                'total_acc', 'pub_rec_bankruptcies'],
                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',
                                  'application_type', 'disbursement_method'],
                  'text': ['desc', 'purpose', 'title']}
output_var = 'loan_status'

# Create and fit Automater
auto = Automater(data_type_dict=data_type_dict, output_var=output_var)
auto.fit(train_observations)

# Transform data
train_X, train_y = auto.fit_transform(train_observations)
test_X, test_y = auto.transform(test_observations)

# Create and compile the Keras (deep learning) model
x = auto.input_nub
x = auto.output_nub(x)

model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='adam', loss=auto.suggest_loss())

And that's it! In a couple of lines, we've set up a deep learning model that accepts a few dozen variables of mixed types and is ready to train.

Usage

Installation

You can install tf2-keras-pandas with pip:

pip install -U tf2-keras-pandas

Creating an Automater

The Automater object is the central object in keras-pandas. It accepts a dictionary of the format {'datatype': ['var1', 'var2']}.

For example, we could create an Automater using the built-in numerical, categorical, and text datatypes by calling:

# List out variable types
data_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',
                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',
                                'revol_util',
                                'total_acc', 'pub_rec_bankruptcies'],
                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',
                                  'application_type', 'disbursement_method'],
                  'text': ['desc', 'purpose', 'title']}
output_var = 'loan_status'

# Create Automater
auto = Automater(data_type_dict=data_type_dict, output_var=output_var)

As a side note, the response variable must be in one of the variable type lists (e.g. loan_status is in the 'categorical' list).
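
This constraint can be checked before constructing an Automater. The following is a minimal sketch, not part of keras-pandas: find_output_type is a hypothetical helper name, and it relies only on the dictionary format shown above.

```python
def find_output_type(data_type_dict, output_var):
    """Return the datatype whose variable list contains output_var.

    Hypothetical helper (not part of keras-pandas). Returns None when
    output_var is None, and raises ValueError when output_var is not
    listed under any datatype.
    """
    if output_var is None:
        return None
    for datatype, variables in data_type_dict.items():
        if output_var in variables:
            return datatype
    raise ValueError(
        "output_var {!r} is not listed under any datatype".format(output_var))


data_type_dict = {'numerical': ['loan_amnt', 'annual_inc'],
                  'categorical': ['term', 'grade', 'loan_status'],
                  'text': ['desc']}

print(find_output_type(data_type_dict, 'loan_status'))  # categorical
```

Running a check like this first gives a clear error message when the response variable was left out of the dictionary.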

One variable type

If you only have one variable type, only use one variable type!

# List out variable types
data_type_dict = {'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',
                                  'application_type', 'disbursement_method']}
output_var = 'loan_status'

# Create Automater
auto = Automater(data_type_dict=data_type_dict, output_var=output_var)

Multiple variable types

If you have multiple variable types, feel free to use all of them! Built-in datatypes are listed in Automater.datatype_handlers.

# List out variable types
data_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',
                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',
                                'revol_util',
                                'total_acc', 'pub_rec_bankruptcies'],
                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',
                                  'application_type', 'disbursement_method'],
                  'text': ['desc', 'purpose', 'title']}
output_var = 'loan_status'

# Create Automater
auto = Automater(data_type_dict=data_type_dict, output_var=output_var)

Custom datatypes

If there's a specific datatype you'd like to use that's not built in (such as images, videos, or geospatial), you can include it by using Automater's datatype_handlers parameter.

A template datatype can be found in keras_pandas/data_types/Abstract.py. Filling out this template will yield a new datatype handler. If you're happy with your work and want to share your new datatype handler, create a PR (and check out contributing.md).

No `output_var`

If your model doesn't need a response variable, or your use case doesn't use keras-pandas's output functionality, you can skip the output_var by setting it to None.

# List out variable types
data_type_dict = {'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',
                                  'application_type', 'disbursement_method']}
output_var = None

# Create Automater
auto = Automater(data_type_dict=data_type_dict, output_var=output_var)

Fitting the Automater

Before use, the Automater must be fit. The fit() method accepts a pandas DataFrame, which must contain all of the columns listed during initialization.

auto.fit(observations)
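
Because fit() expects every listed column to be present, it can help to check the DataFrame's columns first. This is a minimal sketch under that assumption: missing_columns is a hypothetical helper, not a keras-pandas function, and it accepts any iterable of column names (for example observations.columns).

```python
def missing_columns(data_type_dict, available_columns):
    """List variables named in data_type_dict that are absent from the data.

    Hypothetical helper (not part of keras-pandas). available_columns can be
    any iterable of column names, e.g. a DataFrame's .columns.
    """
    available = set(available_columns)
    missing = []
    for variables in data_type_dict.values():
        for variable in variables:
            # Report each absent variable once, in dictionary order
            if variable not in available and variable not in missing:
                missing.append(variable)
    return missing


data_type_dict = {'numerical': ['loan_amnt', 'dti'],
                  'categorical': ['grade', 'loan_status']}

# Column names as they might come from observations.columns
columns = ['loan_amnt', 'grade', 'loan_status', 'id']
print(missing_columns(data_type_dict, columns))  # ['dti']
```

An empty list means every variable in the dictionary has a matching column.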

Transforming data

Now, we can use our Automater to transform the dataset, from a pandas DataFrame to numpy objects properly formatted for Keras's input and output layers.

X, y = auto.transform(observations, df_out=False)

This will return two objects:

X: An array containing one numpy object for each Keras input. This is generally one Keras input for each user input variable.
y: A numpy object, containing the response variable (if one was provided)
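
As a rough sanity check on the structure of X, the sketch below lists the variables that would each be expected to get their own input. It assumes (this is not confirmed above) that the response variable is not also fed in as an input; expected_input_vars is a hypothetical helper, not part of keras-pandas.

```python
def expected_input_vars(data_type_dict, output_var=None):
    """List every variable in data_type_dict except the response variable.

    Hypothetical helper (not part of keras-pandas). Assumes the response
    variable is excluded from the model inputs.
    """
    return [variable
            for variables in data_type_dict.values()
            for variable in variables
            if variable != output_var]


data_type_dict = {'numerical': ['loan_amnt', 'dti'],
                  'categorical': ['grade', 'loan_status']}

input_vars = expected_input_vars(data_type_dict, output_var='loan_status')
print(input_vars)  # ['loan_amnt', 'dti', 'grade']
```

Since the mapping is only "generally" one input per variable, comparing len(X) against len(input_vars) is a heuristic rather than a strict guarantee.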

Using input / output nubs

Setting up correctly formatted, heuristically 'good' input and output layers is often:

Tedious
Time consuming
Difficult for those new to Keras

With this in mind, keras-pandas provides correctly formatted input and output 'nubs'.

The input nub is correctly formatted to accept the output from auto.transform(). It contains one Keras Input layer for each generated input, may contain additional layers, and has all input pipelines joined with a Concatenate layer.

The output nub is correctly formatted to accept the response variable numpy object.

Changelog

PR title (#PR number, or #Issue if no PR)
There's nothing here! (yet)

Development

Updated README and setup.py links (No PR)

3.1.0

Add boolean datatype (#104)
Added Contributing.md section for new datatypes (#101)
Added datatypes to docs in index.rst (#101)
Modified documentation to automatically generate API docs (#101)

3.0.1

Changing CI to Circleci (#100)
Adding datatypes to CONTRIBUTING.md, adding CONTRIBUTING.md to docs (#96)
Adding docs badge (#95)
Adding support for unusual variable names / format keras names to be valid in name scope (#92)
Adding examples (#93)
Upgraded requests library to requests==2.20.1, based on security concern (#94)

3.0.0

Brand new release, with:

Added

New Datatype interface, with easier to understand pipelines for each datatype
- All existing datatypes (Numerical, Categorical, Text & TimeSeries) re-implemented in this new format
- Support for custom data types generated by users
- Duck-typing helper method (keras_pandas/lib.check_valid_datatype()) to confirm that a datatype has valid signature
New testing, streamlined and standardized
Support for transforming unseen categorical levels, via the UNK token (experimental)

Modified

Updated Automater interface, which accepts a dictionary of data types
Heavily updated README
More consistent logging and data formatting for sample data sets

Removed

Removed examples, will be re-implemented in future release
All existing unittests
Bulk of new datatypes in contributing.md, will be re-added in future release

2.2.0

Add timeseries support (#78)
Add timeseries examples (#79)

2.1.0

Boolean support deprecated. Boolean (bool) data type can be treated as a special case of categorical data types

2.0.2

Remove a lot of the unnecessary dependencies (#75)
Update dependencies to contemporary versions (#74)

2.0.1

Fix issue w/ PyPi conflict

2.0.0

Adding CI/CD and PyPi links, and updating contact section w/ about the author (#70)
Major rewrite / update of examples (#72)
- Fixes bug in embedding transformer. Embeddings will now be at least length 1.
- Add functionality to check if resp_var is in the list of user provided variables
- Added better null filling w/ CategoricalImputer
- Added filling unseen values w/ CategoricalImputer
- Converted default transformer pipeline to use copy.deepcopy instead of copy.copy. This was a hotfix for a previously unknown issue.
- Standardizing setting logging level, only in test base class and examples (when __main__)

1.3.5

Adding regression example w/ inverse_transformation (#64)
Fixing issue where web socket connections were being opened needlessly (#65)

1.3.4

Adding Manifest.in, with including files references in setup.py (#54)

1.3.2

Fixed poorly written text embedding index unit test (#52)
Added license (#49)

Earlier

Lots of things happened. Break things and move fast

Release history

3.1.8 (this version): Oct 1, 2019
3.1.7: Sep 4, 2019
3.1.6: Sep 4, 2019
3.1.5: Sep 4, 2019
3.1.4: Sep 4, 2019
3.1.3: Sep 4, 2019
3.1.2: Sep 4, 2019
3.1.0: Sep 4, 2019

Download files

Source Distribution

tf2-keras-pandas-3.1.8.tar.gz (34.2 kB)

Uploaded Oct 1, 2019 Source

Built Distribution

tf2_keras_pandas-3.1.8-py3-none-any.whl (40.8 kB)

Uploaded Oct 1, 2019 Python 3

File details

Details for the file tf2-keras-pandas-3.1.8.tar.gz.

File metadata

Download URL: tf2-keras-pandas-3.1.8.tar.gz
Upload date: Oct 1, 2019
Size: 34.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for tf2-keras-pandas-3.1.8.tar.gz
Algorithm	Hash digest
SHA256	`ef3d3bd8f9e5216d99f29de75d02c5829ca1a144b21a0b094f58c3a8b3bb4775`
MD5	`4f4c8eb66333b44e7ed05895a962689b`
BLAKE2b-256	`5a969bd8f7c9442a0149cb729f0bdedb50a3e47d20408ff5a7d0847c9faf6bcb`

File details

Details for the file tf2_keras_pandas-3.1.8-py3-none-any.whl.

File metadata

Download URL: tf2_keras_pandas-3.1.8-py3-none-any.whl
Upload date: Oct 1, 2019
Size: 40.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for tf2_keras_pandas-3.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89c6141f976d5e444518365aa96d76c6d1299f3432fb4f5052ec1c7604f49fd4`
MD5	`848c0ee985745d212d12b568c33e3df5`
BLAKE2b-256	`7ac1de4ca198020a237758462d27131617451cdafa61a80a67daf467b4218e53`
