keras-pandas

Easy and rapid deep learning

# keras-pandas

**tl;dr:** keras-pandas allows users to rapidly build and iterate on deep learning models.

Getting data formatted and into Keras can be tedious, time consuming, and difficult, whether you're a veteran or new to
Keras. `keras-pandas` overcomes these issues by (automatically) providing:

- A cleaned, transformed and correctly formatted `X` and `y` (good for keras, sklearn or any other ML platform)
- An 'input nub', without the hassle of worrying about input shapes or data types
- An output layer, correctly formatted for the kind of response variable provided

With these resources, it's possible to rapidly build and iterate on deep learning models, and focus on the parts of
modeling that you enjoy!

For more info, check out the:

- [Code](https://github.com/bjherger/keras-pandas)
- [Documentation](http://keras-pandas.readthedocs.io/en/latest/intro.html)
- [Issue tracker](https://github.com/bjherger/keras-pandas/issues)
- [Author's website](https://www.hergertarian.com/)

## Quick Start

Let's build a model with the [titanic data set](https://www.kaggle.com/c/titanic/data). This data set is particularly
fun because it contains a mix of categorical and numerical data types, and features a lot of null values.

First, we'll install `keras-pandas`:

```bash
pip install -U keras-pandas
```

And then run the following snippet to create and train a model:

```python
from keras import Model
from keras.layers import Dense

from keras_pandas.Automater import Automater
from keras_pandas.lib import load_titanic

observations = load_titanic()

# Transform the data set, using keras_pandas
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']
text_vars = ['name']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, text_vars=text_vars,
                 response_var='survived')
X, y = auto.fit_transform(observations)

# Start model with provided input nub
x = auto.input_nub

# Fill in your own hidden layers
x = Dense(32)(x)
x = Dense(32, activation='relu')(x)
x = Dense(32)(x)

# End model with provided output nub
x = auto.output_nub(x)

model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit(X, y, epochs=4, validation_split=.2)
```

## Usage

### Installation

You can install `keras-pandas` with `pip`:

```bash
pip install -U keras-pandas
```

### Creating an Automater

The core feature of `keras-pandas` is the Automater, which accepts lists of variable types (all optional), and a
response variable (optional, for supervised problems). Together, all of these variables are the `user_input_variables`,
which may be different than the variables fed into Keras.

As a side note, the response variable must also appear in one of the variable type lists (e.g. `survived` is in
`categorical_vars`).
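The requirement above can be checked before constructing an `Automater`. The snippet below is a minimal sketch and not part of `keras-pandas`: `check_response_var` is a hypothetical helper that only mirrors the three list names used in this README.

```python
def check_response_var(response_var, categorical_vars=(), numerical_vars=(), text_vars=()):
    """Hypothetical helper: return the name of the type list holding response_var.

    Returns None when no response variable is given, and raises ValueError
    when the response variable is missing from every type list.
    """
    if response_var is None:
        return None  # No response variable, so there is nothing to check

    type_lists = {
        'categorical_vars': categorical_vars,
        'numerical_vars': numerical_vars,
        'text_vars': text_vars,
    }
    for list_name, var_list in type_lists.items():
        if response_var in var_list:
            return list_name

    raise ValueError(
        'response_var {!r} is not in any variable type list'.format(response_var))


print(check_response_var('survived', categorical_vars=['pclass', 'sex', 'survived']))
```

Running a check like this first turns a forgotten list entry into an immediate, readable error rather than a failure later in the pipeline.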

#### One variable type

If you only have one variable type, only use that variable type!

```python
categorical_vars = ['pclass', 'sex', 'survived']
auto = Automater(categorical_vars=categorical_vars, response_var='survived')
```

#### Multiple variable types

If you have multiple variable types, throw them all in!

```python
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, response_var='survived')
```

#### No `response_var`

If all variables are always available, and / or your problem space doesn't have a single response variable, you can
omit the response variable.

```python
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars)
```

In this case, an output nub will not be auto-generated.

### Fitting the Automater

Before use, the `Automater` must be fit. The `fit()` method accepts a pandas DataFrame, which must contain all of the
columns listed during initialization.

```python
auto.fit(observations)
```
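Since the DataFrame must contain every column listed during initialization, it can help to compare the two before calling `fit()`. The following is a rough sketch using plain pandas. `missing_columns` is a hypothetical helper, not a `keras-pandas` function, and the tiny DataFrame is made-up illustration data.

```python
import pandas as pd


def missing_columns(df, *var_lists):
    """Hypothetical helper: return requested column names absent from df, sorted."""
    requested = {name for var_list in var_lists for name in var_list}
    return sorted(requested - set(df.columns))


# Made-up stand-in for the real observations; note that 'fare' is absent
observations = pd.DataFrame({
    'pclass': [1, 3],
    'sex': ['female', 'male'],
    'survived': [1, 0],
    'age': [29.0, 22.0],
})

categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'fare']

print(missing_columns(observations, categorical_vars, numerical_vars))
```

An empty list means every listed variable has a matching column.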

### Transforming data

Now, we can use our `Automater` to transform the dataset, from a pandas DataFrame to numpy objects properly formatted
for Keras's input and output layers.

```python
X, y = auto.transform(observations, df_out=False)
```

This will return two objects:

- `X`: An array, containing a numpy object for each Keras input. This is generally one Keras input for each user
input variable.
- `y`: A numpy object, containing the response variable (if one was provided)
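To make the shape of these two objects concrete, here is a hand-rolled illustration built with pandas and numpy only. It is an assumption-laden sketch: the encodings below are simple stand-ins chosen for this example and are not the transformations `keras-pandas` applies, and the real arrays may differ in dtype and dimensionality.

```python
import pandas as pd

# Made-up observations, for illustration only
observations = pd.DataFrame({
    'sex': ['female', 'male', 'male'],
    'age': [29.0, 22.0, 40.0],
    'survived': [1, 0, 0],
})

input_vars = ['sex', 'age']
response_var = 'survived'

# Stand-in transformations (NOT the library's own):
# integer-encode the categorical column, pass the numerical column through.
sex_codes = observations['sex'].astype('category').cat.codes.to_numpy()
age_values = observations['age'].to_numpy()

# X holds one numpy array per input variable; y is a single numpy array
X = [sex_codes, age_values]
y = observations[response_var].to_numpy()

for name, values in zip(input_vars, X):
    print(name, values.shape)
print(response_var, y.shape)
```

The point is the structure: a list with one array per Keras input, and one separate array for the response.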

### Using input / output nubs

Setting up correctly formatted, heuristically 'good' input and output layers is often

- Tedious
- Time consuming
- Difficult for those new to Keras

With this in mind, `keras-pandas` provides correctly formatted input and output 'nubs'.

The input nub is correctly formatted to accept the output from `auto.transform()`. It contains one Keras Input layer
for each generated input, may contain additional layers, and has all input pipelines joined with a `Concatenate` layer.

The output layer is correctly formatted to accept the response variable numpy object.
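For intuition, the structure described above can be built by hand with the Keras functional API. This is only a sketch of the general pattern, not the layers `keras-pandas` generates: the layer names, embedding size, hidden layer and two-class softmax output are all assumptions made for this example.

```python
from keras import Model
from keras.layers import Concatenate, Dense, Embedding, Flatten, Input

# One Input layer per generated input (names and sizes are made up)
sex_input = Input(shape=(1,), name='sex_input')
age_input = Input(shape=(1,), name='age_input')

# A categorical pipeline may carry additional layers, such as an embedding
sex_embedded = Flatten()(Embedding(input_dim=2, output_dim=3)(sex_input))

# All input pipelines are joined into a single tensor with Concatenate
input_nub = Concatenate()([sex_embedded, age_input])

# Hidden layer, then an output layer sized for an assumed two-class response
x = Dense(8, activation='relu')(input_nub)
output = Dense(2, activation='softmax')(x)

model = Model(inputs=[sex_input, age_input], outputs=output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()
```

With the `Automater`, `auto.input_layers`, `auto.input_nub` and `auto.output_nub` take the place of the hand-written input, concatenation and output pieces.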

## Contributing

If you're interested in helping out, all open tasks are listed in the GitHub Issues tab. The issues tagged with
`first issue` are a good place to start if you're new to the project or new to open source projects.

If you're interested in a new major feature, please feel free to reach out to me.

### Bug reports

The best bug reports are Pull Requests. The second best bug reports are new issues on this repo.

### Test

This framework uses `unittest` for unit testing. Tests can be run by calling:

```bash
cd tests/

python -m unittest discover -s . -t .
```

### Style guide

This codebase should follow [Google's Python Style Guide](https://google.github.io/styleguide/pyguide.html).

### Generating documentation

This codebase uses [sphinx](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html)'s
[autodoc](http://www.sphinx-doc.org/en/master/ext/autodoc.html) feature. To regenerate the documentation so that it
reflects updated docstrings, run:

```bash
cd docs

make html

```

## Contact

Hey, I'm Brendan Herger, available at [https://www.hergertarian.com/](https://www.hergertarian.com/). Please feel free
to reach out to me at `13herger <at> gmail <dot> com`.
