seq2seq

Real valued sequence to sequence autoencoder

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python

Project description

Real valued sequence to sequence autoencoder

Most sequence to sequence autoencoders I can find are designed for categorical sequences, such as those used in translation.

This autoencoder is for real-valued sequences.

The input and output can be multidimensional, can have different dimensions, and can even be entirely different sequences.

Compatibility

This package is compatible with both Python 2 and 3.
The example scripts in main.py and main.ipynb are also compatible with both.
I have no plans to use TensorFlow 2.0, so the warnings about future obsolescence are muted.

Installation

pip install seq2seq

Usage

First create a factory:

from models import NDS2SAEFactory

n_iterations = 3000
factory = NDS2SAEFactory()

# Save to, or load (and resume) from, this zip file
factory.set_output('toy.zip')
factory.input_dim = 2  # Input is two dimensional
factory.output_dim = 1  # Output is one dimensional
factory.layer_sizes = [50, 30]

# Use an exponentially decaying learning rate: here it starts at 0.02 and decays to 0.00001
factory.lrtype = 'expdecay'
factory.lrargs = dict(start_lr=0.02, finish_lr=0.00001, decay_steps=n_iterations)

# Alternatively, use a constant learning rate, e.g. 0.001
# factory.lrtype = 'constant'
# factory.lrargs = dict(lr=0.001)

# keep_prob for the dropout layer. Default is None, in which case the dropout layer is not used
factory.keep_prob = 0.7

# If True, the hidden layers are symmetric (in this case: 50:30:30:50);
# otherwise they are repeated (50:30:50:30)
factory.symmetric = True

encoder = factory.build()
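
To make two of the settings above concrete, here is a small standalone sketch. It does not use the package: `expdecay_lr` and `expand_layer_sizes` are hypothetical helper names, and the decay formula is an assumption (a plain geometric interpolation between the start and finish rates), so the package's actual schedule may differ.

```python
def expdecay_lr(step, start_lr, finish_lr, decay_steps):
    """Assumed schedule: geometric interpolation from start_lr to finish_lr."""
    # Clamp so the rate stays at finish_lr once decay_steps is reached
    fraction = min(float(step) / decay_steps, 1.0)
    return start_lr * (finish_lr / start_lr) ** fraction


def expand_layer_sizes(layer_sizes, symmetric):
    """Encoder sizes followed by decoder sizes: mirrored if symmetric, else repeated."""
    decoder = layer_sizes[::-1] if symmetric else layer_sizes
    return list(layer_sizes) + list(decoder)


# Learning rate at the start, midpoint and end of a 3000-iteration run
for step in (0, 1500, 3000):
    print(step, expdecay_lr(step, start_lr=0.02, finish_lr=0.00001, decay_steps=3000))

print(expand_layer_sizes([50, 30], symmetric=True))
print(expand_layer_sizes([50, 30], symmetric=False))
```

With `layer_sizes = [50, 30]`, the two calls reproduce the 50:30:30:50 and 50:30:50:30 layouts described in the comments above.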

Create a training sample generator and a validation sample generator. Both should have the same signature:

def generate_samples(batch_size):
    """
    :return in_seq: a list of input sequences. Each sequence must be a np.ndarray
            out_seq: a list of output sequences. Each sequence must be a np.ndarray
                These sequences don't need to be the same length and don't need any padding
                The encoder will take care of that
            last_batch: True if this batch is the last of the iteration.
                E.g. if you have 70000 samples and the batch size is 20000, you'll have 4 batches
                     the last batch contains 10000 samples. You should return False for the first 3
                     batches and True for the last one
    """
    ...
    return in_seq, out_seq, last_batch
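
As an illustration of that signature, the following is a minimal sketch of a generator over in-memory toy data. `make_generator` and the toy arrays are hypothetical names invented for this example, and the array layout (time steps along the first axis, features along the second) is an assumption, not something this README states.

```python
import numpy as np


def make_generator(in_seqs, out_seqs):
    """Return a generate_samples(batch_size) function over paired sequences."""
    assert len(in_seqs) == len(out_seqs)
    state = {'pos': 0}  # index of the next unread sample (a dict keeps this Python 2 compatible)

    def generate_samples(batch_size):
        start = state['pos']
        end = min(start + batch_size, len(in_seqs))
        last_batch = end == len(in_seqs)
        # After the final batch, wrap around so the next call starts a new pass
        state['pos'] = 0 if last_batch else end
        return in_seqs[start:end], out_seqs[start:end], last_batch

    return generate_samples


# Toy data: 7 variable-length sequences with 2-D inputs and 1-D outputs
rng = np.random.RandomState(0)
lengths = [3, 5, 4, 6, 2, 7, 5]
toy_in = [rng.rand(n, 2) for n in lengths]
toy_out = [seq.sum(axis=1, keepdims=True) for seq in toy_in]

train_generator = make_generator(toy_in, toy_out)
```

With 7 samples and a batch size of 2, this yields batches of 2, 2, 2 and 1 samples, returning False for the first three and True for the last, which mirrors the 70000/20000 example in the docstring.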

Train

encoder.train(train_generator, valid_generator, n_iterations=3000, batch_size=100, display_step=100)

Predict

# test_seq is a list of np.ndarrays
predicted = encoder.predict(test_seq)

# predicted is a list of np.ndarrays. Each sequence has the same length (due to padding)
# Look for the stop token in each sequence to truncate the padding
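
The truncation step could look something like the sketch below. This README does not say what value the stop token takes, so `stop_token` is a placeholder you would need to look up in the package source. `truncate_at_stop` is a hypothetical helper, and it assumes each predicted sequence has shape (time steps, output dimensions).

```python
import numpy as np


def truncate_at_stop(seq, stop_token, atol=1e-6):
    """Cut a padded sequence just before the first time step matching stop_token."""
    seq = np.asarray(seq)
    # True for every time step whose values all match the (assumed) stop token
    is_stop = np.all(np.isclose(seq, stop_token, atol=atol), axis=1)
    hits = np.flatnonzero(is_stop)
    # Keep everything before the first match; leave the sequence alone if there is none
    return seq[:hits[0]] if hits.size else seq


# Made-up example: -1.0 stands in for the stop token here
padded = np.array([[0.5], [0.2], [-1.0], [-1.0]])
trimmed = truncate_at_stop(padded, stop_token=-1.0)
```

Applied to the prediction output, this would be `[truncate_at_stop(p, stop_token) for p in predicted]` once the real stop-token value is known.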

Encode

# test_seq is a list of np.ndarrays
encoded = encoder.encode(test_seq)

# encoded is a list of hidden-layer states corresponding to each input sequence

Jupyter notebook

Open main.ipynb to run the example

Licence

MIT

PRs are welcome

Release history

This version

0.0.5

Jun 23, 2019

0.0.4

Jun 19, 2019

0.0.3

Jun 19, 2019

0.0.2

Jun 12, 2019

0.0.1

Jun 11, 2019

Download files

Source Distribution

seq2seq-0.0.5.tar.gz (2.8 kB)

Uploaded Jun 23, 2019 Source

File details

Details for the file seq2seq-0.0.5.tar.gz.

File metadata

Download URL: seq2seq-0.0.5.tar.gz
Upload date: Jun 23, 2019
Size: 2.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for seq2seq-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`17709120dd208de7c275d737d294707e12e84cef99f62e57595be12fb5b33ddc`
MD5	`c0bba6ac44925776b16b68b75f8e2a97`
BLAKE2b-256	`7ae342b85fdaf1885de998c212afd9dda06aee4f5bded209551f5fa1b1945551`
