Real valued sequence to sequence autoencoder

Most sequence to sequence autoencoders I can find are suitable for categorical sequences, such as translation.

This autoencoder is for real-valued sequences.

The input and output can be multidimensional, can have different dimensions, and can even be entirely different sequences.

Compatibility

  • This package is compatible with both Python 2 and 3
  • Example scripts in main.py and main.ipynb are also compatible with both
  • I have no plans to move to TensorFlow 2.0, so the warnings about future obsolescence are muted

Installation

pip install seq2seq

Usage

First create a factory:

from models import NDS2SAEFactory

n_iterations = 3000
factory = NDS2SAEFactory()
factory.set_output('toy.zip')
factory.input_dim = 2 #  Input is two dimensional
factory.output_dim = 1 #  Output is one dimensional
factory.layer_sizes = [50, 30]

# Use an exponentially decaying learning rate: here it starts at 0.02 and decays to 0.00001 over n_iterations steps
factory.lrtype = 'expdecay'
factory.lrargs = dict(start_lr=0.02, finish_lr=0.00001, decay_steps=n_iterations)

# Alternatively, use a constant learning rate, e.g. 0.001
# factory.lrtype = 'constant'
# factory.lrargs = dict(lr=0.001)

# Keep probability for the dropout layer. The default is None, in which case the dropout layer is not used
factory.keep_prob = 0.7

# If True, the hidden layers are symmetric (in this case: 50:30:30:50)
# If False, they are repeated (50:30:50:30)
factory.symmetric = True

# Build the encoder. It saves to, or loads (and resumes) from, the zip file given to set_output()
encoder = factory.build()
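
The two settings above that carry the most arithmetic are the 'expdecay' learning rate and the symmetric flag. The sketch below is only an illustration of what the comments describe, not the package's own code: the function names exp_decay_lr and expand_layers are hypothetical, the geometric interpolation formula is an assumption (the package may compute its schedule differently), and so is the reading that the first half of the layout is the encoder and the second half the decoder.

```python
def exp_decay_lr(step, start_lr, finish_lr, decay_steps):
    """Learning rate that falls geometrically from start_lr to finish_lr.

    Assumed formula: lr = start_lr * (finish_lr / start_lr) ** (step / decay_steps)
    """
    # float() keeps the division correct on Python 2 as well
    progress = min(float(step) / decay_steps, 1.0)
    return start_lr * (finish_lr / start_lr) ** progress


def expand_layers(layer_sizes, symmetric):
    """Full hidden-layer layout: mirrored if symmetric, repeated otherwise."""
    second_half = layer_sizes[::-1] if symmetric else layer_sizes
    return list(layer_sizes) + list(second_half)


for step in (0, 1500, 3000):
    print(step, exp_decay_lr(step, start_lr=0.02, finish_lr=0.00001, decay_steps=3000))

print(expand_layers([50, 30], symmetric=True))
print(expand_layers([50, 30], symmetric=False))
```

With a geometric schedule like this one, the rate drops by the same factor at every step, so most of the training time is spent at small learning rates.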

Create a training sample generator and a validation sample generator. Both should have the same signature:

def generate_samples(batch_size):
    """
    :return in_seq: a list of input sequences. Each sequence must be a np.ndarray
            out_seq: a list of output sequences. Each sequence must be a np.ndarray
                These sequences don't need to be the same length and don't need any padding
                The encoder will take care of that
            last_batch: True if this batch is the last of the iteration.
                E.g. if you have 70000 samples and the batch size is 20000, you'll have 4 batches,
                     and the last batch contains 10000 samples. Return False for the first 3
                     batches and True for the last one
    """
    ...
    return in_seq, out_seq, last_batch
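
As a minimal sketch of a generator with this signature, the example below builds a toy dataset matching the factory settings above (2-dimensional input, 1-dimensional output) and serves it in batches. The helper name make_generator, the toy target (the sum of the two input channels) and the array layout of (time_steps, dimensions) are assumptions made for illustration. Check main.py for the layout the encoder actually expects.

```python
import numpy as np


def make_generator(n_samples, seed=0):
    """Return a generate_samples(batch_size) function over a fixed toy dataset."""
    rng = np.random.RandomState(seed)
    inputs, outputs = [], []
    for _ in range(n_samples):
        length = rng.randint(5, 15)  # variable length, no padding needed
        seq = rng.uniform(-1, 1, size=(length, 2))  # assumed shape: (time_steps, 2)
        inputs.append(seq)
        outputs.append(seq.sum(axis=1, keepdims=True))  # assumed shape: (time_steps, 1)

    cursor = [0]  # a list instead of nonlocal, so this also runs on Python 2

    def generate_samples(batch_size):
        start = cursor[0]
        end = min(start + batch_size, n_samples)
        last_batch = end >= n_samples
        # Rewind after the last batch so the next iteration starts from the top
        cursor[0] = 0 if last_batch else end
        return inputs[start:end], outputs[start:end], last_batch

    return generate_samples


train_generator = make_generator(n_samples=70)
in_seq, out_seq, last_batch = train_generator(20)
print(len(in_seq), in_seq[0].shape, out_seq[0].shape, last_batch)
```

Here 70 samples with a batch size of 20 give four batches of 20, 20, 20 and 10 sequences, with last_batch True only on the fourth, which mirrors the 70000/20000 example in the docstring.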

Train

encoder.train(train_generator, valid_generator, n_iterations=3000, batch_size=100, display_step=100)

Predict

# test_seq is a list of np.ndarrays
predicted = encoder.predict(test_seq)

# predicted is a list of np.ndarrays. Each sequence will have the same length (due to padding)
# Look for the stop token in each sequence and truncate the padding that follows it
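
Truncating at the stop token could look like the sketch below. The README does not say what the stop token's value is, so the helper takes it as an argument. The function name truncate_at_stop, the tolerance-based comparison and the (time_steps, dimensions) layout are all assumptions, not part of the package.

```python
import numpy as np


def truncate_at_stop(seq, stop_token, atol=1e-6):
    """Return seq up to, but not including, the first step matching stop_token.

    stop_token is a hypothetical value supplied by the caller. Real-valued
    predictions may only approximate it, so raise atol if needed.
    """
    seq = np.asarray(seq)
    matches = np.all(np.isclose(seq, stop_token, atol=atol), axis=1)
    hits = np.flatnonzero(matches)
    # If no step matches, the sequence is returned unchanged
    return seq[:hits[0]] if hits.size else seq


# Toy padded prediction: two real steps, a made-up stop token of -1.0, then padding
padded = np.array([[0.5], [0.2], [-1.0], [0.0], [0.0]])
print(truncate_at_stop(padded, stop_token=[-1.0]))
```

Apply it to each element of predicted, for example with a list comprehension.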

Encode

# test_seq is a list of np.ndarrays
encoded = encoder.encode(test_seq)

# encoded is a list of hidden-layer states corresponding to each input sequence

Jupyter notebook

Open main.ipynb to run the example

Licence

MIT

PRs are welcome
