Real-valued sequence-to-sequence autoencoder
Most sequence-to-sequence autoencoders I can find are designed for categorical sequences, such as in translation.
This autoencoder is for real-valued sequences.
The input and output can be multidimensional, can have different dimensions, and can even be entirely different types of sequences.
Compatibility
- This package is compatible with both Python 2 and 3
- The example scripts in `main.py` and `main.ipynb` are also compatible with both
- I have no plan to use TensorFlow 2.0, so all the nonsense about future obsolescence is moot
Installation
```
pip install seq2seq
```
Usage
First create a factory:
```python
from models import NDS2SAEFactory

n_iterations = 3000

factory = NDS2SAEFactory()
# Save to, or load (and resume) from, this zip file
factory.set_output('toy.zip')
factory.input_dim = 2   # Input is 2-dimensional
factory.output_dim = 1  # Output is 1-dimensional
factory.layer_sizes = [50, 30]

# Use an exponentially decaying learning rate: here it starts at 0.02
# and decreases exponentially to 0.00001
factory.lrtype = 'expdecay'
factory.lrargs = dict(start_lr=0.02, finish_lr=0.00001, decay_steps=n_iterations)
# Alternatively, a constant learning rate is also possible, e.g. 0.001:
# factory.lrtype = 'constant'
# factory.lrargs = dict(lr=0.001)

# Keep probability for the dropout layer. Default is None, in which case
# no dropout layer is used
factory.keep_prob = 0.7

# If True, the hidden layers are mirrored (in this case: 50:30:30:50),
# otherwise they are repeated (50:30:50:30)
factory.symmetric = True

encoder = factory.build()
```
Create a training sample generator and a validation sample generator. Both should have the same signature:
```python
def generate_samples(batch_size):
    """
    :return in_seq: a list of input sequences. Each sequence must be an np.ndarray
            out_seq: a list of output sequences. Each sequence must be an np.ndarray
                These sequences don't need to be the same length and don't need any
                padding; the encoder will take care of that
            last_batch: True if this batch is the last of the iteration.
                E.g. if you have 70000 samples and the batch size is 20000, you'll
                have 4 batches; the last batch contains 10000 samples. You should
                return False for the first 3 batches and True for the last one
    """
    ...
    return in_seq, out_seq, last_batch
```
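For example, a minimal toy generator could look like the sketch below. It is not part of the package: the random data, the `(timesteps, dims)` array layout, and the helper name `make_toy_generator` are assumptions for illustration, chosen to match the 2-D input / 1-D output configured in the factory above.

```python
import numpy as np

def make_toy_generator(n_samples):
    """Return a generate_samples(batch_size) function over random toy data."""
    data = []
    for _ in range(n_samples):
        length = np.random.randint(10, 30)           # variable-length sequences
        in_seq = np.random.randn(length, 2)          # shape: (timesteps, input_dim)
        out_seq = in_seq.sum(axis=1, keepdims=True)  # shape: (timesteps, output_dim)
        data.append((in_seq, out_seq))

    state = {'cursor': 0}

    def generate_samples(batch_size):
        start = state['cursor']
        end = start + batch_size
        batch = data[start:end]
        last_batch = end >= len(data)
        # Reset the cursor after the last batch so the next iteration starts over
        state['cursor'] = 0 if last_batch else end
        in_seqs = [x for x, _ in batch]
        out_seqs = [y for _, y in batch]
        return in_seqs, out_seqs, last_batch

    return generate_samples

train_generator = make_toy_generator(700)
valid_generator = make_toy_generator(300)
```

The resulting functions can be passed directly to `encoder.train` below.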
Train
```python
encoder.train(train_generator, valid_generator, n_iterations=3000, batch_size=100, display_step=100)
```
Predict
```python
# test_seq is a list of np.ndarrays
predicted = encoder.predict(test_seq)
# predicted is a list of np.ndarrays. Each sequence will have the same length (due to padding)
# Look for the stop token to truncate the padding out
```
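As a sketch of that truncation, assuming the stop token is a whole frame filled with a known sentinel value (the `stop_value=0.0` here is hypothetical; use whatever sentinel your training data actually appends):

```python
import numpy as np

def truncate_at_stop_token(seq, stop_value=0.0):
    """Cut a predicted sequence at the first frame that matches the stop token.

    `seq` is an np.ndarray of shape (timesteps, output_dim). Adapt the test
    to however your data marks the end of a sequence.
    """
    for t, frame in enumerate(seq):
        if np.allclose(frame, stop_value):
            return seq[:t]
    return seq  # no stop token found; keep the whole sequence

truncated = [truncate_at_stop_token(seq) for seq in predicted]
```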
Encode
```python
# test_seq is a list of np.ndarrays
encoded = encoder.encode(test_seq)
# encoded is a list of hidden-layer states, one per input sequence
```
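Once encoded, each sequence is represented by a fixed-size vector, so the results can feed any downstream model. A minimal sketch of a nearest-neighbour comparison in plain NumPy, assuming each returned state flattens to a vector of the same size:

```python
import numpy as np

# Stack the state vectors into one (n_sequences, state_size) matrix
codes = np.stack([np.ravel(state) for state in encoded])

# Pairwise Euclidean distances between all encoded sequences
dists = np.linalg.norm(codes[:, None, :] - codes[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)  # ignore self-matches

nearest = dists.argmin(axis=1)  # nearest[i]: index of the sequence most similar to i
```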
Jupyter notebook
Open `main.ipynb` to run the example.
Licence
MIT
PRs are welcome