
Interpolate between discrete sequences.


Transformer-VAE (WIP)

Diagram of a State Autoencoder

Transformer-VAEs learn smooth latent spaces over discrete sequences without hard-coding rules into their decoders.

This can be used for program synthesis, drug discovery, music generation and much more!

To see how it works, check out this blog post.

This repo is in active development, but I should have a release out soon.

Install

Install using pip:

pip install transformer_vae

Usage

You can execute the module to easily train it on your own data.

python -m transformer_vae \
    --project_name="T5-VAE" \
    --output_dir=poet \
    --do_train \
    --huggingface_dataset=poems

Or you can import Transformer-VAE to use as a package, much like a Hugging Face model.

from transformer_vae import T5_VAE_Model

model = T5_VAE_Model.from_pretrained('t5-vae-poet')
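Since the decoder is trained over a smooth latent space, interpolation amounts to walking a straight line between two latent codes and decoding each point. Below is a minimal sketch using plain torch tensors as stand-ins for real latent codes (the latent size and the random codes are made up); the actual encode/decode calls depend on the model's API.

import torch

latent_dim = 32  # hypothetical latent size
z_a = torch.randn(latent_dim)  # stand-in for the latent code of sample A
z_b = torch.randn(latent_dim)  # stand-in for the latent code of sample B

# Walk the line between the two codes; each intermediate point would be fed
# to the model's decoder to produce an in-between sequence.
for alpha in torch.linspace(0.0, 1.0, steps=5):
    z = (1 - alpha) * z_a + alpha * z_b
    print(alpha.item(), z[:4])  # decoding step omitted here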

Training

Set up Weights & Biases for logging; see their client.

Get a dataset to model; it must be represented as text. This is what we will be interpolating over.

This can be a text file with each line representing a sample.
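For example, a one-sample-per-line poems.txt (contents invented here) might look like:

Roses are red, violets are blue
I wandered lonely as a cloud
Shall I compare thee to a summer's day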

python -m transformer_vae \
    --project_name="T5-VAE" \
    --output_dir=poet \
    --do_train \
    --train_file=poems.txt

Alternatively, separate each sample with a line containing only <|endoftext|>:

python -m transformer_vae \
    --project_name="T5-VAE" \
    --output_dir=poet \
    --do_train \
    --train_file=poems.txt \
    --multiline_samples
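With --multiline_samples, the training file (again an invented example) would instead look like:

Roses are red,
violets are blue.
<|endoftext|>
I wandered lonely
as a cloud.
<|endoftext|>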

Alternatively, provide a Hugging Face dataset.

python -m transformer_vae \
    --project_name="T5-VAE" \
    --output_dir=poet \
    --do_train \
    --dataset=poems \
    --content_key text
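If you are not sure which column holds the text, you can inspect the dataset first with the datasets library (the dataset name "poems" is a placeholder here):

from datasets import load_dataset

ds = load_dataset("poems", split="train")  # placeholder dataset name
print(ds.column_names)  # pick the column to pass as --content_key
print(ds[0])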

Experiment with different parameters.

Once finished, upload your model to the Hugging Face model hub.
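If T5_VAE_Model follows the usual Hugging Face conventions, pushing the trained weights might look like the sketch below (this assumes the class inherits push_to_hub; check the repo for the supported workflow):

from transformer_vae import T5_VAE_Model

model = T5_VAE_Model.from_pretrained('poet')  # the --output_dir used during training
model.push_to_hub('t5-vae-poet')  # assumes Hugging Face's push_to_hub is inherited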

# TODO

Explore the produced latent space using Colab_T5_VAE.ipynb or by visiting this Colab page.

Contributing

Install with tests:

pip install -e .[test]

Possible contributions to make:

  1. Could the docs be clearer? Would it be worth having a docs site/blog?
  2. Use a Funnel Transformer encoder; is it more efficient?
  3. Allow defining an alternative token set.
  4. Store the latent codes from the previous step to use in the MMD loss so smaller batch sizes are possible (see the sketch after this list).
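For contribution 4, here is a rough sketch of an MMD loss fed with latent codes pooled from the previous step (the kernel choice and shapes are assumptions; the repo's actual loss may differ):

import torch

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel between two batches of latent codes.
    dists = torch.cdist(x, y) ** 2
    return torch.exp(-dists / (2 * sigma ** 2))

def mmd_loss(latents, prior_samples):
    # Maximum mean discrepancy between posterior latents and prior samples.
    k_ll = rbf_kernel(latents, latents).mean()
    k_pp = rbf_kernel(prior_samples, prior_samples).mean()
    k_lp = rbf_kernel(latents, prior_samples).mean()
    return k_ll + k_pp - 2 * k_lp

# Cache codes from the previous step so MMD sees more samples than one small batch.
previous_latents = torch.randn(8, 32)  # stand-in for codes stored last step
current_latents = torch.randn(8, 32)   # stand-in for this step's codes
pooled = torch.cat([previous_latents.detach(), current_latents], dim=0)
prior = torch.randn_like(pooled)       # samples from the N(0, I) prior
print(mmd_loss(pooled, prior).item())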

Feel free to ask what would be useful!
