Anthe improves on Transformer performance with fewer parameters.
Project description
Anthe
This is the official repository for the article "Less is More! A slim architecture for optimal language translation". Anthe is an architecture that improves on Transformer performance with far fewer parameters.
To run the experiments, run the train.py file. To use the Transformer architecture, pass the argument --comments=sameemb_projectoutput. To use the Anthe architecture, pass the argument --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2. By default the WMT14 dataset is used. To use WMT17 instead, add the language pair to the comments argument, e.g. --comments=..._lpair:cs-en, where the available language pairs are cs-en, de-en, fi-en, lv-en, ru-en, tr-en and zh-en.
You can install Anthe as a package with pip install anthe-official.
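For example, typical invocations could look like the lines below. This is only a sketch: it assumes train.py sits at the repository root and that the ... placeholder in the WMT17 example stands for the architecture string, with _lpair appended as an extra underscore-separated token.
python train.py --comments=sameemb_projectoutput
python train.py --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2
python train.py --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2_lpair:de-en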
Layers Available
The following layers are available for the Anthe architecture, currently only in TensorFlow 2.10.0. You can access the Anthe architecture, as well as the AntheEncoderBlock and the AntheDecoderBlock, like so:
from anthe_official.neural_models_tf import Anthe, AntheEncoderBlock, AntheDecoderBlock
model = Anthe(
    inputs_vocab_size, target_vocab_size, encoder_count, decoder_count, attention_head_count,
    d_model, d_point_wise_ff, dropout_prob
)

encoder_block = AntheEncoderBlock(
    attention_head_count, d_model, d_point_wise_ff, dropout_prob
)

decoder_block = AntheDecoderBlock(
    attention_head_count, d_model, d_point_wise_ff, dropout_prob
)
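For a concrete sense of the constructor, a minimal sketch follows; the hyperparameter values are illustrative placeholders, not the settings used in the article, and the arguments are passed positionally in the order listed above.

from anthe_official.neural_models_tf import Anthe

model = Anthe(
    32000, 32000,    # inputs_vocab_size, target_vocab_size (illustrative values)
    6, 6, 8,         # encoder_count, decoder_count, attention_head_count
    512, 2048, 0.1   # d_model, d_point_wise_ff, dropout_prob
)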
In the article we develop other layers that are part of the Anthe architecture but might be of interest on their own. The TC versions of Dense, Conv1D and Embedding, as well as SoftPOS and HSoftPOS, can be accessed like so:
from anthe_official.neural_models_tf import *
tc_dense = TCDense(d_model, length=3, ratio=.2)
tc_conv1d = TCConv1D(filters, kernel_size, tc_length=3, ratio=.2)
tc_embedding = TCEmbedding(input_dim, output_dim, tc_length=3, ratio=.2)
soft_pos = SoftPOS(add_units, n_subpos=add_units, repeat_subpos=1)
hsoft_pos = HSoftPOS(vocab_size, embed_dim)
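As a usage sketch, the snippet below instantiates two of these layers with illustrative sizes and applies them to a batch of token IDs. It assumes the layers follow the call conventions of the Keras layers they replace (Embedding and Dense), which is an assumption rather than documented behaviour.

import tensorflow as tf
from anthe_official.neural_models_tf import HSoftPOS, TCDense

vocab_size, embed_dim = 32000, 512  # illustrative sizes
embed = HSoftPOS(vocab_size, embed_dim)
proj = TCDense(embed_dim, length=3, ratio=.2)

tokens = tf.random.uniform((2, 16), maxval=vocab_size, dtype=tf.int32)
x = embed(tokens)  # expected (2, 16, 512) if HSoftPOS mirrors tf.keras.layers.Embedding
y = proj(x)        # expected (2, 16, 512) if TCDense mirrors tf.keras.layers.Dense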
Acknowledgements
We thank strutive07 for his implementation of the Transformer and WMT14 task, which we used as a starting point for our code.
Download files
Source Distribution: anthe_official-1.0.1.tar.gz (10.8 kB)
Built Distribution: anthe_official-1.0.1-py3-none-any.whl (14.8 kB)
File details
Details for the file anthe_official-1.0.1.tar.gz.
File metadata
- Download URL: anthe_official-1.0.1.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | d8d2e0ccf0f817060ac0f57b044ed47fb92804de9ea5d044f1b0cc562687a5c9
MD5 | 7d09a058a40943afd06f434f13c8041a
BLAKE2b-256 | be0bab1fe05834df11eb62df997dafa12bef75fdd0e174a0b3133a7827434041
File details
Details for the file anthe_official-1.0.1-py3-none-any.whl.
File metadata
- Download URL: anthe_official-1.0.1-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | e2f97aa6ebacaa91221f37d9f762f8ba23941bc96ca9eef2cad9fe2dd9ab337c
MD5 | 20cab74408b6eb8062ba4da551b91f47
BLAKE2b-256 | 46b927650f91f8ded4ccef838fd48e216856e0dd31573a63d23eafe4b62cbd17