Anthe improves the performance of Transformers with fewer parameters.

Project description

Anthe

This is the official repository for the article Less is More! A slim architecture for optimal language translation. Anthe is an architecture that improves on the Transformer's performance with far fewer parameters.

To run the experiments, run the train.py file. To activate the Transformer architecture, pass the argument --comments=sameemb_projectoutput; to activate the Anthe architecture, pass the argument --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2. By default it uses the WMT14 dataset. To use WMT17 instead, append the language pair to the comments argument, e.g. --comments=..._lpair:cs-en, where the available language pairs are cs-en, de-en, fi-en, lv-en, ru-en, tr-en and zh-en. Example commands are shown below.
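For example (assuming train.py is run from the repository root; the cs-en pair is just one of the options listed above, appended to the Anthe comments string as described):

python train.py --comments=sameemb_projectoutput
python train.py --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2
python train.py --comments=geglu_gateattention_hsoftpos:2_tcffn:.005_tcpreatt:.07_tclength:2_lpair:cs-en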

You can install it as a package with pip install anthe-official.

Layers Available

The following layers are available for the Anthe architecture, currently only for TensorFlow 2.10.0. You can access the Anthe architecture, as well as the AntheEncoderBlock and the AntheDecoderBlock, like so:

from anthe_official.neural_models_tf import Anthe, AntheEncoderBlock, AntheDecoderBlock

# Full encoder-decoder Anthe model.
model = Anthe(
    inputs_vocab_size, target_vocab_size, encoder_count, decoder_count, attention_head_count,
    d_model, d_point_wise_ff, dropout_prob
)

# Individual encoder and decoder blocks, if you want to compose your own stack.
encoder_block = AntheEncoderBlock(
    attention_head_count, d_model, d_point_wise_ff, dropout_prob
)

decoder_block = AntheDecoderBlock(
    attention_head_count, d_model, d_point_wise_ff, dropout_prob
)
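As a concrete sketch, the model can be built with illustrative hyperparameters (the values below are assumptions for demonstration, not the settings used in the article):

from anthe_official.neural_models_tf import Anthe

model = Anthe(
    32000, 32000,  # inputs_vocab_size, target_vocab_size (illustrative)
    6, 6,          # encoder_count, decoder_count
    8,             # attention_head_count
    512, 2048,     # d_model, d_point_wise_ff
    0.1            # dropout_prob
)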

In the article we develop other layers that are part of the Anthe architecture but might be of interest on their own. The TC versions of Dense, Conv1D and Embedding, as well as SoftPOS and HSoftPOS, can be accessed like so:

from anthe_official.neural_models_tf import *

# TC versions of the standard layers.
tc_dense = TCDense(d_model, length=3, ratio=.2)
tc_conv1d = TCConv1D(filters, kernel_size, tc_length=3, ratio=.2)
tc_embedding = TCEmbedding(input_dim, output_dim, tc_length=3, ratio=.2)

# Soft positional encodings.
soft_pos = SoftPOS(add_units, n_subpos=add_units, repeat_subpos=1)
hsoft_pos = HSoftPOS(vocab_size, embed_dim)
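Assuming the TC layers follow the usual Keras call convention, a quick smoke test could look like this (the d_model of 512 and the dummy tensor are assumptions):

import tensorflow as tf
from anthe_official.neural_models_tf import TCDense

tc_dense = TCDense(512, length=3, ratio=.2)  # illustrative d_model
x = tf.random.normal((2, 10, 512))           # dummy (batch, time, features) input
y = tc_dense(x)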

Acknowledgements

We thank strutive07 for his implementation of the Transformer and WMT14 task, which we used as a starting point for our code.

Download files

Download the file for your platform.

Source Distribution

anthe_official-1.0.1.tar.gz (10.8 kB)

Uploaded Source

Built Distribution

anthe_official-1.0.1-py3-none-any.whl (14.8 kB)

Uploaded Python 3

File details

Details for the file anthe_official-1.0.1.tar.gz.

File metadata

  • Download URL: anthe_official-1.0.1.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for anthe_official-1.0.1.tar.gz

  • SHA256: d8d2e0ccf0f817060ac0f57b044ed47fb92804de9ea5d044f1b0cc562687a5c9
  • MD5: 7d09a058a40943afd06f434f13c8041a
  • BLAKE2b-256: be0bab1fe05834df11eb62df997dafa12bef75fdd0e174a0b3133a7827434041
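To verify a downloaded file against these digests, any standard checksum tool works; for example (assuming the archive is in the current directory):

sha256sum anthe_official-1.0.1.tar.gz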

File details

Details for the file anthe_official-1.0.1-py3-none-any.whl.

File hashes

Hashes for anthe_official-1.0.1-py3-none-any.whl

  • SHA256: e2f97aa6ebacaa91221f37d9f762f8ba23941bc96ca9eef2cad9fe2dd9ab337c
  • MD5: 20cab74408b6eb8062ba4da551b91f47
  • BLAKE2b-256: 46b927650f91f8ded4ccef838fd48e216856e0dd31573a63d23eafe4b62cbd17
