TEA - Translation Engine Architect

A command-line tool for creating translation engines.

Install

First install pipx, then install TEA with it:

pipx install pangeamt-tea
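
For readers who do not have pipx yet, the install can be sketched as the shell function below. The name install_tea is hypothetical and not part of TEA. The bootstrap commands (python3 -m pip install --user pipx, python3 -m pipx ensurepath) are taken from pipx's own documentation rather than from this project, and they assume python3 is on the PATH.

```shell
# Sketch only: install_tea is a hypothetical helper, not part of TEA.
install_tea() {
    # Bootstrap pipx with pip if it is not already available.
    if ! command -v pipx >/dev/null 2>&1; then
        python3 -m pip install --user pipx || return 1
        python3 -m pipx ensurepath || return 1
        # Call pipx through python3: the PATH change made by ensurepath
        # only takes effect in a new shell.
        python3 -m pipx install pangeamt-tea
        return
    fi
    pipx install pangeamt-tea
}
```

Calling install_tea afterwards performs the install. Open a new shell before running tea if pipx was just bootstrapped.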

Usage

Step 1: Create a new project

tea new --customer customer --src_lang es --tgt_lang en --flavor automotion --version 2

This command will create the project directory structure:

├── customer_es_en_automotion_2
│   ├── config.yml
│   └── data

Then enter the project directory:

cd customer_es_en_automotion_2
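
Judging from this single example, the directory name appears to be the values of --customer, --src_lang, --tgt_lang, --flavor and --version joined with underscores. The sketch below rebuilds the name under that assumption, which is handy in scripts that need to cd into a freshly created project. project_dir is a hypothetical helper, and the pattern is inferred from the example rather than confirmed by TEA.

```shell
# Hypothetical helper: rebuilds the directory name that `tea new`
# produced in the example (customer_es_en_automotion_2).
project_dir() {
    # Arguments: customer, src_lang, tgt_lang, flavor, version
    printf '%s_%s_%s_%s_%s\n' "$1" "$2" "$3" "$4" "$5"
}

project_dir customer es en automotion 2
```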

Step 2: Configuration

Tokenizer

A tokenizer can be applied to both the source and the target:

tea config tokenizer --src mecab --tgt moses

To list all available tokenizers:

tea config tokenizer --help

Truecaser

tea config truecaser --src --tgt

BPE

tea config bpe -j

Processors

tea config processors -s "{processors}"

where {processors} is a list of preprocessing and postprocessing steps.

To list all available processors:

tea config processors --list

Config prepare

tea config prepare --shard_size 100000 --src_seq_length 400 --tgt_seq_length 400

Config model

tea config translation-model -n onmt
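
The configuration commands of Step 2 can be collected into one script, sketched below. It uses only the commands and flag values shown in this step. The run helper, the DRY_RUN variable and the configure_project function are hypothetical names introduced for illustration. By default the script only prints the commands, so it can be reviewed before anything is executed.

```shell
# Sketch: runs the Step 2 commands in order from inside the project directory.
# DRY_RUN, run() and configure_project() are hypothetical, not part of TEA.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"      # dry run: only print the command
    else
        "$@" || exit 1          # real run: stop at the first failure
    fi
}

configure_project() {
    # Placeholder kept from the usage text; replace it before a real run.
    processors='{processors}'
    run tea config tokenizer --src mecab --tgt moses
    run tea config truecaser --src --tgt
    run tea config bpe -j
    run tea config processors -s "$processors"
    run tea config prepare --shard_size 100000 --src_seq_length 400 --tgt_seq_length 400
    run tea config translation-model -n onmt
}

configure_project
```

Set DRY_RUN=0 in the environment to execute the commands instead of printing them.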

Step 3: Add data

Copy some multilingual resources (.tmx, bilingual files, .af) into the 'data' directory.
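
A minimal sketch of this copy step follows. copy_resources is a hypothetical helper, not part of TEA. It handles only the two extensions named above (.tmx and .af); bilingual files with other extensions would need their own glob patterns.

```shell
# Sketch only: copy_resources is a hypothetical helper, not part of TEA.
# Usage: copy_resources SOURCE_DIR PROJECT_DIR
copy_resources() {
    src=$1
    dest=$2/data
    # The data directory is created by `tea new`; refuse to guess if it is missing.
    [ -d "$dest" ] || { echo "missing $dest" >&2; return 1; }
    for f in "$src"/*.tmx "$src"/*.af; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        if [ -f "$f" ]; then
            cp "$f" "$dest"/ || return 1
        fi
    done
}
```

For example, copy_resources ~/corpora customer_es_en_automotion_2 copies every .tmx and .af file from ~/corpora (a made-up path) into the project's data directory.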

Step 4: Run

Create workflow

tea workflow new

Clean the data by running the normalizers and validators:

tea workflow clean -n {clean_th} -d

where clean_th is the number of threads.

Preprocess the data (split it into train, dev and test sets, tokenize it, apply BPE):

tea workflow prepare -n {prepare_th} -s 3

where prepare_th is the number of threads.

Train the model

tea workflow train --gpu 0

Evaluate the model

tea workflow eval --step {step} --src file.src --ref file.tgt --log file.log --out file.out --gpu 0
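
The Step 4 commands can be chained in a single script, sketched below. It reuses the flags exactly as they appear above and does not add any. The run helper, the DRY_RUN and THREADS variables, and the run_workflow function are hypothetical names. The training step passed to --step is taken from the first argument, and evaluation is skipped when no step is given. By default the commands are only printed.

```shell
# Sketch: chains the Step 4 commands from inside the project directory.
# DRY_RUN, THREADS, run() and run_workflow() are hypothetical, not part of TEA.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"      # dry run: only print the command
    else
        "$@" || exit 1          # real run: stop at the first failure
    fi
}

run_workflow() {
    threads=${THREADS:-4}       # used for both {clean_th} and {prepare_th}
    run tea workflow new
    run tea workflow clean -n "$threads" -d
    run tea workflow prepare -n "$threads" -s 3
    run tea workflow train --gpu 0
    # Evaluate only when a training step is given as the first argument.
    if [ -n "${1:-}" ]; then
        run tea workflow eval --step "$1" --src file.src --ref file.tgt \
            --log file.log --out file.out --gpu 0
    fi
}

run_workflow
```

Set DRY_RUN=0 to execute the commands, and pass a step number (for example run_workflow 5000, where 5000 is an arbitrary value) to include the evaluation.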
