Neural Machine Translation for NLPIA 2nd Edition

# Neural Machine Translation (NMT)

## Description

This is the Neural Machine Translation package for NLPIA 2nd Edition. It currently supports a Spanish-English Seq2Seq model using a 1-layer GRU, evaluated with bag-of-words accuracy.
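The README does not define bag-of-words accuracy, so the following is only a minimal sketch of one plausible reading: the fraction of reference tokens that also appear in the predicted translation, ignoring word order. The function name `bag_of_words_accuracy` and the whitespace tokenization are assumptions for illustration, not the package's actual implementation.

```python
from collections import Counter


def bag_of_words_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference tokens also found in the hypothesis, ignoring order.

    Hypothetical helper: the nmt package may compute this metric differently.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    matched = sum((ref_counts & hyp_counts).values())
    return matched / total


# Same words in a different order still score a perfect match.
print(bag_of_words_accuracy("the cat sat on the mat", "on the mat the cat sat"))  # 1.0
# One of three reference tokens is missing from the prediction.
print(bag_of_words_accuracy("the cat sat", "the dog sat"))
```

A metric like this rewards getting the right words but says nothing about word order, which is presumably why the roadmap also mentions BLEU.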

## Installation

If you just want to install the nmt package from PyPI:

```console
$ pip install nmt==0.0.4
```

If you want to modify the source code to run experiments, you'll need to install the dependencies in an environment and then install the package in `--editable` mode.

### Environment

Dependencies:

  • NLTK

  • editdistance

Create a conda environment where you can install all the dependencies, such as PyTorch, pandas, NLTK, spaCy, and scikit-learn. Jupyter is also installed, so developers can experiment in the Jupyter console (IPython) and data scientists can use Jupyter Notebook.

```console
$ conda update -y -n base -c defaults conda
$ conda create -y -n nmt 'python=3.7.9'
$ conda env update -n nmt -f environment.yml
$ conda activate nmt || source activate nmt
```

## Usage

### Train an NMT model

  1. Activate the conda environment with the nmt package installed.

  2. Run `nmt --config ${model_hyperparameter_json} --epochs ${num_epoch} --data_path ${training_file} --model_checkpoint_dir ${export_path} --metrics_dir ${metrics_path}`

### Parameters

  • Model Hyperparameter JSON (`--config`): Name of the config file (under the experiments subdirectory)

  • Epoch (`--epochs`): Number of training epochs

  • Training Text File (`--data_path`): Path to the training corpus (.txt)

  • Model Checkpoint Path (`--model_checkpoint_dir`): Directory in which to save model checkpoints

  • Metric Directory (`--metrics_dir`): Directory in which to save the learning curve and model metrics
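To make the flags concrete, here is a minimal `argparse` sketch that mirrors the five documented options. Only the flag names come from this README. The function name `build_parser`, the `int` type for `--epochs`, marking every flag as required, and the example file names are assumptions, and the package's real command-line parser may be written differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="nmt")
    parser.add_argument("--config", required=True,
                        help="name of the hyperparameter JSON config file")
    # Assumption: epochs are parsed as an integer.
    parser.add_argument("--epochs", type=int, required=True,
                        help="number of training epochs")
    parser.add_argument("--data_path", required=True,
                        help="path to the training corpus (.txt)")
    parser.add_argument("--model_checkpoint_dir", required=True,
                        help="directory in which to save model checkpoints")
    parser.add_argument("--metrics_dir", required=True,
                        help="directory in which to save learning curve and metrics")
    return parser


# Example invocation with placeholder values (all file names are made up).
args = build_parser().parse_args([
    "--config", "example_config.json",
    "--epochs", "3",
    "--data_path", "spa.txt",
    "--model_checkpoint_dir", "checkpoints",
    "--metrics_dir", "metrics",
])
print(args.epochs, args.data_path)
```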

## Roadmap

- [ ] 0. [Add badge for unittests](https://docs.gitlab.com/ee/user/project/badges.html) to README.md
- [ ] 0. Push release to PyPI: `git tag -a 0.0.6 -m 'toy_problem.py works!' && python setup.py sdist bdist_wheel upload`
- [x] 1. Set up a simple decoder-encoder model using GRU cells, BLEU score as evaluation metrics
- [x] 2. Conduct hyperparameter search
- [x] 3. Add Attention Mechanism to Decoder-Encoder module
- [ ] 4. Incorporate transfer learning from BERT or other models

## Directory structure

Code structure within the source directory:

- experiments: submodule where hyperparameters are stored in JSON format and retrieved as config
- models: submodule where Decoder, Encoder, Seq2Seq models are stored
- utils: submodule where Word Dictionary and Data Preprocessing functions are found
- main_script.py: script to kick-start model training
- training.py: script to walk through the whole training process
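As an illustration of storing hyperparameters in JSON and retrieving them as a config, the sketch below writes a small file and reads it back with the standard library. The helper `load_config`, the file name, and every key in `example` are hypothetical; the actual schema used by the experiments submodule is not documented here.

```python
import json
import tempfile
from pathlib import Path


def load_config(path):
    """Read a JSON hyperparameter file into a plain dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical hyperparameters; the real keys used by the package may differ.
example = {"hidden_size": 256, "num_layers": 1, "learning_rate": 0.001}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "example_config.json"
    config_path.write_text(json.dumps(example), encoding="utf-8")
    config = load_config(config_path)

print(config["num_layers"])
```

Keeping hyperparameters in a file rather than in code makes each experiment reproducible from its config alone.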
