# Neural Machine Translation (NMT)
## Description
This is the Neural Machine Translation package for NLPIA 2nd Edition. It currently supports a Spanish-English Seq2Seq model using a 1-layer GRU, evaluated with bag-of-words accuracy.
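The bag-of-words accuracy mentioned above scores token overlap while ignoring word order. A minimal sketch of such a metric (this function is illustrative, not the package's actual implementation):

```python
from collections import Counter

def bag_of_words_accuracy(predicted_tokens, reference_tokens):
    """Fraction of reference tokens matched by the prediction, ignoring order."""
    if not reference_tokens:
        return 0.0
    # Multiset intersection counts each token at most as often as it
    # appears in both the prediction and the reference.
    overlap = Counter(predicted_tokens) & Counter(reference_tokens)
    return sum(overlap.values()) / len(reference_tokens)

# Two of the three reference tokens are matched regardless of position.
score = bag_of_words_accuracy(["sat", "the", "cat"], ["the", "cat", "ran"])
```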
## Installation
If you just want to install the nmt package from PyPI:

```console
$ pip install nmt==0.0.4
```
If you want to modify the source code to run experiments, you'll need to install the dependencies in an environment and then install the package in `--editable` mode.
### Environment
Dependencies:
- NLTK
- editdistance

Create a conda environment where you can install all the dependencies, such as pytorch, pandas, nltk, spacy, and scikit-learn. Jupyter is also installed, so developers can experiment in the jupyter console (ipython) and data scientists can use jupyter notebook.
```console
$ conda update -y -n base -c defaults conda
$ conda create -y -n nmt 'python=3.7.9'
$ conda env update -n nmt -f environment.yml
$ conda activate nmt || source activate nmt
```
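The `environment.yml` referenced above ships with the repository. A minimal sketch of what such a file could contain, assuming only the dependencies named in this section (the versions and channel layout are illustrative, not the project's actual pins):

```yaml
name: nmt
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.7.9
  - pytorch
  - pandas
  - scikit-learn
  - nltk
  - spacy
  - jupyter
  - pip
  - pip:
      - editdistance
```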
## Usage
### Train an NMT model
Activate the conda env with the nmt package installed, then run:

```console
$ nmt --config ${model_hyperparameter_json} --epochs ${num_epoch} --data_path ${training_file} --model_checkpoint_dir ${export_path} --metrics_dir ${metrics_path}
```
### Parameters
- `--config`: name of the hyperparameter JSON config file (under the experiments subdirectory)
- `--epochs`: number of training epochs
- `--data_path`: path to the training corpus (.txt)
- `--model_checkpoint_dir`: directory in which to save model checkpoints
- `--metrics_dir`: directory in which to save the learning curve and model metrics
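The config file passed to `--config` holds the model hyperparameters as JSON. A hypothetical example is shown below; the field names are illustrative assumptions, and the actual keys are defined by the config files in the experiments subdirectory:

```json
{
  "embedding_dim": 256,
  "hidden_size": 512,
  "num_layers": 1,
  "dropout": 0.1,
  "learning_rate": 0.001,
  "batch_size": 64,
  "teacher_forcing_ratio": 0.5
}
```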
## Roadmap
- [ ] 0. [Add badge for unittests](https://docs.gitlab.com/ee/user/project/badges.html) to README.md
- [ ] 0. Push release to PyPI: `git tag -a 0.0.6 -m 'toy_problem.py works!' && python setup.py sdist bdist_wheel upload`
- [x] 1. Set up a simple encoder-decoder model using GRU cells, with BLEU score as the evaluation metric
- [x] 2. Conduct hyperparameter search
- [x] 3. Add attention mechanism to the encoder-decoder module
- [ ] 4. Incorporate transfer learning from BERT or other models
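The BLEU score named in item 1 combines n-gram precisions with a brevity penalty. A self-contained sketch of sentence-level BLEU with uniform weights (in practice the package could rely on `nltk.translate.bleu_score`; this pure-Python version, including its add-one smoothing, is an illustration only):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of 1..max_n-gram precisions
    times a brevity penalty for candidates shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())
        total = max(sum(cand.values()), 1)
        # Add-one smoothing so one empty n-gram order doesn't zero the score.
        precisions.append((overlap + 1) / (total + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * geo_mean

# A perfect match scores 1.0; partial overlap scores strictly between 0 and 1.
perfect = sentence_bleu("el gato come".split(), "el gato come".split())
```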
## Directory structure
Code structure within the source directory:
- experiments: submodule where hyperparameters are stored in JSON format and retrieved as config
- models: submodule where the Decoder, Encoder, and Seq2Seq models are stored
- utils: submodule where the word dictionary and data-preprocessing functions are found
- main_script.py: script to kick-start model training
- training.py: script to walk through the whole training process
## Credits/References
- [Benjamin Etienne's repo](https://github.com/b-etienne/Seq2seq-PyTorch/)
- [PyTorch's documentation on Seq2Seq](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
File details

Details for the file nmt-0.0.6.tar.gz.

File metadata

- Download URL: nmt-0.0.6.tar.gz
- Upload date:
- Size: 15.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7

File hashes

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 65db964c0afd5b0dbc4baecdce736f53222f63b791958176b5771db78a005096 |
| MD5 | 224eab69378f90e8f4a6e1f0454b3bf5 |
| BLAKE2b-256 | e3861714a0e335a6b32717eab401c7dcc50fe2bb2a65b1f4963af72c5c01a608 |
File details

Details for the file nmt-0.0.6-py2.py3-none-any.whl.

File metadata

- Download URL: nmt-0.0.6-py2.py3-none-any.whl
- Upload date:
- Size: 35.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Python-urllib/3.7

File hashes

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | a6d5ee7dbff9695244bbdb2a2b9f60bcb3fc0c05659efb968d1ca49a0bcddfee |
| MD5 | ccd8f069b5096261f1d6b8468fc4f1ae |
| BLAKE2b-256 | 0bb9903db2223856e659edbe3fc86f9c509b0ba159013a2b3f07bf8a36d42114 |