URBANS: Universal Rule-based Machine Translation NLP Toolkit

Project description

A tool for translating text from a source grammar to a target grammar (context-free) using a corresponding dictionary.

Why not translate it yourself when Google Translate cannot satisfy you❓


⚙️ Installation

pip install urbans

✨ What is good about urbans?

  • Rule-based and deterministic translation; unlike Google Translate, which gives a single non-deterministic result
  • Uses the NLTK parsing interface and is built on top of the already-efficient NLTK backend
  • Can be used for data augmentation
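
On the data-augmentation point: because the grammar is explicit, every sentence it licenses can be enumerated and translated deterministically, yielding parallel data. The following is a stdlib-only sketch of that idea (not the urbans API), using the toy English-to-Vietnamese grammar from the usage section:

```python
from itertools import product

# Hypothetical sketch: enumerate every sentence a tiny grammar licenses,
# then translate each one deterministically (word swap + dictionary).
SLOTS = [["I"], ["love", "hate"], ["good", "bad"], ["dogs"]]  # PRP VB JJ NN
DICT = {"I": "tôi", "love": "yêu", "hate": "ghét",
        "good": "ngoan", "bad": "hư", "dogs": "những chú_chó"}

def augment():
    pairs = []
    for words in product(*SLOTS):      # every sentence the grammar licenses
        src = " ".join(words)
        prp, vb, jj, nn = words
        # Apply the NP -> NN JJ reordering, then word-by-word substitution
        tgt = " ".join(DICT[w] for w in (prp, vb, nn, jj))
        pairs.append((src, tgt))
    return pairs

for src, tgt in augment():
    print(src, "->", tgt)
```

Four source sentences come out as four translated pairs, with no sampling or randomness involved.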

📖 Usage

from urbans import Translator

# Source sentences to be translated
src_sentences = ["I love good dogs", "I hate bad dogs"]

# Source grammar in nltk parsing style
src_grammar = """
                S -> NP VP
                NP -> PRP
                VP -> VB NP
                NP -> JJ NN
                PRP -> 'I'
                VB -> 'love' | 'hate'
                JJ -> 'good' | 'bad'
                NN -> 'dogs'
                """

# Rule rewrites that map the source grammar to the target grammar
src_to_target_grammar =  {
    "NP -> JJ NN": "NP -> NN JJ" # in Vietnamese NN goes before JJ
}

# Word-by-word dictionary from source language to target language
en_to_vi_dict = {
    "I":"tôi",
    "love":"yêu",
    "hate":"ghét",
    "dogs":"những chú_chó",
    "good":"ngoan",
    "bad":"hư"
    }

translator = Translator(src_grammar = src_grammar,
                        src_to_tgt_grammar = src_to_target_grammar,
                        src_to_tgt_dictionary = en_to_vi_dict)

trans_sentences = translator.translate(src_sentences)
# Expected output: ['tôi yêu những chú_chó ngoan', 'tôi ghét những chú_chó hư']
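
Conceptually, the translation is tree-driven: parse the sentence with the source grammar, rewrite the children of any node whose rule appears in the rule map, and substitute leaves via the dictionary. A simplified stdlib-only illustration of that idea (not the actual urbans internals), with parse trees as (label, children) tuples:

```python
# Hypothetical sketch of tree-based rule translation.
# A node is (label, children); a leaf is (label, word_string).
RULE_MAP = {("NP", ("JJ", "NN")): ("NN", "JJ")}  # NP -> JJ NN  becomes  NP -> NN JJ
DICT = {"I": "tôi", "love": "yêu", "good": "ngoan", "dogs": "những chú_chó"}

def transform(node):
    label, children = node
    if isinstance(children, str):            # leaf: translate the word
        return (label, DICT[children])
    kids = [transform(c) for c in children]
    key = (label, tuple(c[0] for c in kids))
    if key in RULE_MAP:                      # reorder children per the rule map
        by_label = {c[0]: c for c in kids}
        kids = [by_label[lbl] for lbl in RULE_MAP[key]]
    return (label, kids)

def leaves(node):
    label, children = node
    if isinstance(children, str):
        return [children]
    return [w for c in children for w in leaves(c)]

tree = ("S", [("NP", [("PRP", "I")]),
              ("VP", [("VB", "love"),
                      ("NP", [("JJ", "good"), ("NN", "dogs")])])])
print(" ".join(leaves(transform(tree))))  # tôi yêu những chú_chó ngoan
```

The real library obtains the tree from NLTK's CFG parser rather than hand-building it, but the rewrite-then-read-leaves flow is the same idea.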

⚖️ License

This repository is licensed under the Apache 2.0 license. See the LICENSE file for details.

✍️ BibTeX

If you wish to cite this framework, feel free to use the following (but only if you loved it 😊):

@misc{phat2020urbans,
  author = {Truong-Phat Nguyen},
  title = {URBANS: Universal Rule-Based Machine Translation NLP toolkit},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/pyurbans/urbans}},
}

Contributors:

  • Patrick Phat Nguyen

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

urbans-0.0.1.tar.gz (7.1 kB view details)

Uploaded Source

Built Distribution

urbans-0.0.1-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file urbans-0.0.1.tar.gz.

File metadata

  • Download URL: urbans-0.0.1.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for urbans-0.0.1.tar.gz

  • SHA256: b6d7ce7719938439fdbaaa20de165f1aa5234562466f34e711a1d2e7bde59fb8
  • MD5: 73862cdbe75732adc7418e4287d99f09
  • BLAKE2b-256: c680d41315fb3ae1e3856cbc22e78d8ce9135778e99089801bab8a221096abe9

See more details on using hashes here.
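
For instance, a downloaded distribution can be checked against the digests listed above using the standard library (the filename below is illustrative):

```python
import hashlib

def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the published digest, e.g.:
# sha256_of("urbans-0.0.1.tar.gz") == "b6d7ce77..."
```

pip can also enforce this automatically via its hash-checking mode when installing from a requirements file.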

File details

Details for the file urbans-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: urbans-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for urbans-0.0.1-py3-none-any.whl

  • SHA256: 5c42ffe733ac30cb34083a5e6b9a88375b86236a3da02f5a22a3aa26722f3707
  • MD5: d75620a6662eb9ad3f905d1ef2dc1961
  • BLAKE2b-256: 3843dbb432feeb173aa5dc6c13c1a7a1343d0d81e91e4c7f790ef3c381dae565

See more details on using hashes here.
