RUBY: Universal Rule-based Machine Translation NLP Toolkit

A toolkit for translating text by rewriting a context-free source grammar into a target grammar and substituting words through a source-to-target dictionary.

Why not translate it yourself when Google Translate cannot satisfy you❓

⚙️ Installation

pip install ruby

✨ What is good about RUBY?

  • Rule-based and deterministic: the same input always yields the same translation, unlike Google Translate, which gives only one, non-deterministic result
  • Uses the NLTK parsing interface and is built on top of the already-efficient NLTK backend
  • Can be used for data augmentation
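
As a rough illustration of the data-augmentation idea, the sketch below enumerates every sentence that a small, non-recursive context-free grammar can produce; each generated sentence could then be passed to a translator to obtain extra parallel pairs. It is plain Python and does not use the RUBY API: the `GRAMMAR` dictionary and the `expand` helper are hypothetical names introduced only for this example.

```python
from itertools import product

# Hypothetical toy grammar (not part of the RUBY API). Each non-terminal maps
# to its alternative right-hand sides; any symbol that is not a key is a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["PRP"], ["JJ", "NN"]],
    "VP": [["VB", "NP"]],
    "PRP": [["I"]],
    "VB": [["love"], ["hate"]],
    "JJ": [["good"], ["bad"]],
    "NN": [["dogs"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    Assumes the grammar is non-recursive, otherwise this would never end.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: a single word
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))  # 18 distinct sentences for this grammar
```

The grammar overgenerates (it also yields sentences such as "good dogs love I"), which is typical of toy grammars and worth filtering before using the output as training data.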

📖 Usage

from ruby import Translator

# Source sentences to be translated
src_sentences = ["I love good dogs", "I hate bad dogs"]

# Source grammar in nltk parsing style
src_grammar = """
                S -> NP VP
                NP -> PRP
                VP -> VB NP
                NP -> JJ NN
                PRP -> 'I'
                VB -> 'love' | 'hate'
                JJ -> 'good' | 'bad'
                NN -> 'dogs'
                """

# Rule edits that turn the source grammar into the target grammar
src_to_target_grammar =  {
    "NP -> JJ NN": "NP -> NN JJ" # in Vietnamese NN goes before JJ
}

# Word-by-word dictionary from source language to target language
en_to_vi_dict = {
    "I":"tôi",
    "love":"yêu",
    "hate":"ghét",
    "dogs":"những chú_chó",
    "good":"ngoan",
    "bad":"hư"
    }

translator = Translator(src_grammar = src_grammar,
                        src_to_tgt_grammar = src_to_target_grammar,
                        src_to_tgt_dictionary = en_to_vi_dict)

trans_sentences = translator.translate(src_sentences)
# This should return ['tôi yêu những chú_chó ngoan', 'tôi ghét những chú_chó hư']
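
To make the translation step concrete, here is a minimal sketch of the transfer idea behind the example above: reorder the children of a parse-tree node when a grammar edit applies, then replace each word through the dictionary. It is not RUBY's implementation. The hand-written `tree`, the `reorder_rules` mapping, and the `transfer` function are hypothetical, and the sketch assumes that the children of a reordered node have distinct labels.

```python
# Hypothetical parse tree for "I love good dogs": (label, child, ...) tuples,
# with plain strings as leaf words. A real system would get this from a parser.
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VB", "love"),
               ("NP", ("JJ", "good"), ("NN", "dogs"))))

# The same grammar edit and dictionary as in the usage example, in tuple form.
reorder_rules = {("NP", ("JJ", "NN")): ("NN", "JJ")}
dictionary = {
    "I": "tôi",
    "love": "yêu",
    "hate": "ghét",
    "dogs": "những chú_chó",
    "good": "ngoan",
    "bad": "hư",
}


def transfer(node):
    """Return the target-language words for a source parse tree."""
    if isinstance(node, str):
        # Leaf word: look it up, keeping unknown words unchanged.
        return [dictionary.get(node, node)]
    label, *children = node
    child_labels = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    target_order = reorder_rules.get((label, child_labels))
    if target_order is not None:
        # Reorder children by label (assumes labels are distinct in this rule).
        by_label = {c[0]: c for c in children}
        children = [by_label[name] for name in target_order]
    words = []
    for child in children:
        words.extend(transfer(child))
    return words


print(" ".join(transfer(tree)))  # tôi yêu những chú_chó ngoan
```

Because every step is a table lookup, the same tree always produces the same output, which is the deterministic behaviour described in the feature list.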

⚖️ License

This repository is licensed under the Apache 2.0 license. See the LICENSE file in the repository for details.

✍️ BibTeX

If you wish to cite the framework, feel free to use this entry (but only if you loved it 😊):

@misc{phat2020ruby,
  author = {Truong-Phat Nguyen},
  title = {RUBY: Universal Rule-Based Machine Translation NLP toolkit},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nlp-ruby/ruby}},
}

Contributors:

  • Patrick Phat Nguyen
