Back translation for Natural Language Processing (NLP) using Google Translate

Project description

BackTranslation

BackTranslation is a Python library that back-translates text between any two languages. It uses the googletrans library and the Baidu Translation API to perform the translations.

Because of a bug in the current version of googletrans, you should create only one instance for all of your back-translation work. Creating several instances can easily trigger errors caused by multiple requests. We plan to add support for other translation libraries soon.

If you run into a bug, please open an issue on GitHub.

Installation

You can install it from PyPI:

$ pip install BackTranslation

Usage

BackTranslation with googletrans

Translate the original text into another language and then back again, to increase the diversity of data in NLP research.

Parameters:

  • url: Optional. A list of service URLs to use for translation, if needed. The default URL is translate.google.com.
  • proxies: Optional. Proxy configuration: a dictionary mapping a protocol, or a protocol and host, to the URL of the proxy, e.g. proxies = {'http': '127.0.0.1:1234', 'http://host.name': '127.0.0.1:4012'}
  • text: Required. The original text to back-translate.
  • src: Optional. Language code of the original text. If this parameter is None, the language of the text is detected automatically. (Default: None)
  • tmp: Optional. Language code of the intermediate language. If this parameter is None, the method picks one of two languages, whichever differs from src.
  • sleeping: Optional. A delay that slows down back-translation to avoid Google rate limits (HTTP 429). Increase this value if you encounter errors after many translations. (Default: 0)

Returns: a Translated object.

Attributes:

  • source_text: the original sentence.
  • src: the language of the original sentence.
  • tmp: the intermediate language.
  • tran_text: the intermediate translation.
  • result_text: the back-translated result.

from BackTranslation import BackTranslation
trans = BackTranslation()
result = trans.translate('hello', src='en', tmp='zh-cn')
print(result.result_text)
# 'Hello there'

Complete example with auto language detection:

from BackTranslation import BackTranslation
trans = BackTranslation()
result = trans.translate('Anh ấy đã chữa khỏi cảm cúm bằng aspirin.')
print(result.src)         # 'vi'
print(result.tmp)         # 'en'
print(result.tran_text)   # intermediate translation
print(result.result_text) # back-translated result
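
The advice to increase `sleeping` after errors can be automated with a small retry helper. The following is a minimal sketch, not part of the library: `translate_with_backoff`, its `delays` schedule, and the `pause` hook are hypothetical names, and it assumes that `sleeping` is accepted as a keyword argument by `translate` and that a rate-limit failure surfaces as an ordinary Python exception.

```python
import time
from typing import Callable, Sequence


def translate_with_backoff(
    translate: Callable[..., object],
    text: str,
    delays: Sequence[float] = (0, 1, 2, 4),
    pause: Callable[[float], None] = time.sleep,
    **kwargs,
):
    """Call translate(text, sleeping=delay, **kwargs), using a larger delay after each failure."""
    if not delays:
        raise ValueError("delays must contain at least one value")
    last_error = None
    for attempt, delay in enumerate(delays):
        if attempt:
            pause(delay)  # wait before retrying (skipped on the first attempt)
        try:
            # Assumption: `sleeping` is a keyword argument of translate().
            return translate(text, sleeping=delay, **kwargs)
        except Exception as exc:  # assumption: rate-limit errors arrive as exceptions
            last_error = exc
    raise last_error


# Hypothetical usage with the single shared instance `trans`:
# result = translate_with_backoff(trans.translate, 'hello', src='en', tmp='zh-cn')
```

Passing the bound method `trans.translate` keeps a single BackTranslation instance in use across all retries.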

If Google blocks your IP, you can provide alternative service URLs or a proxy:

from BackTranslation import BackTranslation
trans = BackTranslation(
    url=[
        'translate.google.com',
        'translate.google.co.kr',
    ],
    proxies={'http': '127.0.0.1:1234', 'http://host.name': '127.0.0.1:4012'},
)
result = trans.translate('hello', src='en', tmp='zh-cn')
print(result.result_text)

Note: Create only one instance of BackTranslation and reuse it, to avoid the issue in the current version of googletrans.
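
For data augmentation, the single-instance rule means the translator should be built once and reused for every sentence. The sketch below shows one way to arrange that; `augment` and `make_google_back_translator` are hypothetical helper names, not part of the library, and the helper assumes `src` and `tmp` are passed to `translate` as in the examples above.

```python
from typing import Callable, Iterable, List


def augment(texts: Iterable[str], back_translate: Callable[[str], str]) -> List[str]:
    """Return each original text, followed by its back-translation when it differs."""
    augmented = []
    for text in texts:
        augmented.append(text)
        paraphrase = back_translate(text)
        # Skip back-translations that only differ in case or surrounding whitespace.
        if paraphrase.strip().lower() != text.strip().lower():
            augmented.append(paraphrase)
    return augmented


def make_google_back_translator(src: str = 'en', tmp: str = 'zh-cn') -> Callable[[str], str]:
    """Build a back-translation function around one shared BackTranslation instance."""
    from BackTranslation import BackTranslation  # imported lazily; needs the package installed

    trans = BackTranslation()  # created once and reused for every call

    def back_translate(text: str) -> str:
        return trans.translate(text, src=src, tmp=tmp).result_text

    return back_translate


# Hypothetical usage:
# sentences = augment(['hello', 'good morning'], make_google_back_translator())
```

Keeping the translator behind a plain function also makes it easy to swap in a stub when testing the augmentation logic offline.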

Search the language code

Use this method to look up a language code from the full name of the language.

Parameters:

  • language: Required. A language name in English.

from BackTranslation import BackTranslation
trans = BackTranslation()
trans.searchLanguage('Chinese')
# {'chinese (simplified)': 'zh-cn', 'chinese (traditional)': 'zh-tw'}
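
The output above is consistent with a case-insensitive substring match over a table of language names. The following sketch illustrates that idea only; it is an assumption about the behaviour, not the library's implementation, and `LANGUAGE_CODES` is a small illustrative table rather than the full list of supported languages.

```python
from typing import Dict

# Illustrative subset of a name -> code table (assumption: the real table is much larger).
LANGUAGE_CODES: Dict[str, str] = {
    'chinese (simplified)': 'zh-cn',
    'chinese (traditional)': 'zh-tw',
    'english': 'en',
    'korean': 'ko',
    'vietnamese': 'vi',
}


def search_language(name: str, table: Dict[str, str] = LANGUAGE_CODES) -> Dict[str, str]:
    """Return every entry whose full language name contains `name`, ignoring case."""
    needle = name.strip().lower()
    if not needle:
        return {}
    return {full_name: code for full_name, code in table.items() if needle in full_name}


print(search_language('Chinese'))
```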

BackTranslation_Baidu with Baidu Translation API

To use this more stable translator, you must register for the Baidu Translation API to obtain your own appID and secret key. The free tier supports 2 million characters per day. Note: Registration currently requires a Chinese phone number.

Parameters:

  • sleeping: Optional. The Baidu standard API allows only 1 request per second (QPS limit). Keep sleeping=1 (the default) to stay within the limit, and increase it if you encounter errors. (Default: 1)

from BackTranslation import BackTranslation_Baidu
trans = BackTranslation_Baidu(appid='YOUR APPID', secretKey='YOUR SECRETKEY')
result = trans.translate('hello', src='auto', tmp='zh')
print(result.tran_text)   # intermediate translation
print(result.result_text) # back-translated result
trans.closeHTTP()
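
Because `closeHTTP()` has to be called when you are done, it can help to guarantee the call even when a translation raises. A minimal sketch, assuming only that the translator exposes `closeHTTP()` as shown above; `baidu_session` is a hypothetical helper name, not part of the library.

```python
from contextlib import contextmanager


@contextmanager
def baidu_session(translator):
    """Yield the translator and always call closeHTTP() on exit, even after an error."""
    try:
        yield translator
    finally:
        translator.closeHTTP()


# Hypothetical usage:
# from BackTranslation import BackTranslation_Baidu
# with baidu_session(BackTranslation_Baidu(appid='YOUR APPID', secretKey='YOUR SECRETKEY')) as trans:
#     result = trans.translate('hello', src='auto', tmp='zh')
#     print(result.result_text)
```

The standard `contextlib.closing` helper is not a drop-in replacement here, because it calls `close()` rather than `closeHTTP()`.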

Search language code

Baidu uses its own set of language codes, so language code search for Baidu will be added soon.

Version Information

Version 0.3.1: fix some bugs in the Baidu translator.

Version 0.2.2: fix the service URLs for Google Translate.

Version 0.2.1: fix a small bug. From this version on, the library uses googletrans 4.0.0rc1.

Version 0.2.0: support back-translation with the Baidu API, and fix bugs.

Version 0.1.0: support back-translation with googletrans library

Contribution

Contributions to the BackTranslation library are welcome!

Download files

Source Distribution

backtranslation-0.4.0.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

backtranslation-0.4.0-py3-none-any.whl (12.4 kB)

Uploaded Python 3

File details

Details for the file backtranslation-0.4.0.tar.gz.

File metadata

  • Download URL: backtranslation-0.4.0.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for backtranslation-0.4.0.tar.gz
Algorithm Hash digest
SHA256 a1393104820f16f5838c53c729b4aa4cdc4929dd59b2394169af08ebf785d3f7
MD5 8709fadb46498392aa37674255f0716a
BLAKE2b-256 79b8029b3dea73ef60dcfbd490a0b988dccc1d825edaf98fe65a304bfdf8ba98

File details

Details for the file backtranslation-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for backtranslation-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7eeb6fd9334b6b400bda281a56c654ba7fe1159d1ca0e6df0bd356141aa935c5
MD5 e52a1bdff7ad8b80290de8faaa820b0e
BLAKE2b-256 c2449169a3141e239a8b32738454be5cc33400f7be6bd742c7844a75cb6760d7
