
Score German noun compounds according to their English-translatability.

Project description

Assessing-Translatability

Algorithmic procedure for scoring German compounds according to their English translatability.


Overview

If you have done any translation or know the vocabulary of a second language, you will have noticed that some words or concepts are easier to translate than others. Many English speakers know the German loanwords ‘Schadenfreude’ and ‘Wanderlust’, which mean, respectively, pleasure derived from another person’s misfortune and a desire to leave one’s comfort zone and see the wider world.

This project aims to develop an algorithmic procedure that computes a 'translatability' score from the statistical properties of words and their translations. The procedure works only for German compound nouns.
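Because the procedure targets compound nouns, a compound generally has to be split into its constituents first. The package uses a modified CharSplit for this (see Notes), which scores split points with character n-gram probabilities learned from a corpus. The sketch below is not CharSplit: it is a much simpler, hypothetical lexicon-based splitter that tries every split point, optionally strips a German linking element (Fugenelement) such as -s- from the modifier, and keeps splits whose parts both appear in a small illustrative lexicon.

```python
from __future__ import annotations

# Hypothetical toy lexicon for illustration; a real splitter would learn from a corpus.
LEXICON = {
    "salz", "wasser", "land", "zunge", "schnaps", "idee",
    "schaden", "freude", "arbeit", "zimmer",
}

# Common German linking elements (Fugenelemente); "" means no linking element.
LINKING_ELEMENTS = ("", "s", "es", "n", "en")


def split_compound(word: str, lexicon: set[str] = LEXICON, min_len: int = 3) -> list[tuple[str, str]]:
    """Return every (modifier, head) split of `word` whose parts are both in `lexicon`."""
    lower = word.lower()
    splits = []
    for i in range(min_len, len(lower) - min_len + 1):
        modifier, head = lower[:i], lower[i:]
        if head not in lexicon:
            continue
        for link in LINKING_ELEMENTS:
            if not modifier.endswith(link):
                continue
            # Drop the linking element, e.g. "arbeits" -> "arbeit".
            stem = modifier[: len(modifier) - len(link)]
            if stem in lexicon:
                splits.append((stem.capitalize(), head.capitalize()))
                break  # keep only the first matching reading for this split point
    return splits


print(split_compound("Salzwasser"))     # [('Salz', 'Wasser')]
print(split_compound("Arbeitszimmer"))  # [('Arbeit', 'Zimmer')]
print(split_compound("Wanderlust"))     # []
```

A lexicon lookup like this fails on any constituent it has never seen ("Wanderlust" above), which is one reason corpus-trained statistical splitters such as CharSplit are preferred in practice.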


Installation

pip install translatability

CLI

python -m translatability --help

Corpora

You can download the parallel corpora required for the calculations (Wikipedia, Europarl, and OpenSubtitles) from the links below.

Make sure you download these corpora in the aligned Moses format. The image below shows the correct download link for the OpenSubtitles corpus: it is in the second of the two download matrices, in the lower-left triangle, where the "EN" and "DE" indices meet. Do not use the other "DE-EN" link in the upper-right triangle, because that data is in the wrong format.
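In the aligned Moses format, a parallel corpus typically consists of two plain-text files, one per language, where line n of the German file is the translation of line n of the English file. The sketch below is a minimal, hypothetical reader (not the package's own loader) that streams such a pair as sentence pairs and fails loudly if the two files have different lengths, assuming UTF-8 text with one sentence per line.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_moses_pair(de_path: Path, en_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (German, English) sentence pairs from two line-aligned Moses files."""
    with open(de_path, encoding="utf-8") as de_file, open(en_path, encoding="utf-8") as en_file:
        for line_no, (de, en) in enumerate(zip_longest(de_file, en_file), start=1):
            # Aligned files must have the same number of lines.
            if de is None or en is None:
                raise ValueError(f"files are not aligned: line {line_no} is missing on one side")
            yield de.rstrip("\n"), en.rstrip("\n")
```

For example, `for de, en in read_moses_pair(Path("corpus.de-en.de"), Path("corpus.de-en.en")):` iterates over the sentence pairs without loading the whole corpus into memory; the file names there are placeholders for whatever files your download contains.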

You can pass the absolute path to each corpus when running the main script, as in the example below.

python -m translatability -w <path to Wikipedia corpus> -e <path to Europarl corpus> -s <path to OpenSubtitles corpus> -f <path to file>
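To drive the CLI from a Python script, for instance when scoring several word lists in a batch, you can assemble the same flags and hand them to `subprocess.run`. This is a minimal sketch, not part of the package: the helper names `build_command` and `run_scoring` are hypothetical, and they use only the flags documented here (`-w`, `-e`, `-s`, `-f`), leaving out any corpus you do not supply.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(
    words_file: str,
    wikipedia: Optional[str] = None,
    europarl: Optional[str] = None,
    opensubtitles: Optional[str] = None,
) -> List[str]:
    """Assemble the translatability CLI call, adding only the corpora provided."""
    cmd = [sys.executable, "-m", "translatability"]
    for flag, path in (("-w", wikipedia), ("-e", europarl), ("-s", opensubtitles)):
        if path is not None:
            cmd += [flag, path]
    cmd += ["-f", words_file]
    return cmd


def run_scoring(words_file: str, **corpora: Optional[str]) -> str:
    """Run the CLI and return its standard output (raises CalledProcessError on failure)."""
    result = subprocess.run(
        build_command(words_file, **corpora),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `run_scoring("res/test_short.txt", wikipedia="./wikipedia")` would issue the same command as the usage example. Using `sys.executable` ensures the CLI runs under the same interpreter (and virtual environment) as the calling script.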

Usage Example

python -m translatability -w "./wikipedia" -f res/test_short.txt

>>> ...
>>> The final list of scored words:
	0.7139908155580019 Salzwasser
	0.6117190378041573 Landzunge
	0.5171641715308692 Schnapsidee
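Each line of the final list contains a score followed by the compound, and in this example the list runs from highest to lowest score. If you capture the output (for example by redirecting it to a file), a small parser can turn those lines into (compound, score) pairs for further analysis. The `parse_scores` helper below is hypothetical and not part of the package; it assumes the line format shown above and ignores every other line, such as the `>>>` progress messages.

```python
import re
from typing import List, Tuple

# Matches lines such as "\t0.7139908155580019 Salzwasser": a number, whitespace, one word.
SCORE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+(\S+)\s*$")


def parse_scores(output: str) -> List[Tuple[str, float]]:
    """Extract (compound, score) pairs from the CLI's final list, highest score first."""
    pairs = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            pairs.append((match.group(2), float(match.group(1))))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```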

In the /res directory you will find several sample texts which may be used for similar evaluations.


Documentation

For further details on the algorithm, its evaluation, and application in a small study, please consult the /docs directory.


Notes

  • The src/split.py module is a modified version of the CharSplit algorithm by dtuggener; all credit for the original algorithm goes to its author.
  • Each run of the scripts creates results and segments directories in the current working directory, which hold temporary data. Delete these directories between runs; otherwise, leftover data will contaminate the next run's results.
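Cleaning up between runs can be scripted. This minimal sketch deletes the temporary directories from a given working directory (the current one by default), assuming they are named exactly `results` and `segments`; the helper name `clean_temp_dirs` is hypothetical. It removes the directories entirely, so copy out any results you want to keep first.

```python
import shutil
from pathlib import Path
from typing import List, Union

# Assumed directory names; adjust if your runs create differently named directories.
TEMP_DIRS = ("results", "segments")


def clean_temp_dirs(workdir: Union[str, Path] = ".") -> List[Path]:
    """Delete leftover temporary directories and return the paths that were removed."""
    removed = []
    for name in TEMP_DIRS:
        path = Path(workdir) / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```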



Download files


Source Distribution

translatability-0.1.0.tar.gz (18.8 kB)

Uploaded Source

Built Distribution


translatability-0.1.0-py3-none-any.whl (13.8 kB)

Uploaded Python 3

File details

Details for the file translatability-0.1.0.tar.gz.

File metadata

  • Download URL: translatability-0.1.0.tar.gz
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.5.0.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for translatability-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7b3ad0e49b96c0d9eee0c31dc288d4cc644910f330e2a875f8ed18add36626bf
MD5 372e544c0ac3b961784f730292a9f188
BLAKE2b-256 6c47412675da45dff1694fabc3356bd0b6379d3e1016271ebbc99d205592cf2d
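To confirm that a downloaded archive matches the SHA256 digest listed above, you can hash it locally and compare. This is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and the default expected digest is the one published for translatability-0.1.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for translatability-0.1.0.tar.gz.
EXPECTED_SHA256 = "7b3ad0e49b96c0d9eee0c31dc288d4cc644910f330e2a875f8ed18add36626bf"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 digest with the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

pip can also enforce this check itself in hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.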


File details

Details for the file translatability-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: translatability-0.1.0-py3-none-any.whl
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.5.0.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for translatability-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fc909d729cb1bec9fa6ca36fb5f77debb699b1edfaf171a06e977a756f330d4c
MD5 10285029523d4851f77810038f8a121b
BLAKE2b-256 474495eceadc1449ee5cf9cc816498ab40705a6dedae84f60caafcef0e8080c3

