A tiny Computer-Assisted Translation tool

tinycat

A tiny computer-assisted translation (CAT) tool. Created for endcoronavirus.org.

Installation

pip install tinycat

Example

The text you want to translate is in the file document.txt:

This is the first paragrapth.

This is the second paragraph. Paragraphs have multiple sentences.

This is the third paragraph.
This is still the third paragraphs.
Paragraphs are divided by empty lines.
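
The paragraph rule used by this sample (blocks of text separated by empty lines) can be illustrated with a minimal sketch. The function name split_paragraphs is hypothetical and is not part of tinycat; how tinycat itself splits text is not shown here.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs separated by one or more empty lines.

    Hypothetical helper for illustration only; not tinycat's own code.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


sample = (
    "This is the first paragrapth.\n"
    "\n"
    "This is the second paragraph. Paragraphs have multiple sentences.\n"
    "\n"
    "This is the third paragraph.\n"
    "This is still the third paragraphs.\n"
    "Paragraphs are divided by empty lines.\n"
)

paragraphs = split_paragraphs(sample)
print(len(paragraphs))  # 3: the three lines of the last block stay together
```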

Run the following command to generate a translation from English to Polish:

python3 -m tinycat.cli translate --input-file document.txt --patch-file patch.txt --lang-in english --lang-out polish

Generated patch.txt:

----------------------------------------------------------------
This is the first paragrapth.

To jest pierwszy paragraf.
----------------------------------------------------------------

----------------------------------------------------------------
This is the second paragraph. Paragraphs have multiple sentences.

To jest drugi akapit. Akapity mają wiele zdań.
----------------------------------------------------------------

----------------------------------------------------------------
This is the third paragraph.
This is still the third paragraphs.
Paragraphs are divided by empty lines.

To jest trzeci akapit. To wciąż trzeci akapit. Akapity są podzielone pustymi wierszami.
----------------------------------------------------------------
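
To make the layout of the patch file concrete, here is a minimal sketch that reads such a file into (source, translation) pairs. It assumes only what the sample above shows: each block is enclosed by two lines of dashes, and the source text is separated from its translation by the first empty line. The names parse_patch and is_delimiter are hypothetical; this is not tinycat's own parser.

```python
def is_delimiter(line):
    """Treat a line made only of dashes (at least 10) as a block separator."""
    stripped = line.strip()
    return len(stripped) >= 10 and set(stripped) == {"-"}


def parse_patch(text):
    """Return a list of (source, translation) pairs from delimited blocks.

    Lines outside delimited blocks are ignored. Assumed format, inferred
    from the sample patch; not tinycat's actual implementation.
    """
    pairs = []
    block = None  # None while we are outside a delimited block
    for line in text.splitlines():
        if is_delimiter(line):
            if block is None:
                block = []  # opening delimiter
            else:
                # closing delimiter: split the block at the first empty line
                source, _, translation = "\n".join(block).partition("\n\n")
                pairs.append((source.strip(), translation.strip()))
                block = None
        elif block is not None:
            block.append(line)
    return pairs


delimiter = "-" * 64
example = "\n".join([
    delimiter,
    "This is the first paragrapth.",
    "",
    "To jest pierwszy akapit.",
    delimiter,
])
print(parse_patch(example))
```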

Edit patch.txt and save the result as patch_corrected.txt. In this example we changed paragraf to akapit in the first paragraph and put each sentence of the third paragraph on its own line for readability.

Apply the patch using:

python3 -m tinycat.cli update --patch-file patch_corrected.txt --dict-file en-pl.dict

Now the text can be translated as follows (note that we now pass en-pl.dict):

python3 -m tinycat.cli translate --input-file document.txt --patch-file translated.txt --dict-file en-pl.dict --lang-in english --lang-out polish

translated.txt contains no paragraphs left to translate, because every translation is taken from the en-pl.dict dictionary. The content of translated.txt is:

To jest pierwszy akapit.

To jest drugi akapit. Akapity mają wiele zdań.

To jest trzeci akapit.
To wciąż trzeci akapit.
Akapity są podzielone pustymi wierszami.

Next, we modify document.txt to add a new paragraph and correct the typo in the first paragraph:

This is the first paragraph.

This is the second paragraph. Paragraphs have multiple sentences.

This is the third paragraph.
This is still the third paragraphs.
Paragraphs are divided by empty lines.

Final paragraph.

Let's create a new patch for the human translator:

python3 -m tinycat.cli translate --input-file document.txt --patch-file patch-2.txt --dict-file en-pl.dict --lang-in english --lang-out polish

patch-2.txt contains the new translations that have to be checked:

----------------------------------------------------------------
This is the first paragraph.

To jest pierwszy akapit.
----------------------------------------------------------------

To jest drugi akapit. Akapity mają wiele zdań.

To jest trzeci akapit.
To wciąż trzeci akapit.
Akapity są podzielone pustymi wierszami.

----------------------------------------------------------------
Final paragraph.

Ostatni akapit.
----------------------------------------------------------------
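
The output above is consistent with an exact-match lookup per paragraph: paragraphs already in the dictionary are emitted as plain translations, while new or changed paragraphs (such as the first one, whose typo was fixed) are wrapped in a delimited block for review. The sketch below illustrates that idea under those assumptions. The function build_patch, the in-memory dictionary, and the draft callable are all hypothetical; the format of en-pl.dict and the source of tinycat's draft translations are not described here.

```python
DELIMITER = "-" * 64  # assumed to match the separator lines in the patch


def build_patch(paragraphs, dictionary, draft):
    """Emit known translations as-is and wrap unknown paragraphs for review.

    `dictionary` maps a source paragraph to its approved translation.
    `draft` is a stand-in callable that proposes a translation.
    Illustrative only; not tinycat's actual implementation.
    """
    parts = []
    for paragraph in paragraphs:
        if paragraph in dictionary:
            # exact match: reuse the approved translation
            parts.append(dictionary[paragraph])
        else:
            # no match: show source and draft between delimiters
            parts.append(
                "\n".join([DELIMITER, paragraph, "", draft(paragraph), DELIMITER])
            )
    return "\n\n".join(parts)


dictionary = {
    "This is the second paragraph. Paragraphs have multiple sentences.":
        "To jest drugi akapit. Akapity mają wiele zdań.",
}
paragraphs = [
    "This is the first paragraph.",
    "This is the second paragraph. Paragraphs have multiple sentences.",
]
patch = build_patch(paragraphs, dictionary, draft=lambda p: "[draft translation]")
print(patch)
```

With exact matching, even a one-character change to a source paragraph sends it back to the translator, which is what happens to the first paragraph in this walkthrough.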

This time everything is fine, so we can apply the patch as is:

python3 -m tinycat.cli update --patch-file patch-2.txt --dict-file en-pl.dict

Finally, view the translated text (if --patch-file is not passed, the output is printed to the console):

python3 -m tinycat.cli translate --input-file document.txt --dict-file en-pl.dict --lang-in english --lang-out polish

Help

python3 -m tinycat.cli translate --help
python3 -m tinycat.cli update --help
