A tiny, no-strings Python package for translating massive CSV/JSONL files with optionally pre-annotated object spans

Project description

bulk-translate 0.24.1

A tiny, no-strings Python package for translating massive CSV/JSONL files, with native support for annotating fixed spans that can optionally be kept invariant for the translator.

The out-of-the-box features of bulk-translate are:

  • ✅ Support for spans for annotation / optional translation.
  • ✅ Native implementation of two translation modes:
    • fast-mode: uses extra characters to group all text parts into a single batch, which is split back into parts after translation.
    • accurate: performs an individual translation of each text part.
  • ✅ No strings: you're free to adopt any LM / LLM backend.
    • Supports googletrans by default.
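
The difference between the two modes can be illustrated with a minimal sketch. This is not the package's actual implementation: the function names, the newline separator, and the fallback behaviour are assumptions made for illustration, and `fake_translate` is a stand-in for a real backend.

```python
from typing import Callable, List

# Assumed separator; the extra characters used by bulk-translate may differ.
SEPARATOR = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Accurate mode: one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str],
                   sep: str = SEPARATOR) -> List[str]:
    """Fast mode: join all parts into one batch, translate once, split back."""
    batch = sep.join(parts)
    translated = translate(batch).split(sep)
    if len(translated) != len(parts):
        # The separators did not survive translation (or a part contained one),
        # so fall back to translating each part individually.
        return translate_accurate(parts, translate)
    return translated


def fake_translate(text: str) -> str:
    """Hypothetical backend used only for this demo: upper-cases the text."""
    return text.upper()


if __name__ == "__main__":
    print(translate_fast(["hello", "world"], fake_translate))
    print(translate_accurate(["hello", "world"], fake_translate))
```

Fast mode trades robustness for fewer backend calls: it only works when the backend leaves the separator intact, which is why the sketch checks the part count before trusting the split.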

Installation

pip install git+https://github.com/nicolay-r/bulk-translate

Usage

NOTE: To parse annotated entities in the input text, pass the --parse-entities flag.

For the test.tsv example data, in which annotated entities are enclosed in square brackets, run:

python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    --parse-entities \
    %% \
    --src "auto" \
    --dest "ru"
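
To make the square-bracket annotation concrete, here is a minimal sketch of how bracketed entities could be separated from the surrounding text. It is an assumption about the general idea, not the parser used by bulk-translate: the `Span` class, `ENTITY_PATTERN`, and `parse_entities` are hypothetical names, and nested brackets are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Hypothetical pattern: an entity is any non-empty text inside [ ] with no nested brackets.
ENTITY_PATTERN = re.compile(r"\[([^\[\]]+)\]")


@dataclass(frozen=True)
class Span:
    """Hypothetical container for an annotated entity."""
    text: str


def parse_entities(line: str) -> List[Union[str, Span]]:
    """Split a line into plain-text parts and annotated spans, in order."""
    parts: List[Union[str, Span]] = []
    pos = 0
    for match in ENTITY_PATTERN.finditer(line):
        if match.start() > pos:
            parts.append(line[pos:match.start()])  # plain text before the entity
        parts.append(Span(match.group(1)))         # the entity without brackets
        pos = match.end()
    if pos < len(line):
        parts.append(line[pos:])                   # trailing plain text
    return parts


if __name__ == "__main__":
    print(parse_entities("[Alice] met [Bob] in Paris"))
```

Keeping spans as separate objects lets a translator either pass them through unchanged or translate them independently of the surrounding text.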

Download files

Source Distribution

bulk_translate-0.24.1.tar.gz (9.9 kB view details)

Uploaded Source

Built Distribution

bulk_translate-0.24.1-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file bulk_translate-0.24.1.tar.gz.

File metadata

  • Download URL: bulk_translate-0.24.1.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.5

File hashes

Hashes for bulk_translate-0.24.1.tar.gz
Algorithm Hash digest
SHA256 733f4c6ae99fcd33d13e9a25f1dd0af68b9ab09b1b4ecad9b015d8f35a0f6016
MD5 dd9b00008def91d03656e53baad09423
BLAKE2b-256 d871a6da27b7a0630502b3a482940a438c1cf7e956a4fd6a5495d60ca5dd5661

File details

Details for the file bulk_translate-0.24.1-py3-none-any.whl.

File hashes

Hashes for bulk_translate-0.24.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ec4c1284b18aaa5b8a896ecdc27c4aed876ef324c033c2e899d0a8d64c62bfde
MD5 97c55dde0f106d211322764b3c9bd3d3
BLAKE2b-256 67f997e5df20dd26dc67578e874bf58f7c72eff0683658a2545d8c7f0f6e0d76
