
A tiny Python no-strings package for translating massive CSV/JSONL files with optionally pre-annotated object spans


bulk-translate 0.25.0


A tiny Python no-strings package for translating massive CSV/JSONL files, with native support for pre-annotated fixed spans that remain invariant for the translator.

Description

📘 More on spans

📘 bulk-translate features

The out-of-the-box features of bulk-translate are:

  • ✅ Support for spans, which can be used for annotation and are optionally translated.
  • ✅ Native implementation of two translation modes:
    • fast-mode: uses extra separator characters to group all text parts into a single batch, which is split back into parts after translation.
    • accurate: translates each text part individually.
  • ✅ No strings: you're free to adopt any LM / LLM backend.
    • Supports googletrans by default.
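
The difference between the two modes can be pictured with a minimal sketch. This is not the package's implementation: the function names, the newline separator, and the fallback to part-by-part translation are assumptions made for illustration, and `translate` stands for any backend callable that maps a string to a string.

```python
from typing import Callable, List

# Hypothetical separator; the sketch assumes no text part contains it.
SEP = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Accurate mode: one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str],
                   sep: str = SEP) -> List[str]:
    """Fast mode: join all parts, translate once, then split the result back."""
    translated = translate(sep.join(parts))
    result = translated.split(sep)
    if len(result) != len(parts):
        # The backend altered the separators, so the batch cannot be
        # deconstructed reliably; fall back to part-by-part translation.
        return translate_accurate(parts, translate)
    return result


# Stand-in backend for demonstration only: it upper-cases the text
# and records how many times it was called.
calls: List[str] = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text.upper()


fast_result = translate_fast(["hello", "world"], fake_translate)
fast_calls = len(calls)

calls.clear()
accurate_result = translate_accurate(["hello", "world"], fake_translate)
accurate_calls = len(calls)
```

In this sketch the fast mode issues a single backend call for the whole batch, while the accurate mode issues one call per part. The trade-off is that a real translator may drop or move the separator, which is why the sketch checks the number of parts after splitting.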

Installation

From PyPI:

pip install bulk-translate

or the latest version from GitHub:

pip install git+https://github.com/nicolay-r/bulk-translate

Usage

API

Please take a look at the related Wiki page.

Command Line / Shell

NOTE: Spans are supported only in the JSON-lines format.

NOTE: Requires the source_iter package to be installed.

The following command translates the test.tsv example data, in which annotated entities are enclosed in square brackets:

python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    %%m \
    --src "auto" \
    --dest "ru"
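
To illustrate what square-bracket annotation means for translation, here is a minimal sketch that splits a text into plain strings and spans, then translates only the plain strings. It is an assumption-laden illustration rather than the package's API: the `Span` class, `split_parts`, and `translate_parts` are hypothetical names, and nested brackets are not handled.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Span:
    """Hypothetical container for a pre-annotated fragment kept as-is."""
    content: str


# Matches a non-nested [ ... ] fragment and captures its inner text.
BRACKETS = re.compile(r"\[([^\[\]]*)\]")


def split_parts(text: str) -> List[Union[str, Span]]:
    """Split text into plain strings and Span objects, preserving order."""
    parts: List[Union[str, Span]] = []
    pos = 0
    for match in BRACKETS.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])
        parts.append(Span(match.group(1)))
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts


def translate_parts(parts: List[Union[str, Span]],
                    translate: Callable[[str], str]) -> List[Union[str, Span]]:
    """Translate plain strings only; spans pass through unchanged."""
    return [p if isinstance(p, Span) else translate(p) for p in parts]


parts = split_parts("[Alice] met [Bob] in Paris")
# str.upper stands in for a real translation backend.
translated = translate_parts(parts, str.upper)
```

Keeping spans as separate objects means the backend never sees them, so entity names cannot be altered by the translator.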

Powered by

The pipeline construction components were taken from AREkit [github]

