A tiny, no-strings Python package for translating massive CSV/JSONL files with optionally pre-annotated object spans

bulk-translate 0.25.2

Third-party providers hosting↗️

A tiny, no-strings Python package for translating massive CSV/JSONL files, with native support for pre-annotated fixed spans that the translator leaves unchanged.

Description

📘 More on spans

📘 bulk-translate features

The out-of-the-box features of bulk-translate are:

  • ✅ Support for spans for annotation / optional translation.
  • ✅ Native implementation of two translation modes:
    • fast-mode: uses extra separator characters to group all text parts into a single batch, which is split back into parts after translation.
    • accurate: translates each text part individually.
  • ✅ No strings: you're free to adopt any LM / LLM backend.
    • Supports googletrans by default.
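
The difference between the two modes can be illustrated with a minimal sketch. This is not the bulk-translate implementation: the function names, the newline separator, and the fallback behaviour are all assumptions made for illustration, and `fake_translate` is a stand-in for a real backend.

```python
from typing import Callable, List

# Hypothetical separator; assumes the backend returns it unchanged.
SEP = "\n"


def translate_accurate(parts: List[str], translate: Callable[[str], str]) -> List[str]:
    """Accurate mode (sketch): one backend call per text part."""
    return [translate(part) for part in parts]


def translate_fast(parts: List[str], translate: Callable[[str], str],
                   sep: str = SEP) -> List[str]:
    """Fast mode (sketch): join parts with a separator, translate once, split back."""
    if not parts:
        return []
    translated = translate(sep.join(parts))
    pieces = translated.split(sep)
    # If the backend altered the separators, the batch cannot be split back
    # reliably, so this sketch falls back to per-part translation.
    if len(pieces) != len(parts):
        return translate_accurate(parts, translate)
    return pieces


def fake_translate(text: str) -> str:
    """Stand-in backend for the demo: upper-cases text instead of translating."""
    return text.upper()


if __name__ == "__main__":
    print(translate_fast(["hello", "world"], fake_translate))
    print(translate_accurate(["hello", "world"], fake_translate))
```

The trade-off this sketch shows is that the fast mode issues a single backend call per batch but depends on the separator surviving translation, while the accurate mode issues one call per part and has no such dependency.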

Installation

From PyPI:

pip install bulk-translate

or the latest version from GitHub:

pip install git+https://github.com/nicolay-r/bulk-translate

Usage

API

👉 Follow this notebook tutorial at nlp-thirdgate

Command Line / Shell

NOTE: Spans are supported only in the JSON-lines format.

NOTE: Requires the source_iter package to be installed.

For the test.tsv example data, in which annotated entities are enclosed in square brackets, run:

python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --schema '{"translated":"{text}"}' \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    --batch-size 10 \
    %%m \
    --src "auto" \
    --dest "ru"
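
To make the idea of spans that stay invariant for the translator more concrete, here is a minimal sketch of how bracketed entities could be held out of translation. It is an illustration under assumptions, not the package's parser: `Span`, `parse_parts`, and `translate_keeping_spans` are hypothetical names, nested brackets are not handled, and `str.upper` stands in for a real backend.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Span:
    """Hypothetical container for a pre-annotated entity that must not change."""
    value: str


# One capturing group, so re.split keeps the bracketed chunks in its output.
BRACKETED = re.compile(r"(\[[^\[\]]*\])")


def parse_parts(text: str) -> List[Union[str, Span]]:
    """Split text into plain strings and Span objects for [bracketed] entities."""
    parts: List[Union[str, Span]] = []
    # With a single capturing group, odd indices are always the matched spans.
    for index, chunk in enumerate(BRACKETED.split(text)):
        if index % 2 == 1:
            parts.append(Span(chunk[1:-1]))
        elif chunk:
            parts.append(chunk)
    return parts


def translate_keeping_spans(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the plain-text parts; re-emit spans exactly as annotated."""
    out = []
    for part in parse_parts(text):
        if isinstance(part, Span):
            out.append(f"[{part.value}]")
        else:
            out.append(translate(part))
    return "".join(out)


if __name__ == "__main__":
    # str.upper stands in for a real translation backend.
    print(translate_keeping_spans("[Alice] met [Bob] in Paris", str.upper))
```

In this sketch the entity text never reaches the backend, so it cannot be altered or transliterated; only the surrounding text parts are sent for translation.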

Powered by

The pipeline construction components were taken from AREkit [github]
