ES Translator

A lazy yet bulletproof machine translation tool for Elasticsearch.

Installation (Ubuntu)

Install Apertium:

wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
sudo apt install apertium-all-dev

Then install es-translator with pip:

python3 -m pip install --user es-translator

Installation (Docker)

No installation is needed as long as Docker is available on your system:

docker run -it icij/es-translator es-translator --help

Usage

EsTranslator's primary command for translating documents is es-translator:

Usage: es-translator [OPTIONS]

Options:
  -u, --url TEXT                  Elasticsearch URL
  -i, --index TEXT                Elasticsearch Index
  -r, --interpreter TEXT          Interpreter to use to perform the
                                  translation
  -s, --source-language TEXT      Source language to translate from
                                  [required]
  -t, --target-language TEXT      Target language to translate to  [required]
  --intermediary-language TEXT    An intermediary language to use when no
                                  translation is available between the source
                                  and the target. If none is provided this
                                  will be calculated automatically.
  --source-field TEXT             Document field to translate
  --target-field TEXT             Document field where the translations are
                                  stored
  -q, --query-string TEXT         Search query string to filter result
  -d, --data-dir PATH             Path to the directory where the language
                                  model will be downloaded
  --scan-scroll TEXT              Scroll duration (set to higher value if
                                  you're processing a lot of documents)
  --dry-run                       Don't save anything in Elasticsearch
  -f, --force                     Override existing translation in
                                  Elasticsearch
  --pool-size INTEGER             Number of parallel processes to start
  --pool-timeout INTEGER          Timeout to add a translation
  --throttle INTEGER              Throttle between each translation (in ms)
  --syslog-address TEXT           Syslog address
  --syslog-port INTEGER           Syslog port
  --syslog-facility TEXT          Syslog facility
  --stdout-loglevel TEXT          Change the default log level for stdout
                                  error handler
  --progressbar / --no-progressbar
                                  Display a progressbar
  --plan                          Plan translations into a queue instead of
                                  processing them now
  --broker-url TEXT               Celery broker URL (only needed when planning
                                  translation)
  --max-content-length TEXT       Max translated content length
                                  (<[0-9]+[KMG]?>) to avoid highlight
                                  errors(see http://github.com/ICIJ/datashare/
                                  issues/1184)
  --help                          Show this message and exit.
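The --max-content-length option accepts values matching [0-9]+[KMG]?. The following is a minimal sketch of how such a value could be converted to a plain number. The function name parse_content_length is hypothetical, and treating K, M and G as 1024-based multiples is an assumption: the help text gives only the pattern, not the unit semantics, and es-translator may parse the value differently.

```python
import re

# Assumption: K, M and G are binary multiples (1024-based). The help text
# only documents the pattern [0-9]+[KMG]?, not what the suffixes mean.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_PATTERN = re.compile(r"([0-9]+)([KMG]?)")


def parse_content_length(value: str) -> int:
    """Convert a size string such as '512' or '10K' to a plain number."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid content length: {value!r}")
    digits, unit = match.groups()
    return int(digits) * _UNITS[unit]
```

Under these assumptions, parse_content_length("10K") returns 10240, and a value with an unknown suffix such as "10X" raises ValueError.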

Learn more about how to use this command in the Usage Documentation.
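To illustrate how the options combine, here is a small sketch that assembles an es-translator command line. It only builds and prints the command without running it. The helper build_command is hypothetical, and the URL, index name and language codes ("fr", "en") are placeholder values; the exact language code format the tool expects is an assumption.

```python
import shlex


def build_command(url, index, source, target, dry_run=True, pool_size=None):
    """Assemble an es-translator argv list from options in the help text."""
    argv = [
        "es-translator",
        "--url", url,
        "--index", index,
        "--source-language", source,
        "--target-language", target,
    ]
    if pool_size is not None:
        argv += ["--pool-size", str(pool_size)]
    if dry_run:
        # --dry-run: don't save anything in Elasticsearch
        argv.append("--dry-run")
    return argv


# Hypothetical values: a local Elasticsearch and an index named "my-index".
command = build_command("http://localhost:9200", "my-index", "fr", "en", pool_size=2)
print(shlex.join(command))
```

Keeping --dry-run on by default in the sketch makes a first run safe to try, since nothing is written back to Elasticsearch.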

API

You can explore the API Documentation for more information.

Releasing a New Version

This section describes how to release a new version of es-translator. Only maintainers with publish access can perform releases.

Prerequisites

  • Push access to the GitHub repository
  • PyPI credentials configured for Poetry (poetry config pypi-token.pypi <your-token>)
  • Docker Hub credentials (for Docker image publishing)

Release Process

1. Ensure All Tests Pass

Before releasing, make sure all tests and linting checks pass:

make lint
make test

2. Bump the Version

Use one of the semantic versioning targets to bump the version:

# For bug fixes (1.0.0 -> 1.0.1)
make patch

# For new features (1.0.0 -> 1.1.0)
make minor

# For breaking changes (1.0.0 -> 2.0.0)
make major

This will:

  • Update the version in pyproject.toml
  • Create a git commit with the message build: bump to <version>
  • Create a git tag with the new version

Alternatively, set a specific version:

make set-version CURRENT_VERSION=1.2.3

3. Push Changes and Tags

Push the commit and tag to GitHub:

git push origin master
git push origin --tags

4. Publish to PyPI

Publish the package to PyPI:

make distribute

This builds the package and uploads it to PyPI using Poetry.

5. Publish Docker Image (Optional)

To publish a new Docker image:

# First-time setup for multi-arch builds
make docker-setup-multiarch

# Build and push the Docker image
make docker-publish

This will build and push the image with both the version tag and latest tag to Docker Hub.

6. Update Documentation

If documentation has changed, publish the updated docs:

make publish-doc
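The release steps above can be sketched as a short script that lists the commands in order. This is an illustration only, not part of the project: the function names are hypothetical, the script prints the commands by default instead of running them, and the one-time make docker-setup-multiarch step is left out.

```python
import subprocess


def release_commands(bump="patch", with_docker=False, with_docs=False):
    """Return the release commands from the steps above, in order."""
    if bump not in ("patch", "minor", "major"):
        raise ValueError(f"unknown bump type: {bump!r}")
    commands = [
        ["make", "lint"],
        ["make", "test"],
        ["make", bump],
        ["git", "push", "origin", "master"],
        ["git", "push", "origin", "--tags"],
        ["make", "distribute"],
    ]
    if with_docker:
        commands.append(["make", "docker-publish"])
    if with_docs:
        commands.append(["make", "publish-doc"])
    return commands


def run_release(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the release at the first failing step
            subprocess.run(command, check=True)


run_release(release_commands("minor"))
```

Stopping at the first failure matters here, because publishing to PyPI after a failed test run or a failed push would leave the release in an inconsistent state.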

Version Numbering

We follow Semantic Versioning:

  • MAJOR version for incompatible API changes
  • MINOR version for new functionality in a backwards compatible manner
  • PATCH version for backwards compatible bug fixes
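As a worked illustration of these rules, the sketch below computes the next version number for each kind of bump. The function bump_version is hypothetical and is not how the Makefile targets are implemented; it assumes a plain MAJOR.MINOR.PATCH string with no pre-release or build suffix.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backwards compatible feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump_version("1.0.0", "minor") returns "1.1.0", matching the make minor example given earlier.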

Makefile Targets Reference

Target                                  Description
make patch                              Bump patch version (x.x.X)
make minor                              Bump minor version (x.X.0)
make major                              Bump major version (X.0.0)
make set-version CURRENT_VERSION=x.x.x  Set specific version
make distribute                         Build and publish to PyPI
make docker-publish                     Build and push Docker image
make publish-doc                        Deploy documentation to GitHub Pages

Contributing

Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository. If you're willing to help, check the page about how to contribute to this project.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
