Translate text locally on your machine.
kotki
This package provides Python bindings for kotki, the Bergamot-based translation engine. It offers fast, easy language translation in an easy-to-install Python module.
import kotki
kotki.loadRegistry("/home/user/example/registry.json")
kotki.translate("Whenever I am at the office, I like to drink coffee", "ende")
'Wann immer ich im Büro bin, trinke ich gerne Kaffee'
kotki.translate("Румънците получиха дълго чакани новини: пенсиите и минималната заплата ще бъдат увеличени от 2023 г.", "bgen")
'Romanians have received long-awaited news: pensions and minimum wages will be increased from 2023'
kotki.translate("jij bent geboren in de stad Den Haag.", "nlen")
'You were born in The Hague.'
Requirements
Let's install some requirements before running pip install.
apt install -y ccache rapidjson-dev cmake libpcre++-dev libpcre2-dev python3-dev pybind11-dev
Next, install Intel MKL. For example, the following installs it on Ubuntu 21:
wget -qO- 'https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB' | sudo apt-key add -
sudo sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
sudo apt update
sudo apt install -y intel-mkl-64bit-2020.4-912
In case you cannot find the correct version to install, you may use the package
manager to search for the correct package name: apt search intel-mkl-64bit-2020
Then install the package:
pip install kotki
After that, you can use import kotki inside your Python application.
API
The API is straightforward and contains only three functions:
loadRegistry(path): read available translation models
translate(text, language): translate some text
listModels(): list available translation models
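As a rough sketch of how these calls fit together, the helper below takes the engine as a parameter (the kotki module itself, or any object with the same translate function), so it can be tried without the models installed. The names translate_all and FakeEngine are illustrative and not part of kotki. The return value of listModels() is not described here, so the sketch does not rely on it.

```python
from typing import Iterable, List


def translate_all(engine, texts: Iterable[str], model: str) -> List[str]:
    """Translate each text with engine.translate(text, model), in order."""
    return [engine.translate(text, model) for text in texts]


class FakeEngine:
    """Stand-in for the kotki module so the sketch runs without models."""

    def translate(self, text: str, model: str) -> str:
        return f"[{model}] {text}"


if __name__ == "__main__":
    # With kotki installed, one would instead do (path is a placeholder):
    #   import kotki
    #   kotki.loadRegistry("/home/user/example/registry.json")
    #   print(translate_all(kotki, ["Good morning"], "ende"))
    print(translate_all(FakeEngine(), ["Good morning", "Thank you"], "ende"))
```

Passing the engine in, rather than importing it inside the helper, also makes the code easy to unit-test.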
Models
The translation models that kotki uses are 'borrowed' from the Mozilla Firefox Translations extension. You need to manually download these models. They are conveniently packaged into a single archive that can be downloaded at github.com/kroketio/kotki/releases.
The archive also includes registry.json, which is needed for the loadRegistry() call.
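Once the archive is extracted, the registry file has to be located before it can be passed to loadRegistry(). The sketch below searches a directory recursively, since the exact layout of the archive is not described here. The helper name find_registry is hypothetical, and the demo uses a temporary directory in place of a real extracted archive.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_registry(models_dir: str) -> Optional[Path]:
    """Return the first registry.json found under models_dir, or None."""
    matches = sorted(Path(models_dir).rglob("registry.json"))
    return matches[0] if matches else None


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for the extracted archive.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "registry.json").write_text("{}")
        registry = find_registry(tmp)
        print(registry)
        # With kotki installed: kotki.loadRegistry(str(registry))
```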
Performance / footprint
Translations are fast, and probably faster than other Python packages that do language translation. Translating a simple sentence usually takes under 10ms (except the first time, due to model loading). Loading a single translation model seems to take up around 40MB of RAM.
Translation models are loaded on demand. This means that model loading does not happen during loadRegistry() but during the first use of translate(), which typically takes only 100ms per model. So if you have a project that uses both translate('foo', 'enfr') and translate('foo', 'fren'), you'll be using 2 models (and consequently around 80MB of RAM for the duration of your program).
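Because the first translate() call for a model pays the loading cost, an application may want to trigger that load at startup instead of on the first user request. The following is a minimal sketch of that idea, not something kotki provides: warm_up and estimated_ram_mb are hypothetical helpers, the translate function is passed in as a parameter, and the 40MB default is simply the approximate figure quoted above.

```python
import time
from typing import Callable, Dict, Iterable


def warm_up(translate: Callable[[str, str], str], models: Iterable[str]) -> Dict[str, float]:
    """Call translate once per model and return the seconds each call took."""
    timings = {}
    for model in models:
        start = time.perf_counter()
        translate("hello", model)  # throwaway text; the call triggers the lazy load
        timings[model] = time.perf_counter() - start
    return timings


def estimated_ram_mb(models: Iterable[str], mb_per_model: int = 40) -> int:
    """Rough estimate: number of distinct models times the per-model figure."""
    return len(set(models)) * mb_per_model


if __name__ == "__main__":
    # Stand-in for kotki.translate so the sketch runs without models.
    def fake_translate(text: str, model: str) -> str:
        return text

    print(warm_up(fake_translate, ["enfr", "fren"]))
    print(estimated_ram_mb(["enfr", "fren", "enfr"]))
```

With kotki installed, warm_up(kotki.translate, ["enfr", "fren"]) would be called after loadRegistry().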
Note that translations are done synchronously (and thus are 'blocking'). If you need an async/callback style approach, look at the Bergamot-Translator.
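If the blocking call only needs to stay off an asyncio event loop, one option is to run it in a worker thread. This is a sketch using the standard library, not a feature of kotki. The wrapper translate_async is hypothetical, and a single worker is used on the assumption that calls should be serialized, since thread safety is not documented here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# One worker so calls into the engine are serialized; whether kotki is
# thread-safe is not stated in this README, so this is the cautious choice.
_executor = ThreadPoolExecutor(max_workers=1)


async def translate_async(translate: Callable[[str, str], str], text: str, model: str) -> str:
    """Run a blocking translate(text, model) call off the event loop thread."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, translate, text, model)


def fake_translate(text: str, model: str) -> str:
    """Stand-in for kotki.translate so the sketch runs without models."""
    return f"[{model}] {text}"


if __name__ == "__main__":
    print(asyncio.run(translate_async(fake_translate, "foo", "enfr")))
```

With kotki installed, kotki.translate would be passed in place of fake_translate.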
Acknowledgements
This project was made possible through the combined effort of all researchers and partners in the Bergamot project (Jerin Philip et al.). The translation models are prepared as part of the Mozilla project. The translation engine used is bergamot-translator, which is based on marian.
License
MPL 2.0