Relang
Self-hosted, simplest possible web interface for text translation using Meta's NLLB200 model.
Why did you make this?
Degoogling, fun little project, getting some "AI" experience.
But didn't tools like this already exist?
Sure! Now there is one more.
The best project I could find is nllb-serve, which uses PyTorch. This one uses CTranslate2 (with code copy-pasted from the OpenNMT forum) and has less of an interface (my own doing).
What does it look like?
Regardless. How do I set this up?
Installation is simply pip install relang in a virtual environment, or use a tool like pipx or uvx. The model needs to be downloaded separately using the links from this forum post. You'll need the SentencePiece model and any one of the three NLLB models (larger is better).
Running it as relang or python -m relang, with the relevant path arguments pointing at the downloaded (unzipped) models, starts the web server on the default port 5000. Adding --gpu should move computations to the GPU, but I haven't tested that, for lack of one. Use --host :: for IPv6.
Obviously, don't expose this to the internet.
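For the curious, the CTranslate2 side of such a tool roughly follows the well-known recipe for running a converted NLLB model with CTranslate2 and SentencePiece. The sketch below is not relang's code: the function names, the beam size, and the use_gpu switch (standing in for what --gpu presumably does) are assumptions. NLLB identifies languages with FLORES-200 codes such as eng_Latn or deu_Latn.

```python
"""Minimal sketch: NLLB translation with CTranslate2 + SentencePiece.

Assumptions (not taken from relang): ct2_model_dir is a CTranslate2-converted
NLLB model directory and sp_model_path is the matching SentencePiece model.
"""


def build_source_tokens(pieces: list[str], src_lang: str) -> list[str]:
    # NLLB expects the source language code first and the end-of-sentence token last.
    return [src_lang] + pieces + ["</s>"]


def strip_target_prefix(hypothesis: list[str], tgt_lang: str) -> list[str]:
    # CTranslate2 returns the forced target prefix as part of the hypothesis; drop it.
    if hypothesis and hypothesis[0] == tgt_lang:
        return hypothesis[1:]
    return hypothesis


def translate(sentences, sp_model_path, ct2_model_dir, src_lang, tgt_lang, use_gpu=False):
    # Imported lazily so the helpers above can be used without these dependencies.
    import ctranslate2
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=sp_model_path)
    # Hypothetical mapping of a --gpu style switch onto CTranslate2's device argument.
    translator = ctranslate2.Translator(ct2_model_dir, device="cuda" if use_gpu else "cpu")

    # Encode each sentence into subword pieces (strings rather than ids).
    pieces = sp.encode(sentences, out_type=str)
    source = [build_source_tokens(p, src_lang) for p in pieces]

    # Force the decoder to start with the target language code.
    results = translator.translate_batch(
        source,
        target_prefix=[[tgt_lang]] * len(source),
        beam_size=4,
    )
    # Take the best hypothesis, remove the language code, and join the pieces into text.
    return [
        sp.decode_pieces(strip_target_prefix(r.hypotheses[0], tgt_lang))
        for r in results
    ]
```

For example, translate(["Hello world."], "path/to/spm.model", "path/to/nllb-ct2", "eng_Latn", "deu_Latn") would return a one-element list with the German translation; both paths are placeholders.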
Any known issues?
The biggest one is that translations cannot be aborted once started. Server-side errors are also not reported back to the user. I may try to fix this in the future.
The other thing is that sentences are split simply at punctuation marks with some very minimal heuristics, regardless of language. There are external libraries for this task, but they seem to be mainly focused on English, and in the spirit of NLLB I'd rather degrade all languages equally.
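To illustrate what language-agnostic splitting at punctuation can look like, here is a hypothetical sketch, not relang's actual splitter. It breaks after ., ! or ? only when whitespace follows (so a decimal like 3.14 stays intact), always breaks after the ideographic full stop, the full-width ! and ?, and the Devanagari danda, and keeps closing quotes or brackets attached to the sentence they end.

```python
import re

# Hypothetical heuristics, for illustration only.
# Latin-style sentence ends need following whitespace; CJK/Devanagari ones do not.
_LATIN_END = r"[.!?]+"
_WIDE_END = r"[。！？।]+"
# Closing quotes/brackets that belong to the sentence they terminate.
_CLOSERS = r"[\"'”’)\]»」』]*"

_BOUNDARY = re.compile(rf"(?:{_LATIN_END}{_CLOSERS}(?=\s)|{_WIDE_END}{_CLOSERS})")


def split_sentences(text: str) -> list[str]:
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Everything up to and including the boundary is one sentence.
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    # Whatever follows the last boundary is the final (possibly unterminated) sentence.
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

Abbreviations such as "e.g." still trigger a split here, which is the kind of trade-off "very minimal heuristics" implies.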
Also, NLLB doesn't appear to like certain Unicode symbols, so these are replaced by equivalent, better-liked characters. That table is likely incomplete, though.
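A replacement table like this maps naturally onto Python's str.maketrans and str.translate. In the sketch below the entries are illustrative placeholders, not the characters relang actually replaces; REPLACEMENTS and normalize are hypothetical names.

```python
# Hypothetical replacement table: illustrative entries only,
# NOT the characters relang actually replaces.
REPLACEMENTS = {
    "\u00a0": " ",    # no-break space -> ordinary space
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis -> three full stops
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(REPLACEMENTS)


def normalize(text: str) -> str:
    # Apply every single-character replacement in one pass.
    return text.translate(_TABLE)
```

An explicit table keeps the mapping easy to extend one character at a time; unicodedata.normalize("NFKC", text) is a broader alternative, but it also rewrites characters (such as full-width letters) that may not need changing.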
What is NLLB200 anyway?
The No Language Left Behind model was developed by Meta (née Facebook) with the aim of providing high-quality translations between 200+ languages. Its code and weights were released in 2022 for non-commercial use.
Is that the only model of its kind?
The AI arms race being what it is, no. In particular, MADLAD-400 by Google seems interesting, if only for its permissive Creative Commons licence, but I haven't gotten round to trying it yet.
Sadly, all these models seem to work at the sentence level, which is likely suboptimal for quality and is a pain because splitting sentences is hard. Hopefully one of these days somebody will develop a model that can translate entire paragraphs in context.
Acknowledgements
All translation code, as well as the model files referenced above, comes from forum posts by Yasmin Moslem, who, unlike me, actually seems to understand how any of this works.
Download files
File details
Details for the file relang-0.1.tar.gz.
File metadata
- Download URL: relang-0.1.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35487fa2ca6fdc0d8020af8bae70672f8a10ec583068363c91a5f08061a0dde4 |
| MD5 | d70a0bcf637d0233cc06a5516bcb69a7 |
| BLAKE2b-256 | 9bbc684fdfd2e4b5a017a3cbe0caf7a9b0fa809b8e8b86659b2fa5996df8a356 |
File details
Details for the file relang-0.1-py3-none-any.whl.
File metadata
- Download URL: relang-0.1-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 181028df4c5764a726e6b89ebf9407a127af55d6850daa33bfc812a299835cd9 |
| MD5 | d3f23821435d1f18becc19e94272e2f9 |
| BLAKE2b-256 | 11e9aa3c0125b5da5e261d8d2d528be4bac7b9f638992dc4201c3fc8591dd4bd |
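The SHA256 digests listed for relang 0.1 can be checked locally before installing. Below is a minimal sketch using Python's hashlib, assuming the downloaded files keep their original names. EXPECTED_SHA256 copies the two SHA256 values from the tables; sha256_of and verify are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests as listed for the relang 0.1 distribution files.
EXPECTED_SHA256 = {
    "relang-0.1.tar.gz": "35487fa2ca6fdc0d8020af8bae70672f8a10ec583068363c91a5f08061a0dde4",
    "relang-0.1-py3-none-any.whl": "181028df4c5764a726e6b89ebf9407a127af55d6850daa33bfc812a299835cd9",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    # Hash the file in chunks so it never has to be read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path]) -> bool:
    # Look up the expected digest by file name; unknown file names fail verification.
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can perform the same check itself in hash-checking mode, for instance when the requirement in a requirements file carries a --hash=sha256: option, or when run with --require-hashes.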