Relang

Self-hosted, simplest possible web interface for text translation using Meta's NLLB200 model.

Why did you make this

Degoogling, fun little project, getting some "AI" experience.

But didn't tools like this already exist?

Sure! Now there is one more.

The best project I could find is nllb-serve, which uses PyTorch. This one uses CTranslate2 (copy-pasted from the OpenNMT forum) and has less of an interface (my own doing).

What does it look like

screenshot

Regardless. How do I set this up

Installation is simply pip install relang in a virtual environment, or use a tool like pipx or uvx. The model needs to be downloaded separately using the links from this forum post. You'll need the SentencePiece model and any of the three NLLB models -- larger is better.

Running it as relang or python -m relang with the relevant path arguments pointing at the downloaded (unzipped) models starts the web server at default port 5000. Adding --gpu should move computations to the GPU, but I didn't test that for reasons of not having one. Use --host :: for IPv6.
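
For the curious, here is a rough sketch of what translating with CTranslate2 and SentencePiece can look like, loosely following the approach from the OpenNMT forum. It is not relang's actual code: the function names, the default language codes and the use_gpu parameter are all made up for illustration. The two third-party libraries are imported inside translate() so the helper functions can be tried without them.

```python
from typing import List


def tag_source(pieces: List[str], src_lang: str) -> List[str]:
    """Wrap subword tokens for NLLB: language code first, end marker last."""
    return [src_lang] + pieces + ["</s>"]


def strip_target_tag(tokens: List[str], tgt_lang: str) -> List[str]:
    """Drop the leading target-language token from a hypothesis, if present."""
    return tokens[1:] if tokens and tokens[0] == tgt_lang else tokens


def translate(
    sentences: List[str],
    ct2_model_dir: str,       # path to the unzipped CTranslate2 NLLB model
    spm_model_path: str,      # path to the SentencePiece model file
    src_lang: str = "eng_Latn",  # assumed defaults, for illustration only
    tgt_lang: str = "fra_Latn",
    use_gpu: bool = False,    # hypothetical stand-in for a --gpu style switch
) -> List[str]:
    # Imported here so the helpers above work without these packages installed.
    import ctranslate2
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.load(spm_model_path)
    translator = ctranslate2.Translator(
        ct2_model_dir, device="cuda" if use_gpu else "cpu"
    )

    source = [tag_source(sp.encode_as_pieces(s.strip()), src_lang) for s in sentences]
    results = translator.translate_batch(
        source,
        batch_type="tokens",
        max_batch_size=2024,
        beam_size=4,
        # Force every translation to start with the target language code.
        target_prefix=[[tgt_lang]] * len(source),
    )
    return [sp.decode(strip_target_tag(r.hypotheses[0], tgt_lang)) for r in results]
```

The only difference between CPU and GPU in this sketch is the device string passed to the translator, which is presumably all a GPU switch needs to change.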

Obviously don't expose this to the internet.

Any known issues

The biggest one is that translations cannot be aborted once started. Also, server-side errors are not communicated to the user. I may try to fix this in future.

The other thing is that sentences are split simply at punctuation marks with some very minimal heuristics, regardless of language. There are external libraries for this task but they seem to be mainly focused on English, and in the spirit of NLLB I'd rather degrade all languages the same.
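
To give an idea of what language-agnostic punctuation splitting can look like, here is a minimal sketch. It is not the splitter relang ships: the set of terminators and the two heuristics (Latin-style marks only count when followed by whitespace, CJK marks always count) are assumptions picked for the example.

```python
import re
from typing import List

# Assumed heuristics, for illustration only:
#  - ". ! ?" and the danda end a sentence only when followed by whitespace,
#    which keeps numbers like 3.14 in one piece;
#  - full-width CJK marks end a sentence even without whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?\u0964])\s+|(?<=[\u3002\uff01\uff1f])\s*")


def split_sentences(text: str) -> List[str]:
    """Split text at sentence-final punctuation, keeping the punctuation."""
    parts = _BOUNDARY.split(text.strip())
    return [p for p in parts if p]
```

Abbreviations such as "e.g. this" still get split in the wrong place, which is exactly the kind of degradation that applies more or less equally to every language.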

Also NLLB doesn't appear to like certain Unicode symbols, so these are replaced by equivalent, better-liked characters. But that table is likely incomplete.
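
A replacement table like that can be as simple as the sketch below. The entries are hypothetical examples (typographic quotes, dashes, ellipsis, non-breaking space), not the table relang actually uses.

```python
# Hypothetical replacement table; the real one may differ.
_REPLACEMENTS = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def normalize(text: str) -> str:
    """Swap symbols from the table for plainer equivalents."""
    return text.translate(_REPLACEMENTS)
```

Because str.translate leaves unknown characters untouched, extending the table is just a matter of adding entries.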

What is NLLB200 anyway

The No Language Left Behind model was developed by Meta (née Facebook) with the aim of providing high quality translations between 200+ languages. Its code and weights were released in 2022 for non-commercial use.

Is that the only model of its kind?

The AI arms race being what it is, no. In particular MADLAD-400 by Google seems interesting, if only for its permissive Creative Commons licence, but I haven't gotten round to trying it yet.

Sadly all models seem to work at the sentence level, which seems suboptimal for quality and is a pain because splitting sentences is hard. Hopefully one of these days somebody will develop a model that can translate entire paragraphs in context.

Acknowledgements

All translation code as well as model files referenced above are from forum posts by Yasmin Moslem, who, unlike me, actually seems to understand how any of this works.
