Simultaneous Machine Translation (SimulMT) with NLLB model optimization

NoLanguageLeftWaiting

Converts the No Language Left Behind (NLLB) translation model into a SimulMT (Simultaneous Machine Translation) model, optimized for live/streaming use cases.

Offline base models such as NLLB, when used for streaming, suffer from spurious end-of-sequence (EOS) token and punctuation insertion, inconsistent prefix handling, and computational overhead that grows rapidly as input length increases. This implementation aims to resolve these issues.
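To make the overhead concrete, assume a naive streaming loop that re-runs the offline model on the entire accumulated source every time a new chunk arrives. This is an assumption for illustration, not a description of nllw, and the helper names below are hypothetical. Counting only source tokens fed to the model, the cumulative work of that naive loop grows with the square of the number of chunks (c·n(n+1)/2 for n chunks of c tokens), whereas an idealized incremental scheme processes each source token once:

```python
from typing import Sequence


def naive_retranslation_cost(chunk_sizes: Sequence[int]) -> int:
    """Hypothetical helper: source tokens processed if the whole accumulated
    input is re-translated from scratch after every new chunk."""
    total, seen = 0, 0
    for size in chunk_sizes:
        seen += size   # the source prefix keeps growing...
        total += seen  # ...and is re-processed in full at every step
    return total


def incremental_cost(chunk_sizes: Sequence[int]) -> int:
    """Hypothetical helper: idealized lower bound where every source token
    is processed exactly once."""
    return sum(chunk_sizes)


chunks = [5] * 10  # ten chunks of five source tokens each
print(naive_retranslation_cost(chunks), incremental_cost(chunks))  # 275 50
```

The counts ignore decoder work and caching, so they only show the trend, not real timings.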

Installation

pip install nllw

The textual frontend is not installed by default.

Quick Start

  1. Demo interface:
python textual_interface.py
  2. Use it as a package:
import nllw

# Load the NLLB backbone used by the streaming translator
model = nllw.load_model(
    src_langs=["fra_Latn"],
    nllb_backend="transformers",
    nllb_size="600M",  # alternative: "1.3B"
)
translator = nllw.OnlineTranslation(
    model,
    input_languages=["fra_Latn"],
    output_languages=["eng_Latn"],
)

# First chunk of the French source stream
tokens = [nllw.timed_text.TimedText('Ceci est un test de traduction')]
translator.insert_tokens(tokens)
# `validated` holds the committed translation, `buffer` the tentative remainder
validated, buffer = translator.process()
print(f"{validated} | {buffer}")

# Next chunk: the translation is updated incrementally
tokens = [nllw.timed_text.TimedText('en temps réel')]
translator.insert_tokens(tokens)
validated, buffer = translator.process()
print(f"{validated} | {buffer}")

Work in Progress: Partial Speculative Decoding

Because Local Agreement already locks a stable prefix for the committed translation, we cannot directly adopt "Self-Speculative Biased Decoding for Faster Live Translation". Our ongoing prototype instead borrows the speculative idea only for the new tokens that still need to be validated by the larger model.
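For readers unfamiliar with Local Agreement, here is a generic sketch of the LocalAgreement-2 policy as it is commonly described for streaming translation: a token is committed once two consecutive hypotheses agree on it, and committed tokens are never retracted. This is not nllw's implementation; the class and method names are hypothetical, hypotheses are word lists rather than NLLB token IDs, and each new hypothesis is assumed to be decoded with the committed prefix forced, so it always starts with that prefix.

```python
from typing import List, Sequence, Tuple


def longest_common_prefix(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return list(a[:n])


class LocalAgreement:
    """Hypothetical LocalAgreement-2 helper: commit the prefix on which two
    consecutive hypotheses agree; never retract committed tokens."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.previous: List[str] = []

    def update(self, hypothesis: Sequence[str]) -> Tuple[List[str], List[str]]:
        # Assumes the hypothesis was decoded with self.committed forced as prefix.
        agreed = longest_common_prefix(self.previous, hypothesis)
        if len(agreed) > len(self.committed):
            self.committed = agreed  # the stable prefix only ever grows
        self.previous = list(hypothesis)
        tentative = list(hypothesis[len(self.committed):])
        return list(self.committed), tentative


la = LocalAgreement()
print(la.update("This is a translation".split()))
print(la.update("This is a translation test".split()))
print(la.update("This is a translation test in real time".split()))
```

The (committed, tentative) pair presumably corresponds to the `validated | buffer` split printed in the Quick Start; the tentative tail is what a speculative scheme would still need to verify.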

The flow tested in speculative_decoding_v0.py:

  • Run the 600M draft decoder once to obtain the candidate continuation and its cache.
  • Replay the draft tokens through the 1.3B model, but stop the forward pass as soon as the main model reproduces a token emitted by the draft (predicted_tokens matches the draft output). We keep those verified tokens and only continue generation from that point.
  • On mismatch, resume full decoding with the 1.3B model until a match is reached again, instead of discarding the entire draft segment.
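The flow above can be sketched as follows, under one possible reading of it. Everything here is an assumption rather than the code of speculative_decoding_v0.py: `verify` stands for a single main-model forward pass over the remaining draft that returns the greedy prediction after each draft prefix, `step` for one ordinary main-model decoding step, resynchronization uses exact token matches against the rest of the draft, and KV caches are ignored. A toy "main model" that follows a fixed reference sentence stands in for the 1.3B model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (not nllw's API):
# verify(tokens, draft)[k] = main-model greedy token after tokens + draft[:k]
VerifyFn = Callable[[Sequence[int], Sequence[int]], List[int]]
# step(tokens) = main-model greedy next token after tokens
StepFn = Callable[[Sequence[int]], int]


def partial_speculative_decode(prefix: Sequence[int], draft: Sequence[int],
                               verify: VerifyFn, step: StepFn, eos: int,
                               max_new: int = 64) -> Tuple[List[int], int]:
    """Return (new tokens, number of main-model forward passes)."""
    tokens, remaining = list(prefix), list(draft)
    new: List[int] = []
    passes = 0
    while remaining and len(new) < max_new:
        preds = verify(tokens, remaining)  # one pass checks the whole remaining draft
        passes += 1
        k = 0
        while k < len(remaining) and preds[k] == remaining[k]:
            k += 1  # the main model reproduces this draft token: keep it
        new += remaining[:k]
        tokens += remaining[:k]
        if k == len(remaining):
            break  # entire draft verified
        # Mismatch at k: take the main model's token and keep decoding with the
        # main model until it emits a token found later in the draft (resync).
        tok, remaining = preds[k], remaining[k + 1:]
        while True:
            new.append(tok)
            tokens.append(tok)
            if tok == eos or len(new) >= max_new:
                return new, passes
            if tok in remaining:
                remaining = remaining[remaining.index(tok) + 1:]
                break
            tok = step(tokens)
            passes += 1
    return new, passes


# Toy main model: greedily follows a fixed reference sentence (2 = EOS).
REFERENCE = [5, 6, 7, 8, 9, 2]


def toy_step(tokens: Sequence[int]) -> int:
    return REFERENCE[len(tokens)] if len(tokens) < len(REFERENCE) else 2


def toy_verify(tokens: Sequence[int], draft: Sequence[int]) -> List[int]:
    # Simulates one parallel forward pass over the draft.
    return [toy_step(list(tokens) + list(draft[:k])) for k in range(len(draft) + 1)]


new, passes = partial_speculative_decode([], [5, 6, 4, 8, 9], toy_verify, toy_step, eos=2)
print(new, passes)  # [5, 6, 7, 8, 9] 3
```

In this toy run the draft diverges at its third token, the main model supplies the correction and resynchronizes on `8`, and the five output tokens cost three main-model passes instead of the five sequential steps that plain decoding would need.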

This “partial verification” trims the work the main decoder performs after each divergence, while keeping the responsiveness of the draft hypothesis. Early timing experiments from speculative_decoding_v0.py show the verification pass (~0.15 s in the example) is significantly cheaper than recomputing a full decoding step every time.

Input vs Output length:

Output length is successfully maintained relative to input length, even though the stable prefix tends to take time to grow.

Download files

Source Distribution

nllw-0.1.1.post1.tar.gz (1.5 MB)

Uploaded Source

Built Distribution

nllw-0.1.1.post1-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file nllw-0.1.1.post1.tar.gz.

File metadata

  • Download URL: nllw-0.1.1.post1.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for nllw-0.1.1.post1.tar.gz
Algorithm Hash digest
SHA256 a27a15226a309e69987b0281b9f9b1ecdebe62edcf03914411846810ff7b0a8f
MD5 922882230c9faf769d9432b24f4a6b5f
BLAKE2b-256 b684a2bb3bf759688026cc933f6177a50c8a3540c62cbe7e7e48a9263ce02661

File details

Details for the file nllw-0.1.1.post1-py3-none-any.whl.

File metadata

  • Download URL: nllw-0.1.1.post1-py3-none-any.whl
  • Upload date:
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for nllw-0.1.1.post1-py3-none-any.whl
Algorithm Hash digest
SHA256 f8d4d3d7a6a4a076ec95ffd2f7f34feacc7054d2c30a1b6bbccd5774da06eb13
MD5 3c4153a1261d581ba5544ad30536f54a
BLAKE2b-256 5de6579c50d1d0c764c45d3dc2decd9f5d301506a80464120370122c0fde10fb
