Simultaneous Machine Translation (SimulMT) with NLLB model optimization

These details have not been verified by PyPI

Project links

Homepage

Project description

NoLanguageLeftWaiting

Converts the NoLanguageLeftBehind (NLLB) translation model into a SimulMT (Simultaneous Machine Translation) model, optimized for live/streaming use cases.

Offline base models such as NLLB suffer from EOS token and punctuation insertion, inconsistent prefix handling, and exponentially growing computational overhead as input length increases. This implementation aims to resolve these issues.

LocalAgreement policy
HuggingFace transformers implementation only.
Built for WhisperLiveKit
200 languages. See supported_languages.md for the full list.
Work in progress: speculative/self-speculative decoding for a faster decoder, using the 600M model as the draft model and the 1.3B model as the main model. Refs: https://arxiv.org/pdf/2211.17192, https://arxiv.org/html/2509.21740v1
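
The LocalAgreement policy listed above can be illustrated with a minimal sketch. It assumes the common two-hypothesis form of the policy, where a token is committed once two consecutive hypotheses agree on it. The names `LocalAgreement2` and `common_prefix` are hypothetical and not part of the nllw API; the package's own implementation may differ.

```python
from typing import List, Sequence, Tuple


def common_prefix(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest prefix shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return list(a[:n])


class LocalAgreement2:
    """Toy policy: commit tokens agreed on by two consecutive hypotheses."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # stable prefix, never retracted
        self.previous: List[str] = []   # hypothesis from the previous step

    def update(self, hypothesis: Sequence[str]) -> Tuple[List[str], List[str]]:
        # Tokens on which the previous and current hypotheses agree.
        agreed = common_prefix(self.previous, hypothesis)
        # Only ever extend the committed prefix; an empty slice is a no-op.
        self.committed.extend(agreed[len(self.committed):])
        self.previous = list(hypothesis)
        # Everything past the committed prefix may still change.
        buffer = list(hypothesis[len(self.committed):])
        return list(self.committed), buffer
```

Each call to `update` takes the latest full hypothesis and returns the committed prefix together with the part that is still unstable, which mirrors the `validated` / `buffer` pair returned in the Quick Start.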

Installation

pip install nllw

The textual frontend is not installed by default.

Quick Start

Demo interface:

python textual_interface.py

Use it as a package

import nllw

# Load the NLLB model for the source language(s).
model = nllw.load_model(
    src_langs=["fra_Latn"],
    nllb_backend="transformers",
    nllb_size="600M",  # alternative: "1.3B"
)
translator = nllw.OnlineTranslation(
    model,
    input_languages=["fra_Latn"],
    output_languages=["eng_Latn"],
)

# Feed the first chunk of source text and run one translation step.
tokens = [nllw.timed_text.TimedText('Ceci est un test de traduction')]
translator.insert_tokens(tokens)
validated, buffer = translator.process()
print(f"{validated} | {buffer}")

# Feed the next chunk; the translator continues from the text already inserted.
tokens = [nllw.timed_text.TimedText('en temps réel')]
translator.insert_tokens(tokens)
validated, buffer = translator.process()
print(f"{validated} | {buffer}")
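
For a longer stream, the two-step pattern above generalizes to a loop. The helper below is only a sketch: `stream_translate` and its `make_token` parameter are hypothetical names, and it relies solely on the `insert_tokens` and `process` calls shown above, so it works with any object exposing those two methods.

```python
from typing import Callable, Iterable, Iterator, Tuple


def stream_translate(
    translator,
    chunks: Iterable[str],
    make_token: Callable[[str], object],
) -> Iterator[Tuple[object, object]]:
    """Feed text chunks one at a time and yield the result of each step."""
    for chunk in chunks:
        # One token object per incoming chunk, as in the Quick Start.
        translator.insert_tokens([make_token(chunk)])
        # process() returns the (validated, buffer) pair for this step.
        yield translator.process()
```

With nllw, `make_token` would presumably be `nllw.timed_text.TimedText` and `translator` the `nllw.OnlineTranslation` instance created above.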

Work In Progress: Partial Speculative Decoding

Local Agreement already locks a stable prefix for the committed translation, so we cannot directly adopt the approach of "Self-Speculative Biased Decoding for Faster Live Translation". Our ongoing prototype instead borrows the speculative idea only for the new tokens that need to be validated by the larger model.

The flow tested in speculative_decoding_v0.py:

1. Run the 600M draft decoder once to obtain the candidate continuation and its cache.
2. Replay the draft tokens through the 1.3B model, but stop the forward pass as soon as the main model reproduces a token emitted by the draft (predicted_tokens matches the draft output). We keep those verified tokens and only continue generation from that point.
3. On mismatch, resume full decoding with the 1.3B model until a match is reached again, instead of discarding the entire draft segment.
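
The verification step can be sketched with toy callables in place of the real models. This is the standard greedy prefix-verification variant of speculative decoding, not the exact logic of speculative_decoding_v0.py: it accepts draft tokens while the main model agrees, lets the main model continue from the first divergence, and does not try to re-synchronize with the draft afterwards. `verify_draft`, `main_next`, `eos` and `max_new` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    main_next: Callable[[List[int]], int],
    eos: int,
    max_new: int = 32,
) -> Tuple[List[int], int]:
    """Return (new_tokens, number_of_accepted_draft_tokens)."""
    base = list(prefix)
    new: List[int] = []

    # Accept draft tokens for as long as the main model predicts the same one.
    # A real implementation would do this in one batched forward pass.
    for tok in draft:
        if len(new) >= max_new or main_next(base + new) != tok:
            break
        new.append(tok)
        if tok == eos:
            break
    accepted = len(new)

    # After the first divergence, decode with the main model alone.
    while len(new) < max_new and (not new or new[-1] != eos):
        new.append(main_next(base + new))
    return new, accepted
```

Because every emitted token is either confirmed or produced by the main model, the output equals what the main model would generate greedily on its own; the draft only reduces how many tokens the main model has to generate step by step.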

This “partial verification” trims the work the main decoder performs after each divergence, while keeping the responsiveness of the draft hypothesis. Early timing experiments from speculative_decoding_v0.py show the verification pass (~0.15 s in the example) is significantly cheaper than recomputing a full decoding step every time.

Input vs Output length:

Output length is successfully maintained, even though the stable prefix tends to take time to grow.

Release history

0.1.5: Feb 20, 2026
0.1.4.post1: Nov 27, 2025
0.1.4: Nov 27, 2025
0.1.3: Nov 21, 2025
0.1.2: Nov 11, 2025
0.1.1.post1: Nov 6, 2025
0.1.1: Nov 6, 2025 (this version)
0.1.0: Oct 30, 2025

Download files

Source Distribution

nllw-0.1.1.tar.gz (1.5 MB), uploaded Nov 6, 2025, Source

Built Distribution

nllw-0.1.1-py3-none-any.whl (14.1 kB), uploaded Nov 6, 2025, Python 3

File details

Details for the file nllw-0.1.1.tar.gz.

File metadata

Download URL: nllw-0.1.1.tar.gz
Upload date: Nov 6, 2025
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for nllw-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`9f19a01d50bc5bf883a6fc04d366ff8a1eef3869d5cd193cf95c77e64ddf223e`
MD5	`2e434205d4e083d4185b57c41e248f83`
BLAKE2b-256	`62ff5c3d508df49664646616b9bbb22a10e40a74cec04b72c12b168d72387fc3`

File details

Details for the file nllw-0.1.1-py3-none-any.whl.

File metadata

Download URL: nllw-0.1.1-py3-none-any.whl
Upload date: Nov 6, 2025
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for nllw-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`916c31a0768788ec7bef05d4a69f9de26ade7d1492f8969e8d246c7459511621`
MD5	`316f3042ebebb72fcc6de52beaab96e3`
BLAKE2b-256	`78c949dadc2b8fa1f0f8b31341e1aa2fec0688802eb440af49eb04b85e506056`
