An open-source framework for zero-shot multimodal machine translation inference

Project description

ZeroMMT

Read the paper (arXiv)

Model weights

ZeroMMT-600M | ZeroMMT-1.3B | ZeroMMT-3.3B

This package performs inference with ZeroMMT, a zero-shot multilingual multimodal machine translation (MMT) system trained only on English text-image pairs. It starts from a pretrained NLLB model (more info here) and adapts it with lightweight modules (adapters and a visual projector) while keeping the original weights frozen during training. Training combines visually conditioned masked language modelling with a KL-divergence penalty between the original MT outputs and the new MMT outputs. ZeroMMT is available in 3 sizes: 600M, 1.3B and 3.3B parameters. The largest model achieves state-of-the-art performance on CoMMuTE, a benchmark that evaluates the ability of multimodal translation systems to exploit image information to disambiguate the English sentence to be translated. ZeroMMT is multilingual and available for English-to-{Arabic, Chinese, Czech, German, French, Russian}.
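The KL-divergence term mentioned above keeps the adapted multimodal model's output distribution close to that of the original text-only NLLB model. As a minimal illustration of the quantity involved (not the package's actual training code; which distribution plays which role is simplified here):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions.

    In ZeroMMT-style training, one distribution would come from the
    frozen text-only MT model and the other from the adapted MMT model,
    so minimizing this term discourages the multimodal model from
    drifting away from the original translations.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```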

If you use this package or like our work, please cite:

@misc{futeral2024zeroshotmultimodalmachinetranslation,
      title={Towards Zero-Shot Multimodal Machine Translation}, 
      author={Matthieu Futeral and Cordelia Schmid and Benoît Sagot and Rachel Bawden},
      year={2024},
      eprint={2407.13579},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.13579}, 
}

Installation

pip install zerommt

Example

Without classifier-free guidance (CFG)

import requests
from PIL import Image
import torch
from zerommt import create_model

model = create_model(model_path="matthieufp/ZeroMMT-600M",
                     enable_cfg=False)
model.eval()

image = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000002153.jpg", stream=True
    ).raw
)

src_text = "He's got a bat in his hands."
src_lang = "eng_Latn"
tgt_lang = "fra_Latn"

# Compute cross-entropy loss given translation
tgt_text = "Il a une batte dans ses mains."

with torch.inference_mode():
    loss = model(imgs=[image],
                 src_text=[src_text],
                 src_lang=src_lang,
                 tgt_text=[tgt_text],
                 tgt_lang=tgt_lang,
                 output_loss=True)

print(loss)

# Generate translation with beam search
beam_size = 4

image2 = Image.open(
    requests.get(
        "https://zupimages.net/up/24/29/7r3s.jpg", stream=True
    ).raw
)

with torch.inference_mode():
    generated = model.generate(imgs=[image, image2],
                               src_text=[src_text, src_text],
                               src_lang=src_lang,
                               tgt_lang=tgt_lang,
                               beam_size=beam_size)

translation = model.tokenizer.batch_decode(generated, skip_special_tokens=True)
print(translation)
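The src_lang and tgt_lang arguments use NLLB's FLORES-200 language codes, as in "eng_Latn" and "fra_Latn" above. For the supported directions, the target codes would plausibly be the following (assuming NLLB's standard codes; check the model card to confirm):

```python
# FLORES-200 codes for ZeroMMT's supported target languages
# (the source language is always English, "eng_Latn")
TGT_LANG_CODES = {
    "Arabic": "arb_Arab",
    "Chinese": "zho_Hans",  # simplified script
    "Czech": "ces_Latn",
    "German": "deu_Latn",
    "French": "fra_Latn",
    "Russian": "rus_Cyrl",
}
```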

With classifier-free guidance (CFG). WARNING: enabling CFG requires approximately twice as much memory!

import requests
from PIL import Image
import torch
from zerommt import create_model

model = create_model(model_path="matthieufp/ZeroMMT-600M",
                     enable_cfg=True)
model.eval()

image = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000002153.jpg", stream=True
    ).raw
)

src_text = "He's got a bat in his hands."
src_lang = "eng_Latn"
tgt_lang = "fra_Latn"

# Compute cross-entropy loss given translation
tgt_text = "Il a une batte dans ses mains."
cfg_value = 1.25

with torch.inference_mode():
    loss = model(imgs=[image],
                 src_text=[src_text],
                 src_lang=src_lang,
                 tgt_text=[tgt_text],
                 tgt_lang=tgt_lang,
                 output_loss=True,
                 cfg_value=cfg_value)
print(loss)

# Generate translation with beam search and cfg
beam_size = 4

with torch.inference_mode():
    generated = model.generate(imgs=[image],
                               src_text=[src_text],
                               src_lang=src_lang,
                               tgt_lang=tgt_lang,
                               beam_size=beam_size,
                               cfg_value=cfg_value)
                               
translation = model.tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
print(translation)
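Classifier-free guidance contrasts an image-conditioned prediction with a text-only one, which is why both forward passes (and roughly twice the memory) are needed. A minimal sketch of the usual CFG score combination, with cfg_value as the guidance weight (illustrative only, not zerommt's internal code):

```python
def cfg_combine(cond_logit, uncond_logit, cfg_value):
    """Classifier-free guidance: interpolate/extrapolate between the
    unconditional (text-only) and conditional (image-aware) scores.

    cfg_value == 1.0 recovers the conditional prediction;
    values above 1.0 amplify the image's influence.
    """
    return uncond_logit + cfg_value * (cond_logit - uncond_logit)
```

With cfg_value = 1.25 as in the example above, the image-conditioned signal is mildly amplified relative to the text-only prediction.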
