A library for minimum Bayes risk (MBR) decoding.

mbrs is a library for minimum Bayes risk (MBR) decoding.

Installation

You can install mbrs from PyPI:

pip install mbrs

Developers can install it from source:

git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
pip install ./

For uv users:

git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
uv sync

Quick start

mbrs provides two interfaces: a command-line interface (CLI) and a Python API.

Command-line interface

The command-line interface runs MBR decoding from the shell. Before decoding, you can generate hypothesis sentences with mbrs-generate:

mbrs-generate \
  sources.txt \
  --output hypotheses.txt \
  --lang_pair en-de \
  --model facebook/m2m100_418M \
  --num_candidates 1024 \
  --sampling eps --epsilon 0.02 \
  --batch_size 8 --sampling_size 8 --fp16 \
  --report_format rounded_outline

Beam search can also be used by replacing --sampling eps --epsilon 0.02 with --beam_size 10.
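
For readers unfamiliar with the --sampling eps option, the following is a minimal sketch of epsilon sampling as it is commonly described: tokens whose probability falls below a threshold are dropped, and the next token is drawn from the remaining ones. The function name `epsilon_sample` and the toy distribution are illustrative assumptions, not part of mbrs, and the library's own implementation may differ in details such as how the threshold boundary is handled.

```python
import random


def epsilon_sample(probs, epsilon, rng):
    """Sample a token index after dropping tokens with probability below epsilon.

    Illustrative helper only; this is not the mbrs implementation.
    """
    kept = [(i, p) for i, p in enumerate(probs) if p >= epsilon]
    if not kept:
        # If the threshold removes every token, fall back to the most likely one.
        kept = [max(enumerate(probs), key=lambda ip: ip[1])]
    indices, weights = zip(*kept)
    # random.choices normalizes the weights, so no explicit renormalization is needed.
    return rng.choices(indices, weights=weights, k=1)[0]


rng = random.Random(0)
probs = [0.60, 0.25, 0.13, 0.015, 0.005]  # toy next-token distribution
samples = [epsilon_sample(probs, epsilon=0.02, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # indices 3 and 4 are below the threshold and never drawn
```

With epsilon set to 0.02, only the first three tokens of the toy distribution can ever be selected, which is how the threshold trims the low-probability tail while keeping the samples diverse.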

Next, run MBR decoding or another decoding method with mbrs-decode. This example uses the hypothesis set itself as the pseudo-reference set.

mbrs-decode \
  hypotheses.txt \
  --num_candidates 1024 \
  --nbest 1 \
  --source sources.txt \
  --references hypotheses.txt \
  --output translations.txt \
  --report report.txt --report_format rounded_outline \
  --decoder mbr \
  --metric comet \
  --metric.model Unbabel/wmt22-comet-da \
  --metric.batch_size 64 --metric.fp16 true
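
The --num_candidates option suggests that the flat hypotheses file is split into one group of candidates per source sentence. The sketch below shows that grouping under the assumption that hypotheses.txt stores the candidates for each source on consecutive lines; this layout is an assumption, so check the reference docs before relying on it. `group_candidates` is a hypothetical helper, not an mbrs function.

```python
def group_candidates(lines, num_candidates):
    """Split a flat list of hypothesis lines into per-source candidate groups.

    Assumes each source sentence owns `num_candidates` consecutive lines
    (an assumed file layout, not confirmed by this README).
    """
    if len(lines) % num_candidates != 0:
        raise ValueError(
            f"{len(lines)} lines cannot be split into groups of {num_candidates}"
        )
    return [
        lines[i : i + num_candidates]
        for i in range(0, len(lines), num_candidates)
    ]


# Toy data: two source sentences with two candidates each.
flat = ["Thanks", "Thank you", "Good morning", "Morning"]
groups = group_candidates(flat, num_candidates=2)
print(groups)
```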

You can also pass the arguments in a YAML configuration file via the --config_path option. See the docs for details.

Finally, you can compute evaluation scores with mbrs-score:

mbrs-score \
  hypotheses.txt \
  --sources sources.txt \
  --references hypotheses.txt \
  --format json \
  --metric bleurt \
  --metric.batch_size 64 --metric.fp16 true

Python API

The following example runs COMET-MBR via the Python API.

from mbrs.metrics import MetricCOMET
from mbrs.decoders import DecoderMBR

SOURCE = "ありがとう"
HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]

# Set up COMET.
metric_cfg = MetricCOMET.Config(
  model="Unbabel/wmt22-comet-da",
  batch_size=64,
  fp16=True,
)
metric = MetricCOMET(metric_cfg)

# Set up MBR decoding.
decoder_cfg = DecoderMBR.Config()
decoder = DecoderMBR(decoder_cfg, metric)

# Decode by COMET-MBR.
# This example regards the hypotheses themselves as the pseudo-references.
# Args: (hypotheses, pseudo-references, source)
output = decoder.decode(HYPOTHESES, HYPOTHESES, source=SOURCE, nbest=1)

print(f"Selected index: {output.idx}")
print(f"Output sentence: {output.sentence}")
print(f"Expected score: {output.score}")
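
To make the "expected score" concrete, here is a dependency-free sketch of the selection rule behind MBR decoding: each hypothesis is scored by its average utility against the pseudo-references, and the hypothesis with the highest average is chosen. The unigram F1 utility is a toy stand-in for COMET, and `unigram_f1` and `mbr_select` are illustrative names rather than mbrs APIs, so the selected sentence and scores will generally differ from the COMET-based output of the example above.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Toy utility: F1 over lowercased whitespace tokens (a stand-in for COMET)."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(hypotheses, pseudo_references, utility):
    """Return the index with the highest expected utility, plus all expected utilities."""
    expected = [
        sum(utility(h, r) for r in pseudo_references) / len(pseudo_references)
        for h in hypotheses
    ]
    # max() keeps the first index when several hypotheses tie.
    best = max(range(len(hypotheses)), key=expected.__getitem__)
    return best, expected


hypotheses = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]
best, expected = mbr_select(hypotheses, hypotheses, unigram_f1)
print(f"Selected index: {best}")
print(f"Output sentence: {hypotheses[best]}")
print(f"Expected score: {expected[best]:.3f}")
```

Because the hypotheses double as pseudo-references, a hypothesis that agrees with many of the other candidates receives a high expected utility, which is the intuition behind using the hypothesis set as the pseudo-reference set.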

List of implemented methods

Metrics

Currently, the following metrics are supported:

Decoders

The following decoding methods are implemented:

  • N-best reranking: rerank
  • MBR decoding: mbr
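
As a rough contrast between the two decoders, N-best reranking can be thought of as sorting hypotheses by a score assigned to each hypothesis on its own, without comparing it against pseudo-references as MBR decoding does. The sketch below illustrates that idea in plain Python; the `rerank` function and the score values are hypothetical and do not reflect the mbrs API or any real metric output.

```python
def rerank(hypotheses, scores, nbest=1):
    """Return the `nbest` hypotheses with the highest per-hypothesis scores."""
    if len(hypotheses) != len(scores):
        raise ValueError("hypotheses and scores must have the same length")
    order = sorted(range(len(hypotheses)), key=lambda i: scores[i], reverse=True)
    return [(hypotheses[i], scores[i]) for i in order[:nbest]]


# Hypothetical quality scores, e.g. from a reference-free metric.
candidates = ["Thanks", "Thank you", "Thank you so much"]
quality = [0.71, 0.88, 0.80]
top2 = rerank(candidates, quality, nbest=2)
print(top2)
```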

Specifically, the following methods of MBR decoding are included:

Selectors

The final output list is selected according to these selectors:

Related projects

  • mbr
    • Tightly integrated with Hugging Face transformers by customizing the generate() method of the model implementation.
    • If you are looking for an MBR decoding library that is fully integrated into transformers, this might be a good choice.
    • mbrs works standalone, so it can process outputs not only from transformers but also from fairseq or from LLMs accessed via an API.

Citation

If you use this software, please cite:

@inproceedings{deguchi-etal-2024-mbrs,
    title = "mbrs: A Library for Minimum {B}ayes Risk Decoding",
    author = "Deguchi, Hiroyuki  and
      Sakai, Yusuke  and
      Kamigaito, Hidetaka  and
      Watanabe, Taro",
    editor = "Hernandez Farias, Delia Irazu  and
      Hope, Tom  and
      Li, Manling",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.37",
    pages = "351--362",
}

License

This library is mainly developed by Hiroyuki Deguchi and published under the MIT license.
