PyMarian

  • Python bindings to Marian (C++), built using PyBind11
  • The Python package is built using scikit-build-core

Install

# build marian with -DPYMARIAN=on option to create a pymarian wheel
cmake . -Bbuild -DCOMPILE_CUDA=off -DPYMARIAN=on -DCMAKE_BUILD_TYPE=Release
cmake --build build -j       # -j option parallelizes build on all cpu cores
python -m pip install build/pymarian-*.whl

The above commands use the python executable found in PATH to determine the Python version for which the Marian native extension is compiled. Make sure the desired python executable is on your PATH before invoking these cmake commands.
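If the build directory holds wheels for more than one Python version, the build/pymarian-*.whl glob matches all of them. The sketch below picks the wheel whose CPython tag matches the running interpreter. The helper names matching_wheels and local_wheels are hypothetical, and the code assumes the standard wheel filename layout.

```python
import sys
from pathlib import Path


def matching_wheels(paths, version_info=sys.version_info):
    """Return the wheel paths whose Python tag matches the given interpreter version.

    Assumes the wheel filename layout
    {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl,
    so the Python tag is the third field from the end.
    """
    wanted = f"cp{version_info[0]}{version_info[1]}"
    selected = []
    for path in paths:
        fields = Path(path).stem.split("-")  # stem drops only the final ".whl"
        if len(fields) >= 5 and fields[-3] == wanted:
            selected.append(str(path))
    return selected


def local_wheels(build_dir="build"):
    """List pymarian wheels in build_dir that fit the running interpreter."""
    return matching_wheels(sorted(Path(build_dir).glob("pymarian-*.whl")))
```

The single path returned by local_wheels() can then be passed to python -m pip install in place of the glob.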

Python API

The Python API is designed to take the same arguments as the Marian CLI, passed as a single string.

NOTE: These APIs are experimental and not finalized. See mtapi_server.py for an example use of the Translator API.

Translator

# Translator
from pymarian import Translator
cli_string = "..."
translator = Translator(cli_string)

sources = ["sent1", "sent2"]
result = translator.translate(sources)
print(result)
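Since the constructor takes one CLI string, it can help to assemble that string from separate values. The following is a minimal sketch; build_cli_string and translate are hypothetical helpers, not part of pymarian. How pymarian tokenizes the string is not documented here, so tokens containing whitespace are rejected rather than quoted.

```python
def build_cli_string(model, vocabs, **options):
    """Assemble a Marian-style CLI string from a model path, vocab paths, and extra options.

    Hypothetical helper: tokens are joined with single spaces, so any token
    containing whitespace is rejected instead of being quoted.
    """
    tokens = ["-m", str(model), "-v", *map(str, vocabs)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            tokens.append(flag)  # boolean switch, no value
        elif value is not False and value is not None:
            tokens.extend([flag, str(value)])
    for token in tokens:
        if not token or any(ch.isspace() for ch in token):
            raise ValueError(f"token cannot be passed safely: {token!r}")
    return " ".join(tokens)


def translate(cli_string, sources):
    """Translate a list of sentences; requires pymarian to be installed."""
    from pymarian import Translator  # imported lazily so the helper above works without it

    translator = Translator(cli_string)
    return translator.translate(sources)
```

For example, build_cli_string("model.npz", ["vocab.spm", "vocab.spm"]) yields a string of the same shape as the cli_string values used in this section.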

Evaluator

# Evaluator
from pymarian import Evaluator
cli_string = '-m path/to/model.npz -v path/to.vocab.spm path/to.vocab.spm --like comet-qe'
evaluator = Evaluator(cli_string)

data = [
    ["Source1", "Hyp1"],
    ["Source2", "Hyp2"]
]
scores = evaluator.run(data)
for score in scores:
    print(score)
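In practice the sources and hypotheses usually come as two parallel lists. A small sketch for turning them into the row format shown above follows; make_rows is a hypothetical helper, and only the two-column [source, hypothesis] shape from the example is assumed.

```python
def make_rows(sources, hypotheses):
    """Pair parallel lists into [[source, hypothesis], ...] rows.

    Raises ValueError when the lists differ in length, because silently
    truncating would misalign scores with segments.
    """
    if len(sources) != len(hypotheses):
        raise ValueError(
            f"length mismatch: {len(sources)} sources vs {len(hypotheses)} hypotheses"
        )
    return [[src, hyp] for src, hyp in zip(sources, hypotheses)]
```

The result can be passed directly as data to evaluator.run(data).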

CLI Usage

  • pymarian-eval : CLI to download and use pretrained metrics such as COMETs, COMETOIDs, ChrFoid, and BLEURT
  • pymarian-mtapi : REST API demo powered by Flask
  • pymarian-qtdemo : GUI app demo powered by Qt

pymarian-eval

$ pymarian-eval -h 
usage: pymarian-eval [-h] [-m MODEL] [-v VOCAB] [-l {comet-qe,bleurt,comet}] [-V] [-] [-t MT_FILE] [-s SRC_FILE] [-r REF_FILE] [-f FIELD [FIELD ...]] [-o OUT] [-a {skip,append,only}] [-w WIDTH] [--debug] [--fp16] [--mini-batch MINI_BATCH] [-d [DEVICES ...] | -c
                     CPU_THREADS] [-ws WORKSPACE] [-pc]

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model name, or path. Known models: bleurt-20, wmt20-comet-da, wmt20-comet-qe-da, wmt20-comet-qe-da-v2, wmt21-comet-da, wmt21-comet-qe-da, wmt21-comet-qe-mqm, wmt22-comet-da, wmt22-cometkiwi-da, xcomet-xl, xcomet-xxL (default: wmt22-cometkiwi-da)
  -v VOCAB, --vocab VOCAB
                        Vocabulary file (default: None)
  -l {comet-qe,bleurt,comet}, --like {comet-qe,bleurt,comet}
                        Model type. Required if --model is a local file (auto inferred for known models) (default: None)
  -V, --version         show program's version number and exit
  -, --stdin            Read input from stdin. TSV file with following format: QE metrics: "src<tab>mt", Ref based metrics ref: "src<tab>mt<tab>ref" or "mt<tab>ref" (default: False)
  -t MT_FILE, --mt MT_FILE
                        MT output file. Ignored when --stdin (default: None)
  -s SRC_FILE, --src SRC_FILE
                        Source file. Ignored when --stdin (default: None)
  -r REF_FILE, --ref REF_FILE
                        Ref file. Ignored when --stdin (default: None)
  -f FIELD [FIELD ...], --fields FIELD [FIELD ...]
                        Input fields, an ordered sequence of {src, mt, ref} (default: ['src', 'mt', 'ref'])
  -o OUT, --out OUT     output file (default: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
  -a {skip,append,only}, --average {skip,append,only}
                        Average segment scores to produce system score. skip=do not output average (default; segment scores only); append=append average at the end; only=output the average only (i.e. system score only) (default: skip)
  -w WIDTH, --width WIDTH
                        Output score width (default: 4)
  --debug               Debug or verbose mode (default: False)
  --fp16                Enable FP16 mode (default: False)
  --mini-batch MINI_BATCH
                        Mini-batch size (default: 16)
  -d [DEVICES ...], --devices [DEVICES ...]
                        GPU device IDs (default: None)
  -c CPU_THREADS, --cpu-threads CPU_THREADS
                        Use CPU threads. 0=use GPU device 0 (default: None)
  -ws WORKSPACE, --workspace WORKSPACE
                        Workspace memory (default: 8000)
  -pc, --print-cmd      Print marian evaluate command and exit (default: False)
  --cache CACHE         Cache directory for storing models (default: $HOME/.cache/marian/metric)

More info at https://github.com/marian-nmt/marian-dev. This CLI is loaded from .../python3.10/site-packages/pymarian/eval.py (version: 1.12.25)
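To call the CLI from a script, the options listed above can be assembled into an argument list. This is a sketch only: build_eval_command and run_eval are hypothetical helpers, they use just the --model, --src, --mt, --ref and --average options from the help text, and the format of each output line is not assumed.

```python
import subprocess


def build_eval_command(src_file, mt_file, ref_file=None,
                       model="wmt22-cometkiwi-da", average="skip"):
    """Build an argument list for pymarian-eval from file paths."""
    if average not in {"skip", "append", "only"}:
        raise ValueError(f"unsupported --average value: {average!r}")
    cmd = ["pymarian-eval", "--model", model,
           "--src", str(src_file), "--mt", str(mt_file)]
    if ref_file is not None:
        cmd += ["--ref", str(ref_file)]  # only needed for reference-based metrics
    cmd += ["--average", average]
    return cmd


def run_eval(cmd):
    """Run the command and return its stdout lines unparsed."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout.splitlines()
```

Passing a list to subprocess.run avoids shell quoting problems with file names.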

Performance Tuning Tips:

  • For CPU parallelization, use --cpu-threads <n>
  • For GPU parallelization, assuming pymarian was compiled with CUDA support, use --devices, e.g., --devices 0 1 2 3 to use the four specified GPU devices
  • On out-of-memory (OOM) errors, adjust the --mini-batch argument
  • To see full logs from Marian, set --debug
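The tips above map onto a handful of flags. The usage line shows --devices and --cpu-threads as alternatives, so a wrapper script can enforce that before launching. A minimal sketch, with tuning_args as a hypothetical helper:

```python
def tuning_args(devices=None, cpu_threads=None, mini_batch=None, debug=False):
    """Translate tuning choices into pymarian-eval flags.

    --devices and --cpu-threads are treated as mutually exclusive,
    following the usage line of pymarian-eval.
    """
    if devices and cpu_threads is not None:
        raise ValueError("--devices and --cpu-threads are mutually exclusive")
    args = []
    if devices:
        args += ["--devices", *map(str, devices)]
    if cpu_threads is not None:
        args += ["--cpu-threads", str(cpu_threads)]
    if mini_batch is not None:
        args += ["--mini-batch", str(mini_batch)]
    if debug:
        args.append("--debug")
    return args
```

The returned list can be appended to any pymarian-eval argument list.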

pymarian-mtapi

Launch server

# example model: download and extract
wget http://data.statmt.org/romang/marian-regression-tests/models/wngt19.tar.gz 
tar xvf wngt19.tar.gz 

# launch server
pymarian-mtapi -s en -t de "-m wngt19/model.base.npz -v wngt19/en-de.spm wngt19/en-de.spm"

Example request from client

URL="http://127.0.0.1:5000/translate"
curl $URL --header "Content-Type: application/json" --request POST --data '[{"text":["Good Morning."]}]'
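The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above; build_request and send are hypothetical helper names, and the response body is returned as plain text because its structure is not described here.

```python
import json
import urllib.request


def build_request(sentences, url="http://127.0.0.1:5000/translate"):
    """Create a POST request carrying the same JSON payload as the curl example."""
    payload = json.dumps([{"text": list(sentences)}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, send(build_request(["Good Morning."])) performs the request.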

pymarian-qtdemo

pymarian-qtdemo

Code Formatting

cd src/python
pip install black isort
isort .
black .

Run Tests

# install pytest if necessary
python -m pip install pytest

# run tests in quiet mode
python -m pytest src/python/tests/regression

# or, add -s to see STDOUT/STDERR from tests
python -m pytest -s src/python/tests/regression

Release Instructions

Building Pymarian for Multiple Python Versions

Our CMake scripts detect every python3.* executable available in PATH and build pymarian for each. To support a specific version of Python, make the python3.x executable available in PATH prior to running cmake. This can be achieved without conflicts by using conda or mamba.

# setup mamba if not already; Note: you may use conda as well
which mamba || {
   name=Miniforge3-$(uname)-$(uname -m).sh
   wget "https://github.com/conda-forge/miniforge/releases/latest/download/$name" \
      && bash $name -b -p ~/mambaforge && ~/mambaforge/bin/mamba init bash && rm $name
}

# create environment for each version
versions="$(echo 3.{12,11,10,9,8,7})"
for version in $versions; do
   echo "python $version"
   mamba env list | grep -q "^py${version}" || mamba create -q -y -n py${version} python=${version}
done

# stack all environments
for version in $versions; do mamba activate py${version} --stack; done
# check if all python versions are available
for version in $versions; do which python$version; done


# Build as usual
cmake . -B build -DCOMPILE_CUDA=off -DPYMARIAN=on
cmake --build build -j
ls build/pymarian*.whl
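Before running cmake, it can be useful to confirm which python3.x executables are actually visible in PATH. The following sketch does the same check as the which loop above; find_pythons and missing_pythons are hypothetical helpers, and the default version list simply copies the one used in the script.

```python
import shutil


def find_pythons(minors=(12, 11, 10, 9, 8, 7)):
    """Map each "3.x" version to the path of python3.x in PATH, or None if absent."""
    return {f"3.{minor}": shutil.which(f"python3.{minor}") for minor in minors}


def missing_pythons(found):
    """Return the versions for which no executable was found."""
    return [version for version, path in found.items() if path is None]
```

A non-empty result from missing_pythons(find_pythons()) means no wheel will be built for those versions.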

Upload to PyPI

twine upload -r testpypi build/*.whl

twine upload -r pypi build/*.whl

Initial setup: create ~/.pypirc with the following:

[distutils]
index-servers =
    pypi
    testpypi

[pypi]
repository: https://upload.pypi.org/legacy/
username:__token__
password:<token>

[testpypi]
repository: https://test.pypi.org/legacy/
username:__token__
password:<token>

Obtain a token from https://pypi.org/manage/account/
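A malformed ~/.pypirc is an easy mistake to make before the first upload. As a rough sanity check, the file can be parsed with Python's configparser, which accepts both the "key: value" and "key = value" forms used above. check_pypirc is a hypothetical helper that only checks the fields shown in this section.

```python
import configparser


def check_pypirc(text):
    """Parse .pypirc content and report index servers with missing fields.

    Returns (servers, problems); problems is empty when every listed server
    has a section with repository, username and password.
    """
    parser = configparser.ConfigParser(interpolation=None)  # tokens may contain '%'
    parser.read_string(text)
    servers = parser.get("distutils", "index-servers").split()
    problems = []
    for server in servers:
        if not parser.has_section(server):
            problems.append(f"[{server}] section is missing")
            continue
        for key in ("repository", "username", "password"):
            if not parser.get(server, key, fallback="").strip():
                problems.append(f"[{server}] has no {key}")
    return servers, problems
```

It can be applied to the real file with check_pypirc(open(os.path.expanduser("~/.pypirc")).read()) after importing os.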

Known issues

  1. In a conda or mamba environment, if you see the error .../miniconda3/envs/<envname>/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.30' not found, install libstdcxx-ng:

    conda install -c conda-forge libstdcxx-ng
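To confirm whether a given libstdc++.so.6 provides the required symbol version, the library file can be scanned for the version string, similar in spirit to running strings on it and searching the output. This is a heuristic sketch; has_symbol_version is a hypothetical helper, and it relies on version names being stored as NUL-terminated strings in the shared library.

```python
from pathlib import Path


def has_symbol_version(library_path, version="GLIBCXX_3.4.30"):
    """Return True if the library file contains the NUL-terminated version string.

    The trailing NUL keeps a shorter name from matching as a prefix of a
    longer one that starts with the same characters.
    """
    data = Path(library_path).read_bytes()
    return version.encode("ascii") + b"\0" in data
```

The path to check is the libstdc++.so.6 named in the error message.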
    

