Alignment tool based on fast_align

systran-align

systran-align is a small word alignment tool based on fast_align (https://github.com/clab/fast_align).

Installation

pip install systran-align

Usage

import systran_align

Generating alignment probabilities

systran_align.generate_alignment_probabilities(
    input_path: str,
    forward_probs_path: str,
    backward_probs_path: str,
    verbose: bool = False,
    iterations: int = 5,
    favor_diagonal: bool = False,
    beam_threshold: float = -4,
    diagonal_tension: float = 4,
    optimize_tension: bool = False,
    variational_bayes: bool = False,
    alpha: float = 0.01,
    no_null_word: bool = False,
    prob_align_null: float = 0.08,
    thread_buffer_size: int = 10000,
)

Inputs:

  • input_path: text file in which each line is one source-target example, formatted as:
<source> ||| <target>

Outputs:

  • forward_probs_path: binary file containing forward probabilities
  • backward_probs_path: binary file containing backward probabilities
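
The input format described above can be produced from tokenized sentence pairs with a few lines of Python. The following is a minimal sketch, not part of systran-align: the helper names format_pair and write_training_file are hypothetical, and joining tokens with single spaces is an assumption carried over from the fast_align convention.

```python
from typing import Iterable, List, Tuple

# Separator between the source and target side of each line.
SEPARATOR = " ||| "


def format_pair(source_tokens: List[str], target_tokens: List[str]) -> str:
    """Render one example as '<source> ||| <target>'.

    Assumption: tokens are joined with single spaces (fast_align convention).
    """
    return " ".join(source_tokens) + SEPARATOR + " ".join(target_tokens)


def write_training_file(
    pairs: Iterable[Tuple[List[str], List[str]]], path: str
) -> int:
    """Write one example per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as output:
        for source_tokens, target_tokens in pairs:
            output.write(format_pair(source_tokens, target_tokens) + "\n")
            count += 1
    return count
```

The path of the resulting file could then be passed as input_path to systran_align.generate_alignment_probabilities, together with the two output paths for the forward and backward probabilities.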

Computing alignments

aligner = systran_align.Aligner(
    forward_probs_path: str,
    backward_probs_path: str,
)

# result is a dict with fields:
# * alignments
# * forward_log_prob
# * backward_log_prob
result = aligner.align(
    source: List[str],
    target: List[str],
)

# Batch alternative:
results = aligner.align_batch(
    source: List[List[str]],
    target: List[List[str]],
)
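
For large corpora it may be convenient to feed align_batch in fixed-size chunks rather than in a single call. The sketch below shows one way to do that. The helper align_in_chunks and its chunk_size parameter are hypothetical, and the code assumes that align_batch returns one result per input pair, in order, which the description above does not state explicitly.

```python
from typing import Any, List, Sequence


def align_in_chunks(
    aligner: Any,
    sources: Sequence[List[str]],
    targets: Sequence[List[str]],
    chunk_size: int = 1000,
) -> List[Any]:
    """Call aligner.align_batch on successive slices and collect the results.

    `aligner` is any object exposing align_batch(source, target) as documented
    above. Assumption: each call returns one result per pair, in input order.
    """
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    results: List[Any] = []
    for start in range(0, len(sources), chunk_size):
        end = start + chunk_size
        results.extend(aligner.align_batch(sources[start:end], targets[start:end]))
    return results
```

Given an aligner built as shown above, align_in_chunks(aligner, sources, targets, chunk_size=500) would issue one align_batch call per 500 sentence pairs.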


Files for systran-align, version 3.3.0

All files are wheels, listed as filename (size), Python version:

  • systran_align-3.3.0-cp27-cp27m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (782.3 kB), cp27
  • systran_align-3.3.0-cp27-cp27mu-manylinux_2_5_x86_64.manylinux1_x86_64.whl (782.2 kB), cp27
  • systran_align-3.3.0-cp35-cp35m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (782.7 kB), cp35
  • systran_align-3.3.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (783.2 kB), cp36
  • systran_align-3.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (784.5 kB), cp37
  • systran_align-3.3.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (774.7 kB), cp38
  • systran_align-3.3.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (774.3 kB), cp39
