Alignment tool based on fast_align

systran-align

systran-align is a small word alignment tool based on fast_align (https://github.com/clab/fast_align).

Installation

pip install systran-align

Usage

import systran_align

Generating alignment probabilities

systran_align.generate_alignment_probabilities(
    input_path: str,
    forward_probs_path: str,
    backward_probs_path: str,
    verbose: bool = False,
    iterations: int = 5,
    favor_diagonal: bool = False,
    beam_threshold: float = -4,
    diagonal_tension: float = 4,
    optimize_tension: bool = False,
    variational_bayes: bool = False,
    alpha: float = 0.01,
    no_null_word: bool = False,
    prob_align_null: float = 0.08,
    thread_buffer_size: int = 10000,
)

Inputs:

  • input_path: text file in which each line is one source-target example, in the format:
    <source> ||| <target>
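
As a minimal sketch of preparing such a file, the helpers below join pre-tokenized sentence pairs with the ` ||| ` separator and write one example per line. The names `format_pair` and `write_parallel_file` are hypothetical and not part of systran-align. The sketch assumes tokens are separated by single spaces, as in fast_align's input format, and that pairs with an empty side should be skipped.

```python
from typing import Iterable, List, Tuple

SEPARATOR = " ||| "


def format_pair(source: List[str], target: List[str]) -> str:
    """Join two token lists into one `<source> ||| <target>` line."""
    return " ".join(source) + SEPARATOR + " ".join(target)


def write_parallel_file(
    pairs: Iterable[Tuple[List[str], List[str]]],
    path: str,
) -> int:
    """Write one example per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            # Assumption: a pair with an empty side carries no alignment
            # information, so it is dropped rather than written.
            if not source or not target:
                continue
            f.write(format_pair(source, target) + "\n")
            count += 1
    return count
```

The resulting path can then be passed as input_path.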

Outputs:

  • forward_probs_path: binary file containing forward probabilities
  • backward_probs_path: binary file containing backward probabilities
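
A training call might look like the sketch below, which uses only parameters listed in the signature above. The wrapper name `train_alignment_model`, the output file names, and the choice of `favor_diagonal=True` and `optimize_tension=True` are illustrative assumptions rather than settings recommended by the project. It also assumes the optional parameters can be passed by keyword, as the signature suggests.

```python
import os
from typing import Tuple


def train_alignment_model(input_path: str, output_dir: str) -> Tuple[str, str]:
    """Train forward and backward probability files and return their paths."""
    # Imported lazily so that this module loads even without the package.
    import systran_align

    os.makedirs(output_dir, exist_ok=True)
    # Hypothetical file names; any writable paths should do.
    forward_path = os.path.join(output_dir, "forward.probs")
    backward_path = os.path.join(output_dir, "backward.probs")

    systran_align.generate_alignment_probabilities(
        input_path,
        forward_path,
        backward_path,
        verbose=True,
        iterations=5,
        favor_diagonal=True,    # illustrative choice, not a documented default
        optimize_tension=True,  # illustrative choice, not a documented default
    )
    return forward_path, backward_path
```

The two returned paths are the ones the Aligner constructor expects.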

Computing alignments

aligner = systran_align.Aligner(
    forward_probs_path: str,
    backward_probs_path: str,
)

# result is a dict with fields:
# * alignments
# * forward_log_prob
# * backward_log_prob
result = aligner.align(
    source: List[str],
    target: List[str],
)

# Batch alternative:
results = aligner.align_batch(
    source: List[List[str]],
    target: List[List[str]],
)
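
The README does not spell out the layout of the alignments field. As a sketch under the assumption that it holds (source_index, target_index) pairs, the hypothetical helper below maps those indices back to tokens. If the actual layout differs, the unpacking in the list comprehension would need adjusting.

```python
from typing import Any, Dict, List, Sequence, Tuple


def aligned_token_pairs(
    source: Sequence[str],
    target: Sequence[str],
    result: Dict[str, Any],
) -> List[Tuple[str, str]]:
    """Map index pairs in result["alignments"] back to token pairs.

    Assumption: each entry is a (source_index, target_index) pair.
    """
    return [(source[i], target[j]) for i, j in result["alignments"]]


# Usage with a real aligner (requires trained probability files; the
# file names here are hypothetical):
#   aligner = systran_align.Aligner("forward.probs", "backward.probs")
#   source, target = ["the", "cat"], ["le", "chat"]
#   result = aligner.align(source, target)
#   print(aligned_token_pairs(source, target, result))
```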
