Alignment tool based on fast_align

systran-align

systran-align is a small word alignment tool based on fast_align (https://github.com/clab/fast_align).

Installation

pip install systran-align

Usage

import systran_align

Generating alignment probabilities

systran_align.generate_alignment_probabilities(
    input_path: str,
    forward_probs_path: str,
    backward_probs_path: str,
    verbose: bool = False,
    iterations: int = 5,
    favor_diagonal: bool = False,
    beam_threshold: float = -4,
    diagonal_tension: float = 4,
    optimize_tension: bool = False,
    variational_bayes: bool = False,
    alpha: float = 0.01,
    no_null_word: bool = False,
    prob_align_null: float = 0.08,
    thread_buffer_size: int = 10000,
)
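Several of these options appear to come straight from fast_align's model, and the defaults here for `diagonal_tension` (4) and `prob_align_null` (0.08) match fast_align's. Assuming they keep the same meaning, with `favor_diagonal` enabled the prior probability that target word i (of m) aligns to source word j (of n) is `prob_align_null` for the NULL word, and otherwise proportional to exp(-tension * |i/m - j/n|). The sketch below computes only that prior, not the learned translation probabilities, to show how the two parameters shape it. `diagonal_prior` is a hypothetical helper, not part of systran_align, and it keeps the tension fixed (in fast_align, `optimize_tension` re-estimates it during training).

```python
import math
from typing import List


def diagonal_prior(i: int, m: int, n: int,
                   tension: float = 4.0,
                   p_null: float = 0.08,
                   use_null: bool = True) -> List[float]:
    """Prior over source positions 0..n for target position i (1-based).

    Index 0 is the NULL word; indices 1..n are the source words.
    Hypothetical illustration of a fast_align-style diagonal prior.
    """
    if not 1 <= i <= m or n < 1:
        raise ValueError("expected 1 <= i <= m and n >= 1")
    # Unnormalized closeness to the diagonal: exp(-tension * |i/m - j/n|).
    weights = [math.exp(-tension * abs(i / m - j / n)) for j in range(1, n + 1)]
    z = sum(weights)
    # Without a NULL word (cf. no_null_word), all mass goes to real positions.
    p_null = p_null if use_null else 0.0
    # NULL gets a fixed share; the rest follows the normalized diagonal weights.
    return [p_null] + [(1.0 - p_null) * w / z for w in weights]
```

A larger tension concentrates the remaining probability mass on source positions near the diagonal, which is why favoring the diagonal helps for closely related, mostly monotonic language pairs.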

Inputs:

  • input_path: text file where each line is a source-target example with format:
<source> ||| <target>
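As a sketch, assuming tokens are separated by single spaces (as in fast_align's input format) and the file is UTF-8 encoded, an input file can be produced from pre-tokenized sentence pairs as follows. `write_parallel_corpus` is a hypothetical helper, not part of systran_align.

```python
from typing import Iterable, Sequence, Tuple

SEPARATOR = " ||| "


def write_parallel_corpus(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]], path: str
) -> int:
    """Write (source_tokens, target_tokens) pairs as '<source> ||| <target>' lines.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            src, tgt = " ".join(source), " ".join(target)
            # A literal '|||' or a newline inside a sentence would break the format.
            for side in (src, tgt):
                if "|||" in side or "\n" in side:
                    raise ValueError(f"cannot encode sentence {side!r}")
            f.write(f"{src}{SEPARATOR}{tgt}\n")
            count += 1
    return count
```

For example, the pair (["Hello", "world"], ["Bonjour", "le", "monde"]) becomes the line `Hello world ||| Bonjour le monde`.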

Outputs:

  • forward_probs_path: binary file containing forward probabilities
  • backward_probs_path: binary file containing backward probabilities
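A minimal training sketch, assuming the input file already exists. The helper name `train_alignment_model` and the output file names `forward.probs` and `backward.probs` are hypothetical. Extra keyword options are forwarded unchanged to `generate_alignment_probabilities`, so only names from the signature above (for example `iterations` or `favor_diagonal`) are meaningful. The `generate` parameter exists only so the helper can be exercised without the compiled package installed.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def train_alignment_model(
    input_path: str,
    out_dir: str,
    generate: Optional[Callable[..., None]] = None,
    **options,
) -> Tuple[str, str]:
    """Write forward/backward probability files for input_path into out_dir.

    Returns the (forward_probs_path, backward_probs_path) pair.
    """
    if generate is None:
        # Imported lazily so the helper itself does not require the package.
        import systran_align
        generate = systran_align.generate_alignment_probabilities
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    forward = str(out / "forward.probs")   # hypothetical file name
    backward = str(out / "backward.probs")  # hypothetical file name
    generate(input_path, forward, backward, **options)
    return forward, backward
```

With the package installed, a call such as `train_alignment_model("corpus.txt", "model", favor_diagonal=True, verbose=True)` would write both probability files under `model/`.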

Computing alignments

aligner = systran_align.Aligner(
    forward_probs_path: str,
    backward_probs_path: str,
)

# result is a dict with fields:
# * alignments
# * forward_log_prob
# * backward_log_prob
result = aligner.align(
    source: List[str],
    target: List[str],
)

# Batch alternative:
results = aligner.align_batch(
    source: List[List[str]],
    target: List[List[str]],
)
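A sketch for aligning a whole input file in batches, assuming whitespace-tokenized `<source> ||| <target>` lines as described under Inputs, and assuming `align_batch` returns one result per pair, in input order. `read_pairs` and `align_file` are hypothetical helpers. The aligner is passed in, so any object with an `align_batch(source, target)` method works, including a `systran_align.Aligner`.

```python
from itertools import islice
from typing import Iterator, List, Tuple


def read_pairs(path: str) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (source_tokens, target_tokens) from a '<source> ||| <target>' file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("|||")
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected exactly one '|||'")
            source, target = parts
            yield source.split(), target.split()


def align_file(aligner, path: str, batch_size: int = 64) -> Iterator[dict]:
    """Align every pair in path, calling aligner.align_batch on fixed-size chunks."""
    pairs = read_pairs(path)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            break
        sources = [src for src, _ in batch]
        targets = [tgt for _, tgt in batch]
        # Assumes align_batch returns one result per pair, in input order.
        yield from aligner.align_batch(sources, targets)
```

With the package installed, one would build `aligner = systran_align.Aligner(forward_probs_path, backward_probs_path)` and iterate over `align_file(aligner, "corpus.txt")`; each item is presumably the same kind of result dict that `align` returns.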

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • systran_align-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • systran_align-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • systran_align-3.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • systran_align-3.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • systran_align-3.5.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.7m, manylinux (glibc 2.17+), x86-64
  • systran_align-3.5.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.6m, manylinux (glibc 2.17+), x86-64

File details

Hashes for systran_align-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 60417a3f63e10f92bb101b7265fde50c6cd38f0cc915f83af8c62b282f92cf8e
MD5 52cdb7743e8ecfee96f5260855896181
BLAKE2b-256 43b9e1ed8233345e74693cf28cba222293a86ca5c1d29c44a023b279da49b469

Hashes for systran_align-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 129009cc00f8a6d1d73d2d6e0b57898d291fa58480ba8e8b82606975b058d65a
MD5 69ceadc5ec80339934adf3b9321528e9
BLAKE2b-256 c327231a66e9414f68f40b20309011efd991b63dce3fed66a93b7e77763a07b9

Hashes for systran_align-3.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b37ebedf0454a92d282c5f9ac26de5e39777f9d751e819ffa98b2ea0c0770e8a
MD5 f01372578d10e4b6fedfa1391becd19e
BLAKE2b-256 423c1dfdf25601eed6c35c6eb9c4a7257a1350f4cc113875417edce8789484c0

Hashes for systran_align-3.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bd3c030725f37b084cefe0a2dc564cc0d91da95a7cbaea4a9cdb05548728472e
MD5 0bb74ab9af6ceffa2ad9fc76ea99aa82
BLAKE2b-256 8cdb22cff305f010e01a6de25c3e5a98db8abd0f49cbadd6a43f7ad24f3bb1d3

Hashes for systran_align-3.5.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f70ea915a81bad454116c23d5c3a398527cf10f41f3828f681296e73381a42bc
MD5 8819853503731b59021190e60d2fe4d4
BLAKE2b-256 ffd3cce5aea7efa0e046bf2b7044de498a8e1ac25cb76e5e3d2e9acbc7626b95

Hashes for systran_align-3.5.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 09e9ce3926f4df989ed934541f02f404d36420880a6353a32495aa9c22368b97
MD5 e6bb547b1d19b80263e72380b508bb8e
BLAKE2b-256 88dc57844870a0bfc8fc5f7935e8e325585fc92b071e3e4857883e9203d9265c
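The SHA256 digests listed above can be checked locally before installing a downloaded wheel. A minimal sketch using Python's standard `hashlib` module; `sha256_of` and `verify_wheel` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Pass the wheel's local path and the SHA256 value from the table for that file name.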
