A collection of syntactic metrics to calculate (dis)similarities between source and target sentences.

Example notebooks

A couple of example notebooks are available, each with a different degree of automation for initialising the aligned object. Once an aligned object has been created, the functionality is identical.

  • High automation: automate all the things. Tokenisation, parsing, and word alignment are done automatically [Try on Colab]

  • Normal automation: the typical scenario where you have tokenised and aligned text that is not parsed yet [Try on Colab]

  • No automation: full-manual mode, where you provide all the required information, including dependency labels and heads [Try on Colab]

  • Monolingual: in this example we rely on spaCy to compare two English sentences and calculate semantic similarity between aligned words [Try on Colab]

Installation

Requires Python 3.7 or higher. To keep the overhead low, a default parser is NOT installed. Both spaCy and stanza are currently supported, and you can choose which one to use. Stanza is recommended for bilingual research because all of its models use Universal Dependencies, but spaCy can be used as well. spaCy is mainly useful for monolingual comparisons, or if you are not interested in the linguistic comparisons and only require word reordering metrics.

A pre-release is available on PyPI. You can install it with pip as follows.

# Install with stanza (recommended)
pip install astred[stanza]
# ... or install with spacy
pip install astred[spacy]
# ... or install with both and decide later
pip install astred[parsers]

If you want to use spaCy, you have to install the required models manually; this step cannot be automated.
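Because spaCy models have to be fetched separately, it can help to check up front whether one is present. The following is a minimal sketch, not part of astred: the helper `has_spacy_model` is hypothetical, and `en_core_web_sm` is only an assumed example model. It relies on the fact that spaCy models are installed as regular Python packages, so the standard library can look them up without importing spaCy itself.

```python
import importlib.util


def has_spacy_model(model_name: str) -> bool:
    """Return True if a package called `model_name` can be imported."""
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    model = "en_core_web_sm"  # assumption: replace with the model you need
    if has_spacy_model(model):
        print(f"{model} is installed")
    else:
        print(f"{model} is missing; try: python -m spacy download {model}")
```

The check only tells you that the package can be found, not that its version matches your spaCy installation.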

Automatic Word Alignment

Automatic word alignment is supported by using a modified version of Awesome Align under the hood. This is a neural word aligner that uses transfer learning with multilingual models to do word alignment. It does require some manual installation work. Specifically, you need to install the astred_compat branch from this fork. If you are using pip, you can run the following command:

pip install git+https://github.com/BramVanroy/awesome-align.git@astred_compat

Like stanza, Awesome Align requires PyTorch.

If it is installed, you can initialise AlignedSentences without providing word alignments. Those will be added automatically behind the scenes. See this example notebook [Try on Colab] for more.

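from astred import AlignedSentences, Sentence

# Tokenise and parse both sentences with the installed parser
sent_en = Sentence.from_text("I like eating cookies", "en")
sent_nl = Sentence.from_text("Ik eet graag koekjes", "nl")

# Word alignments do not need to be added on init:
aligned = AlignedSentences(sent_en, sent_nl)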

Keep in mind, however, that automatic alignment will never match the quality of manual alignment. Use with caution! I highly suggest reading the Awesome Align paper to see whether it is a good fit for you.
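One way to judge whether automatic alignment is good enough for your data is to hand-align a small sample and measure how far the automatic links deviate from it. The sketch below is not part of astred. It assumes that both alignments are available as sets of (source index, target index) pairs, and the example links are made up for illustration.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_scores(auto: Set[Link], manual: Set[Link]) -> Tuple[float, float, float]:
    """Return precision, recall and F1 of `auto` links against `manual` links."""
    correct = len(auto & manual)
    precision = correct / len(auto) if auto else 0.0
    recall = correct / len(manual) if manual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical links for "I like eating cookies" / "Ik eet graag koekjes"
manual = {(0, 0), (1, 2), (2, 1), (3, 3)}
auto = {(0, 0), (1, 1), (2, 1), (3, 3)}

precision, recall, f1 = alignment_scores(auto, manual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Low scores on such a sample suggest that the metrics computed on automatically aligned sentences should be interpreted with extra care.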

License

Licensed under Apache License Version 2.0. See the LICENSE file attached to this repository.

Citation

Please cite our papers if you use this library.

Vanroy, B., De Clercq, O., Tezcan, A., Daems, J., & Macken, L. (2021). Metrics of syntactic equivalence to assess translation difficulty. In M. Carl (Ed.), Explorations in empirical translation process research (Vol. 3, pp. 259–294). Cham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-69777-8_10

@incollection{vanroy2021metrics,
    title = {Metrics of syntactic equivalence to assess translation difficulty},
    booktitle = {Explorations in empirical translation process research},
    author = {Vanroy, Bram and De Clercq, Orph{\'e}e and Tezcan, Arda and Daems, Joke and Macken, Lieve},
    editor = {Carl, Michael},
    year = {2021},
    series = {Machine {{Translation}}: {{Technologies}} and {{Applications}}},
    volume = {3},
    pages = {259--294},
    publisher = {{Springer International Publishing}},
    address = {{Cham, Switzerland}},
    isbn = {978-3-030-69776-1},
    url = {https://link.springer.com/chapter/10.1007/978-3-030-69777-8_10},
    doi = {10.1007/978-3-030-69777-8_10}
}

Vanroy, B., Schaeffer, M., & Macken, L. (2021). Comparing the Effect of Product-Based Metrics on the Translation Process. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.681945

@article{vanroy2021comparing,
    publisher = {Frontiers},
    author = {Vanroy, Bram and Schaeffer, Moritz and Macken, Lieve},
    title = {Comparing the effect of product-based metrics on the translation process},
    year = {2021},
    journal = {Frontiers in Psychology},
    volume = {12},
    issn = {1664-1078},
    url = {https://www.frontiersin.org/article/10.3389/fpsyg.2021.681945},
    doi = {10.3389/fpsyg.2021.681945},
}
