A collection of syntactic metrics to calculate (dis)similarities between source and target sentences.
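As a rough illustration of the kind of source/target (dis)similarity such a metric can capture, the sketch below counts crossing word-alignment links between two sentences. It is a generic, simplified measure written for this README, not ASTrED's own implementation; the function name and the hand-made alignment are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def count_crossing_links(alignments: List[Tuple[int, int]]) -> int:
    """Count pairs of word-alignment links that cross each other.

    Two links (s1, t1) and (s2, t2) cross when source order and target
    order disagree, i.e. when (s1 - s2) * (t1 - t2) < 0.
    """
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignments, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


# "I like eating cookies" -> "Ik eet graag koekjes"
# Hypothetical (source index, target index) links: like<->graag, eating<->eet
links = [(0, 0), (1, 2), (2, 1), (3, 3)]
print(count_crossing_links(links))  # 1
```

A monotone alignment (same word order on both sides) yields 0; the more reordering, the higher the count.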
Documentation and tests are still to be added. You can already use examples/add_features_tprdb.py, though. Run python examples/add_features_tprdb.py -h to get started.
Example notebooks
A couple of example notebooks are available, each with a different degree of automation for initialising the aligned object. Once an aligned object has been created, the functionality is identical.
High automation: automate all the things. Tokenisation, parsing, and word alignment are done automatically.
Normal automation: the typical scenario where you have tokenised and aligned text that is not parsed yet.
No automation: full-manual mode, where you provide all the required information, including dependency labels and heads.
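In the normal-automation scenario you already have word alignments. A common plain-text format for these is the "Pharaoh" format (also what Awesome Align outputs), where each i-j pair links source token i to target token j, both 0-based. The helper below is hypothetical and not part of ASTrED; it only parses such a string into index pairs. Check the example notebooks for the exact input format ASTrED expects.

```python
from typing import List, Tuple


def parse_pharaoh(alignment: str) -> List[Tuple[int, int]]:
    """Parse a Pharaoh-style alignment string such as "0-0 1-2" into index pairs."""
    pairs = []
    for link in alignment.split():
        src, tgt = link.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


tokens_en = "I like eating cookies".split()
tokens_nl = "Ik eet graag koekjes".split()
links = parse_pharaoh("0-0 1-2 2-1 3-3")

# Print each aligned word pair, e.g. "like -> graag"
for src, tgt in links:
    print(tokens_en[src], "->", tokens_nl[tgt])
```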
Installation
Requires Python 3.7 or higher.
This library relies on stanza to parse text into dependencies, which in turn depends on PyTorch. Make sure that you have a valid PyTorch installation before installing this library.
When PyTorch is installed and you have cloned this library, you can run pip install ., which will automatically install the required dependencies.
git clone https://github.com/BramVanroy/astred.git
cd astred
pip install .
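Because PyTorch must be present before you run the commands above, a quick pre-flight check can save a failed install. The snippet below is a small stand-alone helper (hypothetical, not part of astred) that uses the standard library's importlib.util.find_spec to report missing modules. It only checks that a module can be found, not that your PyTorch build matches your hardware.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["torch"])
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("PyTorch found; you can run pip install .")
```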
Automatic Word Alignment
Automatic word alignment is supported by using a modified version of Awesome Align under the hood. This is a neural word aligner that uses transfer learning with multilingual models to do word alignment. It does require some manual installation work: specifically, you need to install the astred_compat branch of the fork at https://github.com/BramVanroy/awesome-align.
If you are using pip, you can run the following command:
pip install git+https://github.com/BramVanroy/awesome-align.git@astred_compat
Like stanza above, Awesome Align requires PyTorch.
If Awesome Align is installed, you can initialize AlignedSentences without providing word alignments; they will be added automatically behind the scenes. See the example notebooks for more.
from astred import AlignedSentences, Sentence

sent_en = Sentence.from_text("I like eating cookies", "en")
sent_nl = Sentence.from_text("Ik eet graag koekjes", "nl")
# Word alignments do not need to be added on init:
aligned = AlignedSentences(sent_en, sent_nl)
Keep in mind, however, that automatic alignments will never match the quality of manual alignments, so use them with caution! I highly recommend reading the Awesome Align paper to see whether it is a good fit for your use case.
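To give some intuition for what the aligner does: as described in the Awesome Align paper, it compares contextual embeddings of source and target tokens and keeps a link only when it is probable in both directions (a softmax over each row and over each column of a similarity matrix, followed by a threshold). The sketch below reimplements only that extraction step on a hand-made similarity matrix. It is a simplification for illustration, not the code of the astred_compat fork; the similarity scores and the 0.001 threshold are assumptions.

```python
import math
from typing import List, Set, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def extract_alignments(sim: List[List[float]], threshold: float = 1e-3) -> Set[Tuple[int, int]]:
    """Link source i to target j only if the link is probable in both directions."""
    n_src, n_tgt = len(sim), len(sim[0])
    # Source-to-target probabilities: softmax over each row.
    src2tgt = [softmax(row) for row in sim]
    # Target-to-source probabilities: softmax over each column.
    tgt2src = [softmax([sim[i][j] for i in range(n_src)]) for j in range(n_tgt)]
    return {
        (i, j)
        for i in range(n_src)
        for j in range(n_tgt)
        if src2tgt[i][j] > threshold and tgt2src[j][i] > threshold
    }


# Hypothetical similarity scores: rows = "I like eating cookies",
# columns = "Ik eet graag koekjes".
sim = [
    [10.0, 0.0, 0.0, 0.0],  # I
    [0.0, 0.0, 10.0, 0.0],  # like
    [0.0, 10.0, 0.0, 0.0],  # eating
    [0.0, 0.0, 0.0, 10.0],  # cookies
]
print(sorted(extract_alignments(sim)))  # [(0, 0), (1, 2), (2, 1), (3, 3)]
```

Requiring agreement in both directions is what filters out weak, one-sided matches; with real embeddings the scores are far less clean than this toy matrix, which is one reason automatic alignments warrant caution.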
License
Licensed under Apache License Version 2.0. See the LICENSE file attached to this repository.