Syntactic transfer from more resourced languages: TRAnslating TREEbanks for Syntactic TRAnsfer.
Simple Syntactic Transfer Based on the Treebank Translation Method
Intended use
Given
- a sentence in the language of interest (the low-resource language, LRL; e.g. Kyrgyz),
- a translation of that sentence into a more resourced language (e.g. Turkish),
- a dependency parser (UD) for the more resourced language (e.g. Stanza-UD_BOUN-BERT),
- a word alignment model of your choice (its source language should be the language of interest),
- a morphological analyzer that outputs PoS tags (Universal Tagset) for the language of interest (e.g. apertium-kir; note that it must not modify the tokenization),
generate a dependency tree for the sentence in the language of interest.
The resulting trees are far from perfect, but they may still be useful for speeding up manual treebank annotation. Please see the paper for more details.
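To make the idea concrete, here is a minimal sketch of how heads and relation labels can be projected through word alignments once the inputs listed above are available. It is an illustration only and not the package's actual implementation: the function name `project_tree`, its signature, and the "keep the first alignment" rule are all assumptions made for this example, and the morphological analyzer step (PoS tags) is left out.

```python
from typing import Dict, List, Optional, Tuple


def project_tree(
    n_lrl: int,
    hrl_heads: List[int],
    hrl_deprels: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Tuple[Optional[int], Optional[str]]]:
    """Project heads and relations from the parsed translation onto LRL tokens.

    Hypothetical helper, not part of tratreetra.
    Token indices are 1-based as in CoNLL-U; head 0 denotes the root.
    `alignment` holds (lrl_index, hrl_index) pairs.
    Unaligned LRL tokens get (None, None) and are left to the annotator.
    """
    # Simplification: keep only the first alignment seen for each token.
    lrl_to_hrl: Dict[int, int] = {}
    hrl_to_lrl: Dict[int, int] = {}
    for lrl_i, hrl_i in alignment:
        lrl_to_hrl.setdefault(lrl_i, hrl_i)
        hrl_to_lrl.setdefault(hrl_i, lrl_i)

    projected: List[Tuple[Optional[int], Optional[str]]] = []
    for lrl_i in range(1, n_lrl + 1):
        hrl_i = lrl_to_hrl.get(lrl_i)
        if hrl_i is None:
            projected.append((None, None))
            continue
        hrl_head = hrl_heads[hrl_i - 1]
        deprel = hrl_deprels[hrl_i - 1]
        if hrl_head == 0:
            # The aligned word is the root of the translated sentence.
            projected.append((0, deprel))
            continue
        # If the head has no aligned LRL token, only the label is transferred.
        lrl_head = hrl_to_lrl.get(hrl_head)
        if lrl_head == lrl_i:
            lrl_head = None  # avoid a token becoming its own head
        projected.append((lrl_head, deprel))
    return projected


# Kyrgyz "Мен китеп окудум" aligned word-by-word with Turkish "Ben kitap okudum".
heads = [3, 3, 0]                      # heads of the Turkish tokens
deprels = ["nsubj", "obj", "root"]     # UD relations of the Turkish tokens
pairs = [(1, 1), (2, 2), (3, 3)]       # (Kyrgyz index, Turkish index)
result = project_tree(3, heads, deprels, pairs)
print(result)
```

With a one-to-one alignment the Kyrgyz tokens simply inherit the Turkish tree. Tokens that come back as `(None, None)`, or with a `None` head, are the places where manual correction is needed.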
Example
We provide bindings to some popular libraries in tratreetra/models.py. The appropriate versions of these libraries must be installed; please consult the respective docstrings.
The example code in example/example.py reproduces one of the results from the paper:
- Stanza-IMST-charlm
- SimAlign-XLMR
- apertium-kir (without morphological disambiguation)
- Translation via ChatGPT4o
Please see more details in the example/README.
How to cite
The paper is still in press; a preprint will be made available on arXiv soon.
Meanwhile, if you use our tool, we'll be grateful if you cite it as follows:
@article{atkn2025syntax,
author = {Alekseev, Anton and Tillabaeva, Alina and Kabaeva, Gulnara Dzh. and Nikolenko, Sergey I.},
title = {{Syntactic Transfer to Kyrgyz Using the Treebank Translation Method (in print)}},
journal = {To appear in the Journal of Mathematical Sciences},
publisher = {Springer},
year = {2025}
}
TODO
- Thoughtful approach to data structures
- Profile the code
- Upload to PyPI
- Tests
- Redesign logging here and in apertium2ud