Syntactic transfer from more resourced languages: TRAnslating TREEbanks for Syntactic TRAnsfer.

Simple Syntactic Transfer Based on the Treebank Translation Method

Intended use

Given

  • a sentence in the language of interest (a lower-resourced language, LRL; e.g. Kyrgyz),
  • a translation of that sentence into a more resourced language (e.g. Turkish),
  • a dependency parser (UD) for the more resourced language (e.g. Stanza-UD_BOUN-BERT),
  • a word alignment model of your choice (its source language should be the language of interest),
  • a morphological analyzer that outputs PoS tags (Universal Tagset) for the language of interest (e.g. apertium-kir; note that it must not modify the tokenization),

generate a dependency tree for the sentence in the language of interest.
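
To make the idea concrete, here is a minimal sketch of projecting dependency arcs through a word alignment. It is an illustration of the general technique under simplifying assumptions (at most one aligned translation token per LRL token, arcs dropped when the head has no counterpart), not the code in tratreetra. The function name `project_heads`, its parameters, and the hand-written Turkish parse are all hypothetical.

```python
from typing import List, Optional, Tuple

Arc = Tuple[int, str]  # (1-based head index, 0 for root; dependency relation)


def project_heads(
    n_lrl_tokens: int,
    hrl_heads: List[int],
    hrl_deprels: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Optional[Arc]]:
    """Project arcs from the parsed translation onto the LRL sentence.

    `alignment` holds 0-based (lrl_index, hrl_index) pairs.
    `hrl_heads` follows the UD convention: 1-based head index, 0 for root.
    An LRL token gets None when it is unaligned or its head cannot be mapped.
    """
    # Simplifying assumption: keep only the first alignment link per token.
    lrl_to_hrl = {}
    hrl_to_lrl = {}
    for lrl_i, hrl_i in alignment:
        lrl_to_hrl.setdefault(lrl_i, hrl_i)
        hrl_to_lrl.setdefault(hrl_i, lrl_i)

    projected: List[Optional[Arc]] = [None] * n_lrl_tokens
    for lrl_i in range(n_lrl_tokens):
        hrl_i = lrl_to_hrl.get(lrl_i)
        if hrl_i is None:
            continue  # unaligned token: left for the annotator
        head = hrl_heads[hrl_i]
        if head == 0:
            projected[lrl_i] = (0, hrl_deprels[hrl_i])
        elif (head - 1) in hrl_to_lrl:
            lrl_head = hrl_to_lrl[head - 1]
            if lrl_head != lrl_i:  # avoid self-loops
                projected[lrl_i] = (lrl_head + 1, hrl_deprels[hrl_i])
    return projected


# Kyrgyz "Мен китеп окудум" aligned one-to-one with Turkish "Ben kitap okudum".
# The Turkish heads and relations below are written by hand for illustration.
arcs = project_heads(
    n_lrl_tokens=3,
    hrl_heads=[3, 3, 0],
    hrl_deprels=["nsubj", "obj", "root"],
    alignment=[(0, 0), (1, 1), (2, 2)],
)
print(arcs)  # [(3, 'nsubj'), (3, 'obj'), (0, 'root')]
```

In this sketch the PoS tags from the morphological analyzer are not used; they would be attached to the LRL tokens separately.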

The output is far from perfect, but it may still help speed up manual treebank annotation. Please see the paper for more details.
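
For manual correction, a projected tree is typically loaded into an annotation tool as CoNLL-U. The following is a minimal sketch, assuming the tree is given as parallel lists of tokens, UPOS tags, and optional (head, relation) pairs. The helper `to_conllu` is hypothetical and not part of tratreetra; writing `_` for unprojected arcs is a choice made here so that an annotator can fill them in.

```python
from typing import List, Optional, Tuple


def to_conllu(
    tokens: List[str],
    upos: List[str],
    arcs: List[Optional[Tuple[int, str]]],
) -> str:
    """Serialize one sentence into the ten tab-separated CoNLL-U columns."""
    lines = []
    for i, (form, tag, arc) in enumerate(zip(tokens, upos, arcs), start=1):
        # Arcs that could not be projected are written as underscores.
        head, deprel = (str(arc[0]), arc[1]) if arc is not None else ("_", "_")
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        lines.append("\t".join([str(i), form, "_", tag, "_", "_", head, deprel, "_", "_"]))
    # A blank line terminates the sentence.
    return "\n".join(lines) + "\n\n"


print(to_conllu(
    ["Мен", "китеп", "окудум"],
    ["PRON", "NOUN", "VERB"],
    [(3, "nsubj"), None, (0, "root")],
))
```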

Example

Bindings to some popular libraries are provided in tratreetra/models.py. The appropriate versions of these libraries must be installed; please consult the respective docstrings.

The example code in example/example.py reproduces one of the results from the paper, using the following components:

  • Stanza-IMST-charlm
  • SimAlign-XLMR
  • apertium-kir (without morphological disambiguation)
  • Translation via ChatGPT4o

Please see example/README for more details.

How to cite

The paper is still in press; a preprint will be made available on arXiv soon.

Meanwhile, if you use our tool, we'll be grateful if you cite it as follows:

@article{atkn2025syntax,
    author = {Alekseev, Anton and Tillabaeva, Alina and Kabaeva, Gulnara Dzh. and Nikolenko, Sergey I.},
    title = {{Syntactic Transfer to Kyrgyz Using the Treebank Translation Method (in print)}},
    journal = {To appear in the Journal of Mathematical Sciences},
    publisher = {Springer},
    year = {2025}
}

TODO

  • Thoughtful approach to data structures
  • Profile the code
  • Upload to PyPI
  • Tests
  • Redesign logging here and in apertium2ud
