Converting universal tags to Apertium tags.
apertium2ud
A tool that builds a mapping between the Apertium and Universal Dependencies (UD) tagsets from the information on the Apertium Wiki.
Loosely based on this code, hence the GPLv3 license.
To install, run
python -m pip install apertium2ud
The latest version uploaded to PyPI is 0.0.8.
NB! The PyPI version ships with the rules from the apertium-kir .udx file.
To build the machine-readable mapping, run
python apertium_wiki_parser.py
Apertium to Universal tags
>>> from apertium2ud.convert import a2ud
>>> tags = ["n", "pl", "acc"]
>>> a2ud(tags)
(['NOUN'], ['Number=Plur', 'Case=Acc'])
>>> tags_sophisticated = ["v", "tv", "ger", "nom", "cop", "aor", "p3", "pl"]
>>> a2ud(tags_sophisticated)
(['VERB', 'AUX'], ['Subcat=Tran', 'VerbForm=Vnoun', 'Case=Nom', 'Tense=Past', 'Person=3', 'Number=Plur'])
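The tuple returned by `a2ud` can be turned into the UPOS and FEATS columns of a CoNLL-U line. Below is a minimal sketch of one way to do that. The helper `to_conllu_columns` is hypothetical and not part of apertium2ud, and taking the first POS candidate when several are returned is an assumption made only for illustration.

```python
def to_conllu_columns(pos_tags, feats):
    """Format (pos_tags, feats) as CoNLL-U UPOS and FEATS strings.

    Hypothetical helper, not part of apertium2ud.
    """
    # Assumption: when several POS candidates come back, take the first one.
    upos = pos_tags[0] if pos_tags else "_"
    # CoNLL-U sorts features alphabetically (case-insensitively) by name
    # and joins them with "|"; "_" marks an empty column.
    ordered = sorted(feats, key=lambda f: f.split("=", 1)[0].lower())
    feats_column = "|".join(ordered) if ordered else "_"
    return upos, feats_column


# Input copied from the a2ud output for ["n", "pl", "acc"].
upos, feats_column = to_conllu_columns(["NOUN"], ["Number=Plur", "Case=Acc"])
print(upos, feats_column)  # NOUN Case=Acc|Number=Plur
```

For an ambiguous result such as `['VERB', 'AUX']`, a real pipeline would probably need a better rule than "first candidate wins".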
Universal tags to Apertium
So far, the conversion is far from perfect:
Кыз NOUN {'Number[psor]=Sing', 'Number=Sing', 'Case=Nom', 'Person[psor]=3', 'Person=3'} ->
<px3sg><n><subj?nom?><sg><p3><px3sp>
досуна NOUN {'Number[psor]=Sing', 'Number=Sing', 'Person[psor]=3', 'Case=Dat', 'Person=3'} ->
<px3sg><n><sg><dat><p3><px3sp>
кат NOUN {'Case=Nom', 'Person=3', 'Number=Sing'} ->
<n><subj?nom?><sg><p3>
жазган VERB {'Aspect=Perf', 'Polarity=Pos', 'Number=Sing', 'Tense=Past', 'Person=3', 'Evident=Fh'} ->
<past3p><vblex?v?vbmod?><sg><aff><aor?past?pret?><perf><p3>
. PUNCT set() ->
<sent?apos?percent?clb?punct?>
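The output strings above can be post-processed to see which slots are still ambiguous. The sketch below assumes that a `?` after a tag inside one pair of angle brackets marks it as one of several candidate Apertium tags. That reading is inferred from the examples, and `parse_candidates` is a hypothetical helper, not a function provided by apertium2ud.

```python
import re


def parse_candidates(tag_string):
    """Split '<n><subj?nom?><sg>' into [['n'], ['subj', 'nom'], ['sg']]."""
    slots = []
    for body in re.findall(r"<([^<>]+)>", tag_string):
        # Assumption: '?' terminates each candidate tag within a slot.
        alternatives = [tag for tag in body.split("?") if tag]
        slots.append(alternatives)
    return slots


slots = parse_candidates("<n><subj?nom?><sg><p3>")
print(slots)  # [['n'], ['subj', 'nom'], ['sg'], ['p3']]

# Slots with more than one candidate still need disambiguation.
ambiguous = [slot for slot in slots if len(slot) > 1]
print(ambiguous)  # [['subj', 'nom']]
```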
TODO
- Should sections, chunks and XML tags be added? No.
- Tests: Apertium -> UD -> Apertium, UD -> Apertium -> UD (sometimes losses are inevitable)
- Add the possibility to add the rules based on a .udx file, which usually describes custom tags
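The round-trip tests mentioned in the list could look roughly like the sketch below. The dictionaries are a toy mapping invented for illustration, not the package's real tables, and `round_trip` and `lossy_tags` are hypothetical names. The sketch shows why some losses are inevitable: when several Apertium tags map to one UD feature, the way back is ambiguous.

```python
# Toy many-to-one mapping, invented for illustration only.
TOY_A2UD = {"aor": "Tense=Past", "past": "Tense=Past", "pl": "Number=Plur"}

# Inverse mapping: each UD feature points to every Apertium tag that yields it.
TOY_UD2A = {}
for apertium_tag, ud_feat in TOY_A2UD.items():
    TOY_UD2A.setdefault(ud_feat, []).append(apertium_tag)


def round_trip(apertium_tags):
    """Apertium -> UD -> Apertium; returns the candidate list for each tag."""
    return [TOY_UD2A[TOY_A2UD[tag]] for tag in apertium_tags]


def lossy_tags(apertium_tags):
    """Tags that are not recovered uniquely after the round trip."""
    candidates = round_trip(apertium_tags)
    return [tag for tag, cands in zip(apertium_tags, candidates) if cands != [tag]]


print(round_trip(["aor", "pl"]))  # [['aor', 'past'], ['pl']]
print(lossy_tags(["aor", "pl"]))  # ['aor']
```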
How to cite
If you use this work, a citation would be greatly appreciated.
@misc{apertium2ud2023alekseev,
title = {{alexeyev/apertium2ud: mapping tagsets}},
year = {2023},
url = {https://github.com/alexeyev/apertium2ud}
}