Python converter from UD to enhanced UD representations
UD2UDE
This (badly-named) UD2UDE project (short for Universal Dependencies to Universal Dependencies enhancements) is the main project of the 'u2dude' project series, all of which relate to the main goal of my thesis.
- Converter: (the current project) a Universal Dependencies to enhancedUD converter that aims to port CoreNLP's Java converter to Python (3.6), extended with add-ins from my research ("extra or Aryeh's enhancements").
- Model: a spaCy model trained on UD (and on PENN converted to UD) datasets.
- Demo: {will be moved to public when stable} JS and Python code that makes use of the converter. Simply check it out here
Converter
The converter converts UD (v1.4) to enhancedUD, enhancedUD++, and extra enhancements (discovered as part of my thesis). It supports the CoNLL-U and Odin formats (and some conversions between them).
Generally, I tried to keep the same behavior (described here, and implemented by the CoreNLP Java converter) as far as reasonable.
The converter covers the following conversions:
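
For readers unfamiliar with the CoNLL-U input format, here is a minimal sketch of reading one sentence into tokens. It is not this package's API: the `Token` type and `parse_conllu_sentence` function are hypothetical names used only for illustration, and the sample sentence uses UD v1-style labels (`nsubjpass`, `auxpass`, `nmod`).

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


def parse_conllu_sentence(block: str) -> List[Token]:
    """Parse one CoNLL-U sentence block into a list of tokens."""
    tokens = []
    for line in block.strip().splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    return tokens


# Columns are written with spaces here and joined with tabs below,
# so the example survives copy-paste.
ROWS = [
    "1 The the DET DT _ 2 det _ _",
    "2 ball ball NOUN NN _ 4 nsubjpass _ _",
    "3 was be AUX VBD _ 4 auxpass _ _",
    "4 hit hit VERB VBN _ 0 root _ _",
    "5 by by ADP IN _ 6 case _ _",
    "6 John John PROPN NNP _ 4 nmod _ _",
]
SAMPLE = "# text = The ball was hit by John\n" + "\n".join(
    "\t".join(row.split(" ")) for row in ROWS
)

tokens = parse_conllu_sentence(SAMPLE)
for tok in tokens:
    print(tok.id, tok.form, tok.head, tok.deprel)
```

Each token keeps only its head index and basic relation label, which is the information the enhancements work from.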
Conversion | paper (or here) | UD formal guidelines (v2) | coreNLP code | Converter | notes |
---|---|---|---|---|---|
nmod/acl/advcl case info | eUD | eUD (under 'obl' for v2) | eUD | eUD | 1. Although multi-word prepositions are processed only under eUD++, they are still handled under eUD so that they are added to the case information. 2. The case marker is lowercased (and not lemmatized, which is important for MWPs). |
Passive agent | - | - | eUD | eUD | Only if the nmod has both a "by" son and an 'auxpass' sibling do we change nmod:by to nmod:agent. |
conj case info | eUD | eUD | eUD | eUD | 1. Adds the type of conjunction to all conjunct relations 2. Some multi-word coordination markers are collapsed to conj:and or conj:negcc |
Process Multi-word prepositions | eUD++ | eUD (?) | eUD++ | eUD++ | Predetermined lists of 2w and 3w preps. |
Demote quantificational modifiers (a.k.a. partitives and light noun constructions) | eUD++ | (see here) | eUD++ | eUD++ | Predetermined list of quantifiers and light nouns. |
Conjoined prepositions and prepositional phrases | eUD++ | - | eUD++ | eUD++ | |
Propagated governors and dependents | eUD (A, B, C) | eUD (A, B, C, D) | eUD (A, B, C) | eUD (A, B, C) | 1. This includes: (A) conjoined noun phrases, (B) conjoined adjectival phrases, (C) subjects of conjoined verbs, and (D) objects of conjoined verbs. 2. Note that (D) would theoretically be relevant to add but was omitted because of practical uncertainty (see section 4.2 of the paper). |
Subjects of controlled verbs | eUD | eUD | eUD | eUD | 1. Includes the special case of 'to' with no following verb ("he decided not to"). 2. Heuristic for choosing the propagated subject (according to the coreNLP documentation): if the control verb has an object, it is propagated as the subject of the controlled verb; otherwise the subject of the control verb is used. |
Subjects of controlled verbs - when 'to' marker is missing | ? | ? | - | extra | 1. Example: "I started reading the book". 2. For some reason this is not included in the coreNLP code; unsure why. |
Relative pronouns | eUD++ | eUD (?) | eUD++ | eUD++ | |
Reduced relative clause | - | eUD (?) | - | extra | |
Subjects of adverbial clauses | - | - | - | extra | Heuristic for choosing the propagated entity: 1. If the marker is "to", the object of the main clause (if it is animate, but for now we don't enforce this) is propagated as the subject; otherwise the subject of the main clause is propagated. 2. Else, if the marker is not one of "as/so/when/if" (this includes no marker at all, which is mostly equivalent to a "while" marker), both the subject and the object of the main clause are equivalent options (unless no object is found, in which case the subject is propagated). |
Noun-modifying participles | (see here) | - | - | extra | |
Correct possible subject of Noun-modifying participles | - | - | - | extra | 1. This is a correction of the subject decision of the previous row. 2. If the noun being modified is an object/modifier of a verb with some subject, then that subject might be the subject of the noun-modifying participle as well (this is uncertain, and seems to be correct only for the more abstract nouns, but that's just a first impression). |
Propagated modifiers (in conjunction constructions) | - | - | - | extra | Heuristics and assumptions: 1. Modifiers that appear after both parts of the conjunction may (the ratio should be researched) refer to both parts. Moreover, if the modifier's father is not the immediate conjunction part, then all the conjunction parts between the father and the modifier are (most probably) modified by the modifier. 2. If the modifier's father is the immediate conjunction part, we propagate the modifier backward only if the new father doesn't have any modifier sons (this restricts the number of false positives a bit). 3. We don't propagate modifiers forward (that is, if the conjunct part appears after the modifier, we assume the modifier does not refer to it). 4. Should be tested for cost/effectiveness, as it may bring many false positives. |
Locative and temporal adverbial modifier propagation (indexicals) | - | - | - | extra | 1. Rationale: If a locative or temporal adverbial modifier is attached away from the verb, through a subject/object/modifier (nmod), it should be applied to the verb itself as well. 2. Example: "He was running around, in these woods here". |
Subject propagation of 'dep' | - | - | - | extra | Rationale: 'dep' is already problematic, as the parser didn't know what relation to assign. If the secondary clause has no subject, the subject should most probably come from the main clause. The relation is probably an advcl/conj/parataxis or similar that was missing some marker/cc/punctuation/etc. |
Apposition propagation | (see here) | - | - | extra | |
nmod propagation through subj/obj/nmod | - | - | - | extra | For now we propagate only modifiers cased by the 'like' or 'such_as' prepositions (as they imply reflexivity), and we copy their heads' relation (that is, obj for obj, subj for subj, and nmod for nmod with its corresponding case). |
Possessive | - | - | - | extra | Share possessive modifiers through conjunctions (e.g. My father and mother went home -> My father and (my) mother...). |
Expanding multi word prepositions | - | - | - | extra | Add an nmod relation when advmod+nmod is observed, concatenating the advmod and the preposition to form the new modifier's preposition (this expands the closed set of eUD's 'Process Multi-word prepositions'). |
Active-passive alteration | (see here) | - | - | extra | Invert subject and object of passive construction (while keeping the old ones). |
Copula alteration | - | - | - | extra | Add a verb placeholder, reconstruct the tree as if the verb was there. |
Hyphen alteration | - | - | - | extra | Add subject and modifier relations to the verb in the middle of a noun-verb adjectival modifying another noun (e.g. a Miami-based company). |
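
To make the first two rows of the table concrete, here is a minimal sketch of the nmod case-information and passive-agent rules as described in the notes column. It is an illustration under simplifying assumptions, not the converter's actual code: `Tok` and `enhance_nmod` are hypothetical names, only `nmod` is handled (not `acl`/`advcl`), and only the first case marker is used, so multi-word prepositions are ignored.

```python
from typing import Dict, List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical minimal token: index, surface form, head index, basic relation."""
    id: int
    form: str
    head: int
    deprel: str


def enhance_nmod(tokens: List[Tok]) -> Dict[int, str]:
    """Map each nmod token id to an enhanced label such as 'nmod:by' or 'nmod:agent'."""
    labels = {}
    for tok in tokens:
        if tok.deprel != "nmod":
            continue
        # Case markers are the 'case' children of the nmod, lowercased (not lemmatized).
        case_forms = [
            t.form.lower() for t in tokens if t.head == tok.id and t.deprel == "case"
        ]
        if not case_forms:
            continue
        marker = case_forms[0]  # simplification: ignore multi-word prepositions
        # Passive agent: the nmod has a "by" child AND an 'auxpass' sibling.
        has_auxpass_sibling = any(
            t.head == tok.head and t.deprel == "auxpass" for t in tokens
        )
        if marker == "by" and has_auxpass_sibling:
            labels[tok.id] = "nmod:agent"
        else:
            labels[tok.id] = "nmod:" + marker
    return labels


# "The ball was hit by John" (passive: the by-phrase is the agent)
passive = [
    Tok(1, "The", 2, "det"),
    Tok(2, "ball", 4, "nsubjpass"),
    Tok(3, "was", 4, "auxpass"),
    Tok(4, "hit", 0, "root"),
    Tok(5, "by", 6, "case"),
    Tok(6, "John", 4, "nmod"),
]
# "He sat By John" (active: the by-phrase is an ordinary modifier)
active = [
    Tok(1, "He", 2, "nsubj"),
    Tok(2, "sat", 0, "root"),
    Tok(3, "By", 4, "case"),
    Tok(4, "John", 2, "nmod"),
]

passive_labels = enhance_nmod(passive)
active_labels = enhance_nmod(active)
print(passive_labels)
print(active_labels)
```

The sibling check is what keeps "sat by John" from being relabeled as an agent: without an `auxpass` dependent on the same head, the label stays `nmod:by`.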