pyBART: A Python converter from UD trees to BART graph representations


A Python converter from Universal Dependencies trees to the BART representation.
Try out our UD-BART comparison Demo



BART (Bar-Ilan & AI2 Representation Transformation) is our new and cool enhanced syntactic representation, specialized for improving Relation Extraction but suitable for any downstream NLP task.

See our paper, pyBART: Evidence-based Syntactic Transformations for IE, for a detailed description of BART's creation, linguistic verification, and evaluation processes, as well as the list of conversions.

This project is part of a wider project series, related to BART:

  1. Converter: The current project.
  2. Model: a UD-based spaCy model (pip install the_large_model). This model is needed when using the converter as a spaCy pipeline component (as spaCy doesn't provide UD-format-based models).
  3. Demo: Web-demo making use of the converter, to compare between UD and BART representations.


Converter description

  • Converts UD (v1.4) to BART.
  • Supports the CoNLL-U format, spaCy docs, and a spaCy pipeline component (see Usage).
  • Highly configurable (see Configuration).

Note: The BART representation subsumes Stanford's EnhancedUD conversions. These conversions are described here, and were previously implemented only in the CoreNLP Java converter; as such they were not available to Python users, so we ported them to pyBART and tried to preserve their behavior as closely as reasonable.

Conversion list

Below is the list of covered conversions (TBD: really needs to be updated!)

| Conversion | [Paper](https://nlp.stanford.edu/pubs/schuster2016enhanced.pdf) (or [here](http://www.lrec-conf.org/proceedings/lrec2016/pdf/779_Paper.pdf)) | [UD formal guidelines (v2)](https://universaldependencies.org/u/overview/enhanced-syntax.html) | coreNLP code | Converter | Notes |
|---|---|---|---|---|---|
| nmod/acl/advcl case info | eUD | eUD (under 'obl' for v2) | eUD | eUD | 1. Even though multi-word prepositions are processed only under eUD++, this is still handled under eUD to include them in the case information.<br>2. Lowercased (and not lemmatized - important for MWPs) |
| Passive agent | - | - | eUD | eUD | Only if the nmod both has a "by" child and an 'auxpass' sibling; then nmod:by is changed to nmod:agent |
| conj case info | eUD | eUD | eUD | eUD | 1. Adds the type of conjunction to all conjunct relations<br>2. Some multi-word coordination markers are collapsed to conj:and or conj:negcc |
| Process multi-word prepositions | eUD++ | eUD (?) | eUD++ | eUD++ | Predetermined lists of 2-word and 3-word prepositions. |
| Demote quantificational modifiers (a.k.a. partitives and light noun constructions) | eUD++ | (see [here](https://universaldependencies.org/u/overview/enhanced-syntax.html#additional-enhancements)) | eUD++ | eUD++ | Predetermined list of quantifiers/light nouns. |
| Conjoined prepositions and prepositional phrases | eUD++ | - | eUD++ | eUD++ | |
| Propagated governors and dependents | eUD (A, B, C) | eUD (A, B, C, D) | eUD (A, B, C) | eUD (A, B, C) | 1. This includes: (A) conjoined noun phrases, (B) conjoined adjectival phrases, (C) subjects of conjoined verbs, and (D) objects of conjoined verbs.<br>2. Notice (D) is theoretically relevant but was omitted for practical uncertainty (see Section 4.2 of the paper). |
| Subjects of controlled verbs | eUD | eUD | eUD | eUD | 1. Includes the special case of 'to' with no following verb ("he decided not to").<br>2. Heuristic for choosing the propagated subject (according to the coreNLP documentation): if the control verb has an object, it is propagated as the subject of the controlled verb; otherwise, the subject of the control verb is used. |
| Subjects of controlled verbs, when the 'to' marker is missing | ? | ? | - | extra | 1. Example: "I started reading the book"<br>2. For some reason not included in the coreNLP code; unsure why |
| Relative pronouns | eUD++ | eUD (?) | eUD++ | eUD++ | |
| Reduced relative clause | - | eUD (?) | - | extra | |
| Subjects of adverbial clauses | - | - | - | extra | Heuristic for choosing the propagated entity:<br>1. If the marker is "to", the object of the main clause (if animate - though for now we don't enforce this) is propagated as subject; otherwise the subject of the main clause is propagated.<br>2. Else, if the marker is not one of "as/so/when/if" (this includes no marker at all, which is mostly equivalent to a "while" marker), both the subject and the object of the main clause are equivalent options (unless no object is found, in which case the subject is propagated). |
| Noun-modifying participles | (see [here](https://www.aclweb.org/anthology/W17-6507)) | - | - | extra | |
| Correct possible subject of noun-modifying participles | - | - | - | extra | 1. This is a correction of the subject decision of the previous conversion.<br>2. If the noun being modified is an object/modifier of a verb with some subject, then that subject might be the subject of the noun-modifying participle as well. (This is uncertain, and seems to be correct only for the more abstract nouns, but that's just a first impression.) |
| Propagated modifiers (in conjunction constructions) | - | - | - | extra | Heuristics and assumptions:<br>1. Modifiers that appear after both parts of the conjunction may (the ratio should be researched) refer to both parts. Moreover, if the modifier's father is not the immediate conjunction part, then all the conjunction parts between the father and the modifier are (most probably) modified by the modifier.<br>2. If the modifier's father is the immediate conjunction part, we propagate the modifier backward only if the new father doesn't already have modifier children (this somewhat restricts the number of false positives).<br>3. We don't propagate modifiers forward (that is, if the conjunct part appears after the modifier, we assume they are unrelated).<br>4. Should be tested for cost-effectiveness, as it may introduce many false positives. |
| Locative and temporal adverbial modifier propagation (indexicals) | - | - | - | extra | 1. Rationale: if a locative or temporal adverbial modifier is stretched away from the verb through a subject/object/modifier (nmod), it should apply to the verb itself as well.<br>2. Example: "He was running around, in these woods here". |
| Subject propagation of 'dep' | - | - | - | extra | Rationale: 'dep' is already problematic, as the parser didn't know which relation to assign. If the secondary clause has no subject, the subject most probably comes from the main clause; the relation is probably an advcl/conj/parataxis or similar that was missing a marker/cc/punctuation/etc. |
| Apposition propagation | (see [here](https://arxiv.org/pdf/1603.01648.pdf)) | - | - | extra | |
| nmod propagation through subj/obj/nmod | - | - | - | extra | For now we propagate only modifiers cased by 'like' or 'such_as' prepositions (as they imply reflexivity), and we copy their heads' relation (that is, obj for obj, subj for subj, and nmod for nmod with its corresponding case). |
| Possessive propagation | - | - | - | extra | Share possessive modifiers across conjunctions (e.g. "My father and mother went home" -> "My father and (my) mother..."). |
| Expanding multi-word prepositions | - | - | - | extra | Adds an nmod relation when advmod+nmod is observed, concatenating the advmod and the preposition into the new modifier's preposition (this expands the closed set of eUD's 'Process multi-word prepositions'). |
| Active-passive alteration | (see [here](https://www.aclweb.org/anthology/W17-6507)) | - | - | extra | Inverts the subject and object of a passive construction (while keeping the original relations). |
| Copula alteration | - | - | - | extra | Adds a verb placeholder and reconstructs the tree as if the verb were there. |
| Hyphen alteration | - | - | - | extra | Adds subject and modifier relations to the verb in the middle of a noun-verb adjectival compound modifying another noun (e.g. "a Miami-based company"). |
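To give a concrete flavor of these rewrites, here is a toy sketch of the "Passive agent" rule above, operating over a plain edge list. This is purely illustrative and not pyBART's actual implementation; the indices and tuple representation are invented for the example.

```python
# Toy dependency edges for "The ball was kicked by John":
# each edge is (dependent_index, head_index, relation).
edges = [
    (2, 4, "nsubjpass"),  # ball <- kicked
    (3, 4, "auxpass"),    # was  <- kicked
    (6, 4, "nmod:by"),    # John <- kicked
    (5, 6, "case"),       # by   <- John
]

def relabel_passive_agent(edges):
    """Relabel nmod:by as nmod:agent when the head also governs an
    auxpass dependent (the ':by' subtype already records the 'by' case child)."""
    out = []
    for dep, head, rel in edges:
        has_auxpass_sibling = any(
            h == head and r == "auxpass" for _, h, r in edges
        )
        if rel == "nmod:by" and has_auxpass_sibling:
            rel = "nmod:agent"
        out.append((dep, head, rel))
    return out

print(relabel_passive_agent(edges))
# [(2, 4, 'nsubjpass'), (3, 4, 'auxpass'), (6, 4, 'nmod:agent'), (5, 6, 'case')]
```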

Installation

pyBART requires Python 3.7 or later. The preferred way to install pyBART is via pip. Just run pip install pybart-nlp in your Python environment and you're good to go!

```shell
pip install pybart-nlp
```

Usage

Once you've installed pyBART, you can use the package in either of the following ways. Notice that for both methods we placed '...' (three dots) in the API calls, as they accept a list of optional parameters that configure the conversion process. We elaborate on these under Configuration.

spaCy pipeline component

```python
import spacy

# Load a UD-based English model
nlp = spacy.load("en_ud_model")

# Add the BART converter to spaCy's pipeline
from pybart.api import Converter
converter = Converter( ... )
nlp.add_pipe(converter, name="BART")

# Test the new converter component
doc = nlp("He saw me while driving")
me_token = doc[2]
for par_tok in me_token._.parent_list:
    print(par_tok)

# Output:
# {'head': 2, 'rel': 'dobj', 'src': 'UD'}
# {'head': 5, 'rel': 'nsubj', 'src': ('advcl', 'while'), 'alt': '0'}
```
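Since each parent_list entry records its provenance in the 'src' field, separating plain-UD edges from BART-added ones is a one-line filter. A minimal sketch over the two dicts printed in the example above:

```python
# The two dicts printed in the example above, copied verbatim.
parent_list = [
    {'head': 2, 'rel': 'dobj', 'src': 'UD'},
    {'head': 5, 'rel': 'nsubj', 'src': ('advcl', 'while'), 'alt': '0'},
]

# 'src' == 'UD' marks an edge from the original tree; any other value
# means the edge was added (or relabeled) by a conversion.
ud_edges = [e for e in parent_list if e['src'] == 'UD']
bart_edges = [e for e in parent_list if e['src'] != 'UD']

print(len(ud_edges), len(bart_edges))  # 1 1
```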

CoNLL-U format

```python
from pybart.api import convert_bart_conllu

# Read a CoNLL-U formatted file
with open(conllu_formatted_file_in) as f:
    sents = f.read()

# Convert
converted = convert_bart_conllu(sents, ...)

# Use it - probably by writing the textual output to a new file
with open(conllu_formatted_file_out, "w") as f:
    f.write(converted)
```
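In CoNLL-U, enhanced edges live in the 9th column (DEPS) as '|'-separated head:relation pairs, which is how one token can carry several parents after conversion. A minimal stdlib sketch of reading that column; the line below is a hypothetical example, not actual converter output:

```python
# Hypothetical CoNLL-U line for the token "me" with two incoming edges
# (columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC).
line = "3\tme\tI\tPRON\tPRP\t_\t2\tdobj\t2:dobj|5:nsubj\t_"

cols = line.split("\t")
# DEPS is column 9 (index 8): all head:relation pairs for this token
deps = [pair.split(":", 1) for pair in cols[8].split("|")]
print(deps)  # [['2', 'dobj'], ['5', 'nsubj']]
```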

Configuration

Each of our API calls accepts the following optional parameters:

| Name | Type | Default | Explanation |
|---|---|---|---|
| enhance_ud | boolean | True | Include Stanford's EnhancedUD conversions. |
| enhanced_plus_plus | boolean | True | Include Stanford's EnhancedUD++ conversions. |
| enhanced_extra | boolean | True | Include BART's unique conversions. |
| conv_iterations | int | inf | Stop the default behavior of iterating over the conversion list after conv_iterations iterations, even before reaching convergence (that is, no change in the graph when the conversion list is applied). |
| remove_eud_info | boolean | False | Do not include Stanford's EnhancedUD and EnhancedUD++ extra label information. |
| remove_extra_info | boolean | False | Do not include BART's extra label information. |
| remove_node_adding_conversions | boolean | False | Do not include conversions that might add nodes to the given graph. |
| remove_unc | boolean | False | Do not include conversions that might contain uncertainty (see the paper for a detailed explanation). |
| query_mode | boolean | False | Do not include conversions that add arcs rather than reorder arcs. |
| funcs_to_cancel | ConvsCanceler class | Empty class instantiation | A list of conversions to prevent from occurring, given by their names. Use get_conversion_names for the full list of conversion names. |
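For example, a configuration could be assembled as keyword arguments and passed to any of the API calls. The values below are illustrative choices, not recommendations:

```python
# Illustrative settings: keep all three conversion families, cap the
# number of conversion passes, and drop the extra eUD label information.
config = dict(
    enhance_ud=True,
    enhanced_plus_plus=True,
    enhanced_extra=True,
    conv_iterations=3,     # stop after 3 passes, even before convergence
    remove_eud_info=True,
)

# These kwargs would then be forwarded to the API, e.g.:
# converter = Converter(**config)
# converted = convert_bart_conllu(sents, **config)
print(sorted(config))
```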

Citing

If you use pyBART or BART in your research, please cite pyBART: Evidence-based Syntactic Transformations for IE.

```bibtex
@inproceedings{Tiktinsky2020pyBARTES,
  title={pyBART: Evidence-based Syntactic Transformations for IE},
  author={Aryeh Tiktinsky and Yoav Goldberg and Reut Tsarfaty},
  year={2020}
}
```

Team

pyBART is an open-source project backed by the Allen Institute for Artificial Intelligence (AI2) and by Bar-Ilan University, as part of my thesis under the supervision of Yoav Goldberg. AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. Our team consists of Yoav Goldberg, Reut Tsarfaty, and myself. We are currently the sole contributors to this project, but we would be more than happy for anyone who wants to help, via Issues and PRs.
