python converter from UD-tree to BART-graph representations

A Python converter from Universal-Dependencies trees to BART representation.
Try out our UD-BART comparison Demo



BART (Bar-Ilan & AI2 Representation Transformation) is our new enhanced syntactic representation, specialized to improve Relation Extraction but suitable for any downstream NLP task.

See our paper, pyBART: Evidence-based Syntactic Transformations for IE, for a detailed description of BART's creation, linguistic verification, and evaluation processes, and for the list of conversions.

This project is part of a wider project series, related to BART:

  1. Converter: The current project.
  2. Model: UD-based spaCy model (pip install the_large_model). This model is needed when using the converter as a spaCy pipeline component, as spaCy doesn't provide models based on the UD format.
  3. Demo: Web demo that uses the converter to compare the UD and BART representations.

Converter description

  • Converts UD (supports both versions 1 and 2) to BART.
  • Supports CoNLL-U format, spaCy docs, and use as a spaCy pipeline component (see Usage).
  • Highly configurable (see Configuration).

Note: The BART representation subsumes Stanford's EnhancedUD conversions. These conversions are described here and were already implemented in the CoreNLP Java converter, so they were not available to Python users. We have therefore ported them to pyBART and tried to maintain their behavior as closely as is reasonable.

Installation

pyBART requires Python 3.7 or later (including versions up to 3.11). The preferred way to install pyBART is via pip: run pip install pybart-nlp in your Python environment. To use pyBART as a spaCy pipeline component, also install (1) the spaCy package and (2) a spaCy model based on the UD format, which we provide (details are here).

# To use pyBART as a spaCy pipeline component, you need spaCy installed
#   and a UD-based spaCy model (here, the transformer-based one):
pip install spacy
pip install https://storage.googleapis.com/en_ud_model/en_ud_model_trf-2.0.0.tar.gz

# Or, if you want one of the smaller, non-transformer-based models:
#   large: https://storage.googleapis.com/en_ud_model/en_ud_model_lg-2.0.0.tar.gz
#   medium: https://storage.googleapis.com/en_ud_model/en_ud_model_md-2.0.0.tar.gz
#   small: https://storage.googleapis.com/en_ud_model/en_ud_model_sm-2.0.0.tar.gz

# And this is pyBART itself. Please don't confuse it with pybart/bart-py/bart
pip install pybart-nlp
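
Before wiring pyBART into a pipeline, it can help to confirm that the pieces installed above are importable. The following is a minimal sketch and not part of pyBART: the helper name `missing_packages` is hypothetical, and the import names `pybart`, `spacy`, and `en_ud_model_sm` are assumptions taken from the import and `spacy.load` calls in the Usage section.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names that cannot be found as top-level importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pyBART requires Python 3.7 or later.
if sys.version_info < (3, 7):
    print("Python 3.7 or later is required")

# Assumed import names; replace en_ud_model_sm with the model you installed.
missing = missing_packages(["pybart", "spacy", "en_ud_model_sm"])
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with the pip commands above.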

Usage

Once you've installed pyBART, you can use the package in one of the following ways. In spaCy mode, we register a method named "get_pybart" on the doc (called as doc._.get_pybart()), which returns a list of lists: one inner list per sentence in doc.sents, each containing that sentence's edge dictionaries. Each edge has the fields "head", "tail", and "label". "head" and "tail" are each either a reference to the corresponding spaCy Token or, for a node added by the conversion (which has no spaCy Token), a string representing that node.
In both modes, the API calls accept optional parameters that configure the conversion process; these are described under Configuration.
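
Because an edge endpoint may be either a spaCy Token or a plain string, code that consumes the edges has to handle both cases. The following is a minimal sketch of one way to do that. The helper names `node_text` and `edge_triples`, the `FakeToken` stand-in, and the sample edges (including the "ADDED_NODE" string and the "dep" label) are made up for illustration and are not part of pyBART.

```python
def node_text(node):
    """Return a printable label for an edge endpoint.

    An endpoint is either a plain string (a node added by the converter)
    or a token-like object exposing a `.text` attribute, as spaCy Tokens do.
    """
    return node if isinstance(node, str) else node.text


def edge_triples(sentences):
    """Turn a list of lists of edge dicts into per-sentence (head, label, tail) string triples."""
    return [
        [(node_text(e["head"]), e["label"], node_text(e["tail"])) for e in sent]
        for sent in sentences
    ]


class FakeToken:
    """Stand-in for a spaCy Token, used only so this sketch runs without spaCy."""

    def __init__(self, text):
        self.text = text


# Hypothetical edges: one between two tokens, one pointing at an added (string) node.
sample = [[
    {"head": FakeToken("saw"), "tail": FakeToken("He"), "label": "nsubj"},
    {"head": FakeToken("saw"), "tail": "ADDED_NODE", "label": "dep"},
]]
print(edge_triples(sample))
```

With a real pipeline, the same helper could be applied as `edge_triples(doc._.get_pybart())`.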

spaCy pipeline component

import spacy
from pybart.api import *

# Load a UD-based English model
nlp = spacy.load("en_ud_model_sm")  # swap in the md/lg/trf model if you prefer

# Add the BART converter to spaCy's pipeline.
# The config below is just an example; pass an empty config for the default behavior.
nlp.add_pipe("pybart_spacy_pipe", last=True, config={'remove_extra_info': True})

# Test the new converter component
doc = nlp("He saw me while driving")
for i, sent in enumerate(doc._.get_pybart()):
    print(f"Sentence {i}:")
    for edge in sent:
        print(f"{edge['head']} --{edge['label']}--> {edge['tail']}")

# Output:
# Sentence 0:
# saw --root--> saw
# saw --nsubj--> He
# saw --dobj--> me
# saw --advcl:while--> driving
# driving --mark--> while
# driving --nsubj--> He

CoNLL-U format

from pybart.api import convert_bart_conllu

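# Placeholder paths; point these at your own files
conllu_formatted_file_in = "input.conllu"
conllu_formatted_file_out = "output.conllu"

# Read a CoNLL-U formatted file
with open(conllu_formatted_file_in, encoding="utf-8") as f:
    sents = f.read()

# Convert
converted = convert_bart_conllu(sents)

# Use it, most likely by writing the textual output to a new file
with open(conllu_formatted_file_out, "w", encoding="utf-8") as f:
    f.write(converted)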
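
The same read, convert, and write pattern extends to a whole directory of files. The sketch below is one possible helper, not part of pyBART: `convert_directory` and its parameters are hypothetical names, and the converter is passed in as a function so that the helper itself has no pyBART dependency.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(src_dir, dst_dir, convert: Callable[[str], str],
                      pattern: str = "*.conllu") -> List[Path]:
    """Apply `convert` to the text of every file in src_dir matching `pattern`.

    Results are written under the same file names in dst_dir (created if needed).
    Returns the list of written paths, in sorted order.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for in_path in sorted(src.glob(pattern)):
        out_path = dst / in_path.name
        text = in_path.read_text(encoding="utf-8")
        out_path.write_text(convert(text), encoding="utf-8")
        written.append(out_path)
    return written
```

Assuming pyBART is installed, a call could look like `convert_directory("ud_in", "bart_out", convert_bart_conllu)`, where both directory names are placeholders.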

Configuration

Each of our API calls can get the following optional parameters:

| Name | Type | Default | Explanation |
|------|------|---------|-------------|
| enhance_ud | boolean | True | Include Stanford's EnhancedUD conversions. |
| enhanced_plus_plus | boolean | True | Include Stanford's EnhancedUD++ conversions. |
| enhanced_extra | boolean | True | Include BART's unique conversions. |
| conv_iterations | int | inf | Stop iterating over the list of conversions after conv_iterations iterations, even if convergence has not been reached. By default, iteration continues until convergence (that is, until applying the conversion list no longer changes the graph). |
| remove_eud_info | boolean | False | Do not include the extra label information of Stanford's EnhancedUD and EnhancedUD++. |
| remove_extra_info | boolean | False | Do not include BART's extra label information. |
| remove_node_adding_conversions | boolean | False | Do not include conversions that might add nodes to the given graph. |
| remove_unc | boolean | False | Do not include conversions that might contain uncertainty (see the paper for a detailed explanation). |
| query_mode | boolean | False | Do not include conversions that add arcs rather than reorder arcs. |
| funcs_to_cancel | List[str] | None | A list of conversions, given by name, to prevent from occurring. Use get_conversion_names for the full list of conversion names. |
| ud_version | int | 1 | Which UD version to expect as input and to set the converter to. Currently we support 1 and 2. |
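
When several of these parameters are set at once, it can be convenient to validate them against the defaults before calling the API. The sketch below transcribes the defaults from the parameter table above into a dictionary and keeps only the settings that differ from them. `DEFAULTS` and `build_config` are hypothetical names and not part of pyBART.

```python
import math

# Defaults transcribed from the parameter table above.
DEFAULTS = {
    "enhance_ud": True,
    "enhanced_plus_plus": True,
    "enhanced_extra": True,
    "conv_iterations": math.inf,
    "remove_eud_info": False,
    "remove_extra_info": False,
    "remove_node_adding_conversions": False,
    "remove_unc": False,
    "query_mode": False,
    "funcs_to_cancel": None,
    "ud_version": 1,
}


def build_config(**overrides):
    """Return only the non-default settings, rejecting unknown names and unsupported UD versions."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    if overrides.get("ud_version", 1) not in (1, 2):
        raise ValueError("ud_version must be 1 or 2")
    return {name: value for name, value in overrides.items() if value != DEFAULTS[name]}


config = build_config(ud_version=2, remove_unc=True, enhance_ud=True)
print(config)
# {'ud_version': 2, 'remove_unc': True}
```

Since each API call is described as accepting these optional parameters, the resulting dictionary could presumably be passed as `convert_bart_conllu(sents, **config)` or as the `config` argument of `nlp.add_pipe`. Treat that as an assumption to check against the installed version.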

Citing

If you use pyBART or BART in your research, please cite pyBART: Evidence-based Syntactic Transformations for IE.

@inproceedings{Tiktinsky2020pyBARTES,
  title={pyBART: Evidence-based Syntactic Transformations for IE},
  author={Aryeh Tiktinsky and Yoav Goldberg and Reut Tsarfaty},
  booktitle={ACL},
  year={2020}
}

Team

pyBART is an open-source project backed by the Allen Institute for Artificial Intelligence (AI2) and by Bar-Ilan University, where it is part of my thesis under the supervision of Yoav Goldberg. AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. Our team consists of Yoav Goldberg, Reut Tsarfaty and myself. We are currently the only contributors to this project, but we would be more than happy to have anyone who wants to help do so, via Issues and PRs.
