Skip to main content

NLP tools at TUW Informatics

Project description

TUW-NLP

NLP utilities developed at TUW informatics.

Install and Quick Start

Install the tuw-nlp repository from pip:

pip install tuw-nlp

Or install from source:

pip install -e .

On Windows and Mac, you might also need to install Graphviz manually.

You will also need some additional steps to use the library:

Download nltk stopwords:

import nltk
nltk.download('stopwords')

Download stanza models for UD parsing:

import stanza

stanza.download("en")
stanza.download("de")

And then finally download ALTO and tuw_nlp dictionaries:

import tuw_nlp

tuw_nlp.download_alto()
tuw_nlp.download_definitions()

Also please make sure to have JAVA on your system to be able to use the parser!

Then you can parse a sentence as simple as:

from tuw_nlp.grammar.text_to_4lang import TextTo4lang

tfl = TextTo4lang("en", "en_nlp_cache")

fl_graphs = list(tfl("brown dog", depth=1, substitute=False))

# Then the fl_graphs will directly contain a networkx graph object
fl_graphs[0].nodes(data=True)

For more examples you can check the jupyter notebook under notebooks/experiment

Services

We also provide services built on our package. To get to know more visit services.

Text_to_4lang service

To run a browser-based demo (also available online) for building graphs from raw texts, first start the graph building service:

python services/text_to_4lang/backend/service.py

Then run the frontend with this command:

streamlit run services/text_to_4lang/frontend/extract.py

In the demo you can parse english and german sentences and you can also try out multiple algorithms our graphs implement, such as expand, substitute and append_zero_paths.

Modules

text

General text processing utilities, contains:

  • segmentation: stanza-based processors for word and sentence level segmentation
  • patterns: various patterns for text processing tasks

graph

Tools for working with graphs, contains:

  • utils: misc utilities for working with graphs

grammar

Tools for generating and using grammars, contains:

  • alto: tools for interfacing with the alto tool
  • irtg: class for representing Interpreted Regular Tree Grammars
  • lexicon: Rule lexica for building lexicalized grammars
  • ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs.
  • utils: misc utilities for working with grammars

Contributing

We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, by opening an issue or by writing an email to Gabor Recski or Adam Kovacs at firstname.lastname@tuwien.ac.at

Citing

If you use the library, please cite our paper

@inproceedings{Recski:2021,
  title={Explainable Rule Extraction via Semantic Graphs},
  author={Recski, Gabor and Lellmann, Bj{\"o}rn and Kovacs, Adam and Hanbury, Allan},
  booktitle = {{Proceedings of the Fifth Workshop on Automated Semantic Analysis
of Information in Legal Text (ASAIL 2021)}},
  publisher = {{CEUR Workshop Proceedings}},
  address = {São Paulo, Brazil},
  pages="24--35",
  url= "http://ceur-ws.org/Vol-2888/paper3.pdf",
  year={2021}
}

License

MIT license

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tuw-nlp-0.0.4.tar.gz (85.6 kB view hashes)

Uploaded Source

Built Distribution

tuw_nlp-0.0.4-py3-none-any.whl (85.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page