NLP tools at TUW Informatics

tuw-nlp

NLP utilities developed at TUW Informatics

Install and Quick Start

Install tuw-nlp from PyPI:

pip install tuw-nlp

Or install from source, from the repository root:

pip install -e .

On Windows and Mac, you might also need to install Graphviz manually.

A few additional setup steps are required before using the library:

Download the NLTK stopwords:

import nltk
nltk.download('stopwords')
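Once downloaded, the stopword list can be used to filter function words out of a token list. A minimal sketch of the idea, using a small hardcoded subset in place of the full NLTK list:

```python
# Tiny hardcoded stand-in for nltk.corpus.stopwords.words('english')
STOPWORDS = {"the", "a", "an", "is", "on", "of"}

def remove_stopwords(tokens):
    """Keep only tokens that are not stopwords (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["The", "dog", "is", "on", "the", "mat"]))
# → ['dog', 'mat']
```

In practice you would build the set from `nltk.corpus.stopwords.words('english')` after the download above.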

Download Stanza models for UD parsing:

import stanza

stanza.download("en")
stanza.download("de")

Finally, download the ALTO parser and the tuw_nlp dictionaries:

import tuw_nlp

tuw_nlp.download_alto()
tuw_nlp.download_definitions()

Also make sure you have Java installed on your system; it is required to run the ALTO parser.

Then you can parse a sentence as simply as:

from tuw_nlp.grammar.text_to_4lang import TextTo4lang

tfl = TextTo4lang("en", "en_nlp_cache")

fl_graphs = list(tfl("brown dog", depth=1, substitute=False))

# fl_graphs is a list of networkx graph objects
fl_graphs[0].nodes(data=True)
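The call above returns networkx-style (node, attribute-dict) pairs. As a rough illustration of the data shape to expect (the node ids and attributes below are made up for illustration, not actual TextTo4lang output):

```python
# Hypothetical shape of fl_graphs[0].nodes(data=True) for "brown dog":
# each entry is a (node_id, attribute_dict) pair, networkx-style.
nodes = [
    (0, {"name": "dog"}),
    (1, {"name": "brown"}),
]

# Attributes are looked up like any dict:
names = [attrs["name"] for _, attrs in nodes]
print(names)  # → ['dog', 'brown']
```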

For more examples, see the Jupyter notebook under notebooks/experiment.

Services

We also provide services built on our package. To learn more, see the services directory.

Text_to_4lang service

To run a browser-based demo (also available online) for building graphs from raw texts, first start the graph building service:

python services/text_to_4lang/backend/service.py

Then run the frontend with this command:

streamlit run services/text_to_4lang/frontend/extract.py

In the demo you can parse English and German sentences and try out several algorithms implemented on our graphs, such as expand, substitute and append_zero_paths.
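To give an intuition for an operation like expand: expanding a node merges the definition graph of that word into the current graph. The following is a simplified dict-based sketch of that idea, with made-up definition edges, not the library's actual graph representation or API:

```python
# Simplified sketch: a graph as a set of (head, dependent) edge tuples.
# Hypothetical definition lexicon mapping a word to its definition edges.
definitions = {
    "dog": {("dog", "animal"), ("dog", "bark")},
}

def expand(graph_edges, word):
    """One expansion step: merge the definition edges of `word` into the graph."""
    return graph_edges | definitions.get(word, set())

graph = {("brown", "dog")}
print(sorted(expand(graph, "dog")))
# → [('brown', 'dog'), ('dog', 'animal'), ('dog', 'bark')]
```

The real implementation operates on networkx graphs and controls recursion with the `depth` parameter shown in the quick-start example.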

Modules

text

General text processing utilities, contains:

  • segmentation: stanza-based processors for word and sentence level segmentation
  • patterns: various patterns for text processing tasks
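For a sense of what pattern-based segmentation looks like, here is an illustrative stdlib sketch of a naive sentence splitter; it is not the module's actual code (the library's segmentation is Stanza-based):

```python
import re

# Naive illustrative rule: a sentence boundary is whitespace
# preceded by '.', '!' or '?'.
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    """Split text into sentences at naive punctuation boundaries."""
    return [s for s in SENT_BOUNDARY.split(text.strip()) if s]

print(split_sentences("The dog barked. It was brown! Was it?"))
# → ['The dog barked.', 'It was brown!', 'Was it?']
```

A model-based segmenter like Stanza's handles the cases such rules miss (abbreviations, ellipses, quotes).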

graph

Tools for working with graphs, contains:

  • utils: misc utilities for working with graphs

grammar

Tools for generating and using grammars, contains:

  • alto: tools for interfacing with the ALTO tool
  • irtg: class for representing Interpreted Regular Tree Grammars
  • lexicon: rule lexica for building lexicalized grammars
  • ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs
  • utils: misc utilities for working with grammars

Contributing

We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, by opening an issue or by writing an email to Gabor Recski or Adam Kovacs at firstname.lastname@tuwien.ac.at.

Citing

License

MIT license
