NLP tools at TUW Informatics
tuw-nlp
NLP utilities developed at TUW Informatics
Install and Quick Start
Install the tuw-nlp package from PyPI with pip:
pip install tuw-nlp
Or install from source, in editable mode, from the root of the repository:
pip install -e .
On Windows and macOS, you might also need to install Graphviz manually.
A few additional steps are needed before you can use the library.
Download the NLTK stopwords:
import nltk
nltk.download('stopwords')
Download the Stanza models for Universal Dependencies (UD) parsing:
import stanza
stanza.download("en")
stanza.download("de")
Finally, download ALTO and the tuw_nlp dictionaries:
import tuw_nlp
tuw_nlp.download_alto()
tuw_nlp.download_definitions()
Also make sure that Java is installed on your system; the parser needs it.
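Before parsing, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch that uses only the standard library and is not part of tuw-nlp. It assumes the import names `nltk`, `stanza` and `tuw_nlp`, and it assumes that the part of Graphviz that may need a manual install is the `dot` executable on PATH. The helper names (`has_tool`, `missing_packages`, `missing_tools`, `java_version`) are hypothetical. It only checks that packages and programs are present, not that the stopwords, models or dictionaries were downloaded.

```python
import shutil
import subprocess
from importlib.util import find_spec
from typing import List, Optional

# Python packages used in the setup steps (import names, not pip names).
PYTHON_PACKAGES = ["nltk", "stanza", "tuw_nlp"]
# External executables: Java for the parser, and Graphviz's `dot`
# (an assumption about which Graphviz component is needed).
EXECUTABLES = {"java": "Java runtime", "dot": "Graphviz"}


def has_tool(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def missing_packages() -> List[str]:
    """Return the entries of PYTHON_PACKAGES that cannot be found for import."""
    return [pkg for pkg in PYTHON_PACKAGES if find_spec(pkg) is None]


def missing_tools() -> List[str]:
    """Return readable names of the executables not found on PATH."""
    return [label for exe, label in EXECUTABLES.items() if not has_tool(exe)]


def java_version() -> Optional[str]:
    """Return the first line of `java -version`, or None if it cannot be run."""
    if not has_tool("java"):
        return None
    try:
        # `java -version` traditionally prints its banner to stderr.
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    problems = missing_packages() + missing_tools()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Environment looks ready:", java_version())
```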
Then parsing a sentence is as simple as:
from tuw_nlp.grammar.text_to_4lang import TextTo4lang
tfl = TextTo4lang("en", "en_nlp_cache")
fl_graphs = list(tfl("brown dog", depth=1, substitute=False))
# fl_graphs now directly contains networkx graph objects
fl_graphs[0].nodes(data=True)
For more examples, see the Jupyter notebook under notebooks/experiment.
Services
We also provide services built on our package. To learn more, see the services directory.
Text_to_4lang service
To run a browser-based demo (also available online) for building graphs from raw text, first start the graph-building service:
python services/text_to_4lang/backend/service.py
Then run the frontend with this command:
streamlit run services/text_to_4lang/frontend/extract.py
In the demo you can parse English and German sentences and try out several of the algorithms our graphs implement, such as expand, substitute and append_zero_paths.
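If you start the demo often, the two commands above can be wrapped in a single launcher. This is a hypothetical convenience script, not something shipped with the package. It assumes it is run from the repository root and that Streamlit is installed in the same Python environment. It also uses an arbitrary fixed delay (`startup_delay`) instead of checking that the backend is actually ready. The names `build_commands` and `launch` are illustrative only.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import List

# Paths taken from the two commands above, relative to the repository root.
BACKEND = Path("services/text_to_4lang/backend/service.py")
FRONTEND = Path("services/text_to_4lang/frontend/extract.py")


def build_commands(repo_root: Path) -> List[List[str]]:
    """Return argv lists for the backend service and the Streamlit frontend."""
    python = sys.executable
    return [
        [python, str(repo_root / BACKEND)],
        # `python -m streamlit run` is an alternative to the `streamlit run` CLI.
        [python, "-m", "streamlit", "run", str(repo_root / FRONTEND)],
    ]


def launch(repo_root: Path, startup_delay: float = 5.0) -> None:
    """Start the backend, wait briefly, then run the frontend until it exits."""
    backend_cmd, frontend_cmd = build_commands(repo_root)
    backend = subprocess.Popen(backend_cmd)
    try:
        # Hypothetical fixed delay; the real service may need more or less time.
        time.sleep(startup_delay)
        subprocess.run(frontend_cmd, check=False)
    finally:
        # Stop the backend when the frontend exits or is interrupted (Ctrl+C).
        backend.terminate()
        backend.wait()


if __name__ == "__main__":
    launch(Path("."))
```

Starting the backend with `Popen` and the frontend with `run` keeps the frontend in the foreground, and the `finally` block ensures the backend process does not outlive it.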
Modules
text
General text processing utilities, contains:
- segmentation: stanza-based processors for word and sentence level segmentation
- patterns: various patterns for text processing tasks
graph
Tools for working with graphs, contains:
- utils: misc utilities for working with graphs
grammar
Tools for generating and using grammars, contains:
- alto: tools for interfacing with the alto tool
- irtg: class for representing Interpreted Regular Tree Grammars
- lexicon: rule lexica for building lexicalized grammars
- ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs
- utils: misc utilities for working with grammars
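An Interpreted Regular Tree Grammar pairs each production of a regular tree grammar with one template per interpretation. Evaluating a derivation tree under an interpretation then fills each template's variables with the values of the rule's children. The toy sketch below illustrates only that idea. It is not the `irtg` class from tuw_nlp and not ALTO's file format. It uses string templates with variables written `?1`, `?2` loosely in the style of ALTO, whereas mapping UD to 4lang graphs involves graph interpretations. The names `Rule`, `check_derivation`, `interpret` and the interpretation labels are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Variables such as ?1, ?2 refer to the values of a rule's children (1-based).
VARIABLE = re.compile(r"\?(\d+)")


@dataclass
class Rule:
    """Toy IRTG rule: an RTG production plus one template per interpretation."""
    lhs: str                      # nonterminal produced by this rule
    children: List[str]           # nonterminals expected at each child position
    interpretations: Dict[str, str] = field(default_factory=dict)


# A derivation tree is (rule name, list of child derivation trees).
Derivation = Tuple[str, list]


def check_derivation(grammar: Dict[str, Rule], tree: Derivation, expected: str) -> None:
    """Raise ValueError unless `tree` is licensed by the grammar's RTG part."""
    name, kids = tree
    rule = grammar[name]
    if rule.lhs != expected:
        raise ValueError(f"rule {name!r} yields {rule.lhs}, expected {expected}")
    if len(kids) != len(rule.children):
        raise ValueError(f"rule {name!r} needs {len(rule.children)} children")
    for kid, nonterminal in zip(kids, rule.children):
        check_derivation(grammar, kid, nonterminal)


def interpret(grammar: Dict[str, Rule], tree: Derivation, interpretation: str) -> str:
    """Evaluate a derivation tree bottom-up under one interpretation."""
    name, kids = tree
    template = grammar[name].interpretations[interpretation]
    values = [interpret(grammar, kid, interpretation) for kid in kids]
    # Replace each ?i with the value of the i-th child.
    return VARIABLE.sub(lambda m: values[int(m.group(1)) - 1], template)


GRAMMAR = {
    "np": Rule("NP", ["ADJ", "N"], {"string": "?1 ?2", "bracketed": "NP(?1, ?2)"}),
    "brown": Rule("ADJ", [], {"string": "brown", "bracketed": "ADJ(brown)"}),
    "dog": Rule("N", [], {"string": "dog", "bracketed": "N(dog)"}),
}

if __name__ == "__main__":
    tree = ("np", [("brown", []), ("dog", [])])
    check_derivation(GRAMMAR, tree, "NP")
    print(interpret(GRAMMAR, tree, "string"))     # brown dog
    print(interpret(GRAMMAR, tree, "bracketed"))  # NP(ADJ(brown), N(dog))
```

The same derivation tree yields a different output under each interpretation. This is what allows one grammar to relate two representations, such as a UD parse and a 4lang graph.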
Contributing
We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, by opening an issue or by writing an email to Gabor Recski or Adam Kovacs at firstname.lastname@tuwien.ac.at
Citing
License
MIT license
Download files
File details
Details for the file tuw-nlp-0.0.1.1.tar.gz.
File metadata
- Download URL: tuw-nlp-0.0.1.1.tar.gz
- Upload date:
- Size: 55.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9c9538d3998f8b3200d5b2bd5cda427db098125f23057e43e6ea175ace45694 |
| MD5 | 2216079b1440a946d153c663b2388ed9 |
| BLAKE2b-256 | d2552c773f764f7e2a037503f5e69b53777f0e61fff86a7ffa9563caa446a0b3 |
File details
Details for the file tuw_nlp-0.0.1.1-py3-none-any.whl.
File metadata
- Download URL: tuw_nlp-0.0.1.1-py3-none-any.whl
- Upload date:
- Size: 43.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02882bc6a32bccc41986c69845031f498e924eaf3388dfc01acde2fb499f94cb |
| MD5 | 6a3072c42b55c3b23eb032f9b62e9742 |
| BLAKE2b-256 | afa916dc3e15c2dbcc754f148fb5b92c847cf78851a6086843e6fb6f910f47df |