The Python spaCy JSON-NLP package
spaCy to JSON-NLP
(C) 2019 by Damir Cavar, Oren Baldinger, Maanvitha Gongalla, Anurag Kumar, Murali Kammili
Brought to you by the NLP-Lab.org!
Introduction
Currently this module requires Python 3.6+.
This module provides a spaCy v2.1 wrapper for JSON-NLP. It takes the spaCy output and generates a JSON-NLP output. It also provides a Microservice wrapper that allows you to launch the spaCy module as a persistent RESTful service using Flask or other WSGI-based server.
Since this microservice is built on spaCy, you will need to have its models downloaded first, for example:
python -m spacy download en
python -m spacy download en_core_web_md
Additional Pipeline Modules
spaCy allows other models to be added as pipeline components. We provide such integrations for coreference resolution and phrase structure trees.
Anaphora and Coreference Resolution
We provide HuggingFace coreference resolution, a fast system tightly integrated into spaCy. Note that the first time the parser is run, it will download the coreference models if they are not already present. These models only work for English.
Phrase Structure Trees (Constituency Parse)
We provide the CPU version of the benepar parser, a highly accurate phrase structure parser. Bear in mind that it is a TensorFlow module; as such, it has a notable start-up time and relatively high memory requirements (4GB+).
If you have a GPU available, you can install the GPU version of the module with:
pip install --upgrade benepar[gpu]
Microservice
The JSON-NLP repository provides a Microservice class, with a pre-built implementation of Flask. To run it, execute:
python spacyjsonnlp/server.py
Since server.py extends the Flask app, a WSGI file would contain:
from spacyjsonnlp.server import app as application
To disable a pipeline component (such as phrase structure parsing), add
application.constituents = False
The full list of properties that can be enabled or disabled is:
- constituents
- dependencies
- coreference
- expressions
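Putting these pieces together, a complete WSGI file might look like the following sketch. The particular toggles shown are only an example of one possible deployment, not a recommended configuration:

```python
# wsgi.py -- sketch of a WSGI entry point for the spaCy JSON-NLP microservice
# (hypothetical deployment file; the component toggles below are examples)
from spacyjsonnlp.server import app as application

# Disable the heavier components; tokenList is always included in the output.
application.constituents = False  # skip benepar phrase structure parsing
application.coreference = False   # skip HuggingFace coreference resolution
application.dependencies = True
application.expressions = True
```

A WSGI server such as mod_wsgi or gunicorn can then be pointed at this file's `application` object.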
The microservice exposes the following URIs:
- /constituents
- /dependencies
- /coreference
- /expressions
- /token_list
These URIs are shortcuts to disable the other components of the parse. In all cases, tokenList will be included in the JSON-NLP output. An example URL is:
http://localhost:5000/dependencies?text=I am a sentence
Text is provided to the microservice with the text parameter, via either GET or POST. If you pass url as a parameter, the microservice will scrape that URL and process the text of the website. The spaCy language model to use for parsing can be selected with the spacy_model parameter.
Here is an example GET call:
http://localhost:5000?spacy_model=en&constituents=0&text=I am a sentence.
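The query above can also be assembled programmatically. Here is a minimal sketch using only the Python standard library; the host and port are assumptions based on the Flask default:

```python
from urllib.parse import urlencode

# Build the query string for the example GET call above.
params = {
    "spacy_model": "en",  # which spaCy language model to use
    "constituents": 0,    # disable phrase structure parsing
    "text": "I am a sentence.",
}
url = "http://localhost:5000?" + urlencode(params)
print(url)
# -> http://localhost:5000?spacy_model=en&constituents=0&text=I+am+a+sentence.
```

For longer documents, the same parameters can be sent as a POST body instead of a query string, which avoids URL length limits.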