The Python JSON-NLP package

Python JSON-NLP Module

(C) 2019 by Damir Cavar, Oren Baldinger, Maanvitha Gongalla, Anurag Kumar, Murali Kammili

Brought to you by the NLP-Lab.org!

Introduction

There is a growing number of Natural Language Processing (NLP) tools, modules, and pipelines, but there does not seem to be any standard for their output format. Here we focus on a standard for the syntax of the output format. Some future version of JSON-NLP might address the output semantics as well.

JSON-NLP is a standard for the most important outputs NLP pipelines and components can generate. The relevant documentation can be found in the JSON-NLP GitHub repo and on its website at the [NLP-Lab].

The Python JSON-NLP module contains general mapping functions for JSON-NLP to CoNLL-U, a validator for the generated output, an NLP pipeline interface (for Flair, spaCy, NLTK, Polyglot, Xrenner, etc.), and various utility functions.

There is a Java JSON-NLP Maven module as well, and there are wrappers for numerous popular NLP pipelines and tools linked from the NLP-Lab.org website.

Installation

For more details, see JSON-NLP.

This module is a wrapper for outputs from different NLP pipelines and modules into a standardized JSON-NLP format.

To install this package, run the following command:

pip install pyjsonnlp

You might have to use pip3 on some systems.

Validation

JSON-NLP is based on a schema, built by NLP-Lab.org, to comprehensively and concisely represent linguistic annotations. We provide a validator to help ensure that generated JSON validates against the schema:

import pyjsonnlp.validation

result = MyPipeline().process(text="I am a sentence")  # MyPipeline: your own Pipeline implementation
assert pyjsonnlp.validation.is_valid(result)

Conversion

To enable interoperability with other annotation formats, we support conversions between them. Note that a conversion can be lossy if the two formats do not annotate to the same depth. Currently we have a CoNLL-U to JSON-NLP converter that covers most annotations:

pyjsonnlp.conversion.parse_conllu(conllu_text)

To convert in the other direction:

pyjsonnlp.conversion.to_conllu(jsonnlp)
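
For orientation, CoNLL-U is a tab-separated format with ten columns per token line (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below uses only the standard library to build a small CoNLL-U string of the kind one might pass as conllu_text and to show what each column holds. The helper read_conllu_tokens is hypothetical and is not part of pyjsonnlp.

```python
# Column names defined by the CoNLL-U format (ten tab-separated fields per token).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu_tokens(conllu_text):
    """Hypothetical helper: split CoNLL-U text into one dict per token line."""
    tokens = []
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        tokens.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return tokens


# A one-sentence CoNLL-U document; '_' marks an unspecified field.
conllu_text = "\n".join([
    "# text = I am a sentence",
    "\t".join(["1", "I", "I", "PRON", "_", "_", "4", "nsubj", "_", "_"]),
    "\t".join(["2", "am", "be", "AUX", "_", "_", "4", "cop", "_", "_"]),
    "\t".join(["3", "a", "a", "DET", "_", "_", "4", "det", "_", "_"]),
    "\t".join(["4", "sentence", "sentence", "NOUN", "_", "_", "0", "root", "_", "_"]),
]) + "\n\n"

tokens = read_conllu_tokens(conllu_text)
print([t["form"] for t in tokens])
```

Annotations that have no dedicated CoNLL-U column, coreference for instance, are one place where a conversion may lose information.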

Pipeline

JSON-NLP provides a simple Pipeline interface that should be implemented for embedding into a microservice:

from collections import OrderedDict

import pyjsonnlp.pipeline

class MockPipeline(pyjsonnlp.pipeline.Pipeline):
    @staticmethod
    def process(text='', coreferences=False, constituents=False, dependencies=False, expressions=False,
                **kwargs) -> OrderedDict:
        # A real implementation would return the JSON-NLP document for text
        return OrderedDict()

Use the provided keyword arguments to toggle processing components on or off within the method.
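
As a minimal sketch of that toggling, the standalone function below mirrors the process signature without importing pyjsonnlp. The whitespace tokenizer and the result keys other than tokenList are placeholders chosen for illustration; they are assumptions and do not reproduce the JSON-NLP schema.

```python
from collections import OrderedDict


def process(text='', coreferences=False, constituents=False,
            dependencies=False, expressions=False, **kwargs):
    """Illustrative stand-in for a Pipeline.process body (not pyjsonnlp code)."""
    result = OrderedDict()
    # Tokens are always produced; splitting on whitespace is a placeholder tokenizer.
    result['tokenList'] = text.split()

    # Map each flag name to the value the caller passed.
    flags = OrderedDict([
        ('coreferences', coreferences),
        ('constituents', constituents),
        ('dependencies', dependencies),
        ('expressions', expressions),
    ])
    for name, enabled in flags.items():
        if enabled:
            # Empty list marks where a real pipeline would store its annotations.
            result[name] = []

    # Unrecognized keyword arguments arrive in kwargs and are ignored here.
    return result


result = process(text='I am a sentence', dependencies=True, something='else')
print(list(result.keys()))
```

Accepting **kwargs lets callers pass pipeline-specific options without breaking implementations that do not use them.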

If you have deployed a Pipeline as a microservice (see below), we provide a local endpoint for a remotely deployed Pipeline via the RemotePipeline class:

import json
pipeline = pyjsonnlp.pipeline.RemotePipeline('localhost', port=9000)
result = pipeline.process(text='I am a sentence', dependencies=True, something='else')
print(json.dumps(result, indent=2))  # pretty-print; print() itself takes no spacing argument

Microservice

The next step is the JSON-NLP Microservice class, with a pre-built implementation based on [Flask]:

from pyjsonnlp.microservices.flask_server import FlaskMicroservice

app = FlaskMicroservice(__name__, MyPipeline(), base_route='/')

We recommend creating a server.py with the FlaskMicroservice class, which extends the [Flask] app. A corresponding WSGI file would contain:

from mypipeline.server import app as application

To disable a pipeline component (such as phrase structure parsing), add:

application.constituents = False

The full list of properties that can be enabled or disabled is:

  • constituents
  • dependencies
  • coreference
  • expressions

The microservice exposes the following URIs:

  • /constituents
  • /dependencies
  • /coreference
  • /expressions
  • /token_list

These URIs are shortcuts that disable the other components of the parse. In all cases, tokenList will be included in the JSON-NLP output. An example URL is:

http://localhost:5000/dependencies?text=I am a sentence

Text is provided to the microservice with the text parameter, via either GET or POST. If you pass url as a parameter instead, the microservice will scrape that URL and process the text of the website.

Other parameters specific to your pipeline implementation can be passed as well:

http://localhost:5000?lang=en&constituents=0&text=I am a sentence.
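
The example URLs above contain literal spaces, which most HTTP clients expect to be encoded. As a minimal sketch, assuming the service listens on localhost:5000 as in the examples, a request URL can be assembled with the standard library. The helper build_request_url is hypothetical and is not part of pyjsonnlp.

```python
from urllib.parse import urlencode, urlunsplit


def build_request_url(host, route, **params):
    """Hypothetical helper: assemble an encoded GET URL for the microservice."""
    # urlencode writes spaces as '+', the usual form encoding for query strings.
    query = urlencode(params)
    return urlunsplit(("http", host, route, query, ""))


# Shortcut route that returns only dependencies (plus tokenList).
dependencies_url = build_request_url(
    "localhost:5000", "/dependencies", text="I am a sentence")

# Base route with pipeline-specific parameters passed alongside the text.
custom_url = build_request_url(
    "localhost:5000", "/", lang="en", constituents=0, text="I am a sentence.")

print(dependencies_url)
print(custom_url)
```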
