spanner nlp
Stanford CoreNLP Python Wrapper
Python wrapper for Stanford CoreNLP that interfaces with the Stanford CoreNLP server. It provides a simple API for text processing tasks such as tokenization, part-of-speech tagging, named entity recognition, constituency parsing, and dependency parsing, as described in the Full List Of Annotators.
Prerequisites
- Java 1.8+ (Download Page). You can check the Java version with the command:
  java -version
- Python 3.6+ (Download Page). You can check the Python version with the command:
  python --version
- Stanford CoreNLP files version 4.1.0 (Download Page).
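The `java -version` check can also be scripted before starting the wrapper. A minimal sketch, assuming nothing about this package (the `parse_java_version` and `meets_requirement` helpers below are hypothetical, not part of spanner nlp), that parses a `java -version` line and verifies it meets the 1.8+ requirement:

```python
import re

def parse_java_version(version_line):
    # Extract the major version from a line like:
    #   java version "1.8.0_292"
    # or, on Java 9+:
    #   openjdk version "11.0.2"
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if match is None:
        return None
    major = int(match.group(1))
    # Pre-Java-9 releases report "1.x"; the effective major version is x.
    if major == 1 and match.group(2):
        major = int(match.group(2))
    return major

def meets_requirement(version_line, minimum=8):
    major = parse_java_version(version_line)
    return major is not None and major >= minimum
```

In practice the version line would come from running `java -version` via `subprocess` and reading its stderr.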
Usage
Annotators wrapper - Simple Usage - Using local files
This example demonstrates how to use the annotators wrapper with the local files downloaded from Stanford CoreNLP.
All the annotators and their information can be found in Stanford CoreNLP Full List Of Annotators.
from StanfordCoreNLP import StanfordCoreNLP
with StanfordCoreNLP('stanford-corenlp-4.1.0') as nlp:
    print('Tokenize:', nlp.tokenize("Hello world. Hello world again."))
    print('Sentence Splitting:', nlp.ssplit("Hello world. Hello world again."))
    print('Part of Speech:', nlp.pos("Marie was born in Paris."))
Example output Tokenize:
Tokenize: [
  {"token": "Hello", "span": [0, 5]},
  {"token": "world", "span": [6, 11]},
  ...
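The spans in the tokenize output appear to be half-open character offsets into the input string, so slicing the original text with a span recovers the token. A small sketch, assuming that offset convention:

```python
text = "Hello world. Hello world again."

# Token entries in the shape shown in the Tokenize example output above
tokens = [
    {"token": "Hello", "span": [0, 5]},
    {"token": "world", "span": [6, 11]},
]

for entry in tokens:
    start, end = entry["span"]
    # Slicing with the span recovers the token text
    assert text[start:end] == entry["token"]
```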
Example output Sentence Splitting:
Sentence Splitting: [
"Hello world.",
"Hello world again."
]
Example output Part of Speech:
Part of Speech: [
  {"token": "Marie", "pos": "NNP", "span": [0, 5]},
  {"token": "was", "pos": "VBD", "span": [6, 9]},
  ...
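Each entry in the pos output pairs a token with a Penn Treebank tag. For downstream use it can be convenient to flatten the entries into (token, tag) tuples; a small sketch over entries in the shape shown above:

```python
# Entries in the shape of the Part of Speech example output above
pos_output = [
    {"token": "Marie", "pos": "NNP", "span": [0, 5]},
    {"token": "was", "pos": "VBD", "span": [6, 9]},
]

# Flatten to (token, tag) pairs, e.g. for NLTK-style consumers
tagged = [(entry["token"], entry["pos"]) for entry in pos_output]
print(tagged)  # [('Marie', 'NNP'), ('was', 'VBD')]
```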
Manual Annotators
The examples below demonstrate how to define annotators manually, using either local files or an existing server.
Properties for manual annotators:
- annotators: Full List Of Annotators.
- pipelineLanguage: Full List Of Human Languages.
- outputFormat: JSON, XML, Text, Serialized.
Manual Annotators - Using local files
from StanfordCoreNLP import StanfordCoreNLP
nlp = StanfordCoreNLP('stanford-corenlp-4.1.0')
text = 'The small red car turned very quickly around the corner.'
props = {'annotators': 'ner', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}  # Named Entity Recognition example
print(nlp.annotate(text, properties=props))
nlp.close()
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>
<root>
  <document>
    <sentences>
      <sentence id="1">
        <tokens>
          <token id="1">
            <word>The</word>
            <lemma>the</lemma>
            <CharacterOffsetBegin>0</CharacterOffsetBegin>
            <CharacterOffsetEnd>3</CharacterOffsetEnd>
            <POS>DT</POS>
            <NER>O</NER>
          </token>
          <token id="2">
          ...
Manual Annotators - Using existing server
from StanfordCoreNLP import StanfordCoreNLP
nlp = StanfordCoreNLP('http://corenlp.run', port=80)
text = 'Joe Smith lives in California. He used to live in Oregon.'
props = {'annotators': 'lemma', 'pipelineLanguage': 'en', 'outputFormat': 'JSON'}  # Lemmatization example
print(nlp.annotate(text, properties=props))
nlp.close()
Example output:
{
  "sentences": [
    {
      "index": 0,
      "tokens": [
        {
          "index": 1,
          "word": "Joe",
          "originalText": "Joe",
          "lemma": "Joe",
          "characterOffsetBegin": 0,
          "characterOffsetEnd": 3,
          "pos": "NNP",
          "before": "",
          "after": " "
        },
        {
          "index": 2,
          ...
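With outputFormat set to JSON, the server's response can be parsed with the standard json module to pull out the lemmas. A sketch over a trimmed response in the shape shown above, assuming annotate returns the raw JSON string:

```python
import json

# A trimmed response in the shape of the example output above
response = '''
{
  "sentences": [
    {
      "index": 0,
      "tokens": [
        {"index": 1, "word": "Joe", "lemma": "Joe"},
        {"index": 2, "word": "lives", "lemma": "live"}
      ]
    }
  ]
}
'''

data = json.loads(response)
# Collect the lemma of every token across all sentences
lemmas = [tok["lemma"] for sent in data["sentences"] for tok in sent["tokens"]]
print(lemmas)  # ['Joe', 'live']
```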
Manual Annotators - Using multiple annotators at once - Using local files
Note: This example also works with an existing server.
from StanfordCoreNLP import StanfordCoreNLP
nlp = StanfordCoreNLP('stanford-corenlp-4.1.0', lang='en')
text = 'Joe Smith lives in California. He used to live in Oregon.'
props = {'annotators': 'tokenize, ssplit, pos', 'pipelineLanguage': 'en', 'outputFormat': 'JSON'}
print(nlp.annotate(text, props, True))
nlp.close()
Example output:
{
  "tokenize": [
    {"token": "Joe", "span": [0, 3]},
    {"token": "Smith", "span": [4, 9]},
    {"token": "lives", "span": [10, 15]},
    {"token": "in", "span": [16, 18]},
    {"token": "California", "span": [19, 29]},
    ...
Debug
You can debug using Python's logging module. This example demonstrates how to use it:
from StanfordCoreNLP import StanfordCoreNLP
import logging
nlp = StanfordCoreNLP('stanford-corenlp-4.1.0', quiet=False, loggingLevel=logging.DEBUG)
text = 'The small red car turned very quickly around the corner.'
print(nlp.annotate(text)) #default annotate
nlp.close()