
Converts text corpora into a set of relational facts.

Project description


Relational NLP Preprocessing (rnlp): A Python package and tool for converting text into a set of relational facts.

Installation

Stable builds are available on PyPI:

pip install rnlp

Quick-Start

rnlp can be used either as a command-line interface (CLI) tool or as an imported Python package.

CLI

$ python -m rnlp -f example_files/doi.txt
Reading corpus from file(s)...
Creating background file...
100%|████████| 18/18 [00:00<00:00, 38it/s]

Imported

from rnlp.corpus import declaration
import rnlp

# Convert the sample corpus shipped with rnlp.corpus into relational facts
doi = declaration()
rnlp.converter(doi)

The relations created by rnlp include the following:

  • Sentence’s Relative Position in Block:

    • earlySentenceInBlock: Sentence occurs within the first third of a block.

    • midWaySentenceInBlock: Sentence occurs between the first third and the last third of a block’s length.

    • lateSentenceInBlock: Sentence occurs within the last third of a block’s length.

  • Word’s Relative Position in Sentence:

    • earlyWordInSentence: Word occurs within the first third of a sentence.

    • midWayWordInSentence: Word occurs between a third and two-thirds of a sentence.

    • lateWordInSentence: Word occurs within the last third of a sentence.

  • Relative Position Between Items:

    • nextWordInSentence: Pointer from a word to the word that follows it in the same sentence.

    • nextSentenceInBlock: Pointer from a sentence to the sentence that follows it in the same block.

  • Existential Semantics:

    • sentenceInBlock: Sentence occurs in a particular block.

    • wordInSentence: Word occurs in a particular sentence.

  • Low-Level Information About Words:

    • wordString: A string representation of a word.

    • partOfSpeechTag: The word’s part of speech (as determined by the NLTK part-of-speech tagger).
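The following is a minimal sketch of how facts like these could be derived from raw text. It is not rnlp's implementation: the paragraph-based block splitting, the naive regex sentence and word tokenizers, the ID scheme (block 1, sentence 1_1, word 1_1_1), the argument order, and the helper names third, corpus_to_facts and to_fact_string are all hypothetical. partOfSpeechTag is omitted because it would need NLTK's tagger models.

```python
import re
from typing import List, Tuple

# A fact is (predicate, arg1, arg2). The argument order is an assumption,
# chosen to match the tree description below (container first, then item).
Fact = Tuple[str, str, str]


def third(index: int, length: int) -> str:
    """Classify a 0-based position in a sequence of `length` items by thirds."""
    if index < length / 3:
        return "early"
    if index >= 2 * length / 3:
        return "late"
    return "midWay"


def corpus_to_facts(text: str) -> List[Fact]:
    """Hypothetical converter: text -> relational facts (no POS tags)."""
    facts: List[Fact] = []
    # Assumption: a block is a paragraph separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for bi, block in enumerate(blocks, start=1):
        block_id = str(bi)
        # Simplification: split sentences after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", block.strip())
        sent_ids = [f"{block_id}_{si}" for si in range(1, len(sentences) + 1)]
        for si, (sent_id, sentence) in enumerate(zip(sent_ids, sentences)):
            facts.append(("sentenceInBlock", block_id, sent_id))
            position = third(si, len(sentences))
            facts.append((position + "SentenceInBlock", block_id, sent_id))
            if si + 1 < len(sent_ids):
                facts.append(("nextSentenceInBlock", sent_id, sent_ids[si + 1]))
            # Simplification: words are runs of word characters; punctuation is dropped.
            words = re.findall(r"\w+", sentence)
            word_ids = [f"{sent_id}_{wi}" for wi in range(1, len(words) + 1)]
            for wi, (word_id, word) in enumerate(zip(word_ids, words)):
                facts.append(("wordInSentence", sent_id, word_id))
                facts.append((third(wi, len(words)) + "WordInSentence", sent_id, word_id))
                if wi + 1 < len(word_ids):
                    facts.append(("nextWordInSentence", word_id, word_ids[wi + 1]))
                facts.append(("wordString", word_id, word))
    return facts


def to_fact_string(fact: Fact) -> str:
    """Render a fact in a Prolog-like syntax, quoting word strings."""
    predicate, first, second = fact
    if predicate == "wordString":
        second = f'"{second}"'
    return f"{predicate}({first},{second})."


if __name__ == "__main__":
    sample = "Thank you for coming. See you soon.\n\nGoodbye."
    for fact in corpus_to_facts(sample):
        print(to_fact_string(fact))
```

Running the module prints lines such as wordString(1_1_1,"Thank"). Because the sketch compares the 0-based index against length/3 and 2*length/3, the second item of a two-item sequence is classified as midWay rather than late; rnlp's own boundary handling may differ.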

The repository also contains a toy corpus (example_files/) and an image of a BoostSRL tree for predicting whether a word in a sentence is the word “you”.

https://raw.githubusercontent.com/hayesall/rnlp/master/documentation/img/output.png

The tree says that if the word string contained in word ‘b’ is “you”, then ‘b’ is the word “you” with high probability (which is trivially true). A more interesting inference lies on the False branch: if word ‘b’ is an early word in sentence ‘a’, word ‘anon12035’ is also an early word in sentence ‘a’, and the word string contained in ‘anon12035’ is “Thank”, then ‘b’ has a reasonable chance of being the word “you”. In other words, the model learned that “you” often occurs in the same sentence as “Thank” when “Thank” appears early in that sentence.
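To make the reading of the tree concrete, here is a minimal Python sketch that evaluates the two tests described above against a set of ground facts. The tuple encoding, the leaf labels and the function name tree_leaf are hypothetical, and the learned probabilities are not reproduced because they appear only in the image. As in a logic clause, ‘anon12035’ is an existentially quantified variable, so the sketch searches for any word that satisfies both of its conditions.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, arg1, arg2); encoding is an assumption


def tree_leaf(facts: Set[Fact], a: str, b: str) -> str:
    """Return which leaf of the described tree (sentence a, word b) reaches."""
    # Root test: wordString(b, "you").
    if ("wordString", b, "you") in facts:
        return "word_is_you"
    # False branch: earlyWordInSentence(a, b), plus some word X (anon12035 in
    # the image) with earlyWordInSentence(a, X) and wordString(X, "Thank").
    # Like an unconstrained logic variable, X is not required to differ from b.
    if ("earlyWordInSentence", a, b) in facts:
        for predicate, sentence, word in facts:
            if (
                predicate == "earlyWordInSentence"
                and sentence == a
                and ("wordString", word, "Thank") in facts
            ):
                return "early_thank_in_same_sentence"
    return "other"
```

For example, with facts stating that words 1_1 (“Thank”) and 1_2 are both early in sentence 1, tree_leaf(facts, "1", "1_2") follows the False branch to the “Thank” leaf.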



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

rnlp-0.3.2-py3-none-any.whl (34.3 kB)

Uploaded Python 3

File details

Details for the file rnlp-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: rnlp-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.33.0 CPython/3.7.3

File hashes

Hashes for rnlp-0.3.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        405ff39fff9ac80eb450c2936c6e5366b884870f72ced5b145e39c810138ef02
MD5           49e42e4508b08dc143dc0b18e93f2d4b
BLAKE2b-256   4bbd6e5a1c9d66c12a6b6b1eda7669fab0e08c49a51e1a69b5244daf6e579947
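A minimal sketch for checking a downloaded wheel against the SHA256 digest listed above, using only Python's standard hashlib module. The function names sha256_of and verify are illustrative, not part of rnlp.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest published above for rnlp-0.3.2-py3-none-any.whl.
EXPECTED_SHA256 = "405ff39fff9ac80eb450c2936c6e5366b884870f72ced5b145e39c810138ef02"


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Assuming the wheel was saved under its original name in the current directory, verify("rnlp-0.3.2-py3-none-any.whl") should return True for an intact download. pip can also enforce hashes at install time through its hash-checking mode, by adding --hash=sha256:<digest> to the requirement line in a requirements file.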

