
Converts text corpora into a set of relational facts.

Project description

Relational NLP Preprocessing: A Python package and tool for converting text into a set of relational facts.


Kaushik Roy (@kkroy36) and Alexander L. Hayes (@batflyer)

Installation

Stable builds on PyPI

pip install rnlp

Development builds on GitHub

pip install git+https://github.com/starling-lab/rnlp.git

Quick-Start

rnlp can be used either as a CLI tool or as an imported Python package.

CLI

$ python -m rnlp -f files/doi.txt
Reading corpus from file(s)...
Creating background file...
100%|████████| 18/18 [00:00<00:00, 38it/s]

Imported

from rnlp.corpus import declaration
import rnlp

doi = declaration()
rnlp.converter(doi)

Text is converted into relational facts. Relations are encoded at the following levels:

  • between blocks of size ‘n’ (i.e. 2 sentences) in the blocks.
  • between blocks of size ‘n’ (i.e. ‘n’ sentences) and the sentences in those blocks.
  • between sentences and words in the sentences.
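The block/sentence/word hierarchy above can be sketched in a few lines of plain Python. This is a minimal illustration, not rnlp's implementation: the regex sentence splitter, the helper names (split_sentences, make_blocks, hierarchy_facts), the identifier scheme, and the argument order of the facts are all assumptions made for the example.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_blocks(sentences, n=2):
    # Group consecutive sentences into blocks of at most n sentences.
    return [sentences[i:i + n] for i in range(0, len(sentences), n)]


def hierarchy_facts(text, n=2):
    # Emit one fact per sentence-in-block and word-in-sentence relation.
    # The fact syntax here is hypothetical and only mirrors the predicate names.
    facts = []
    for b, block in enumerate(make_blocks(split_sentences(text), n), start=1):
        for s, sentence in enumerate(block, start=1):
            sid = f"{b}_{s}"
            facts.append(f"sentenceInBlock({sid},{b}).")
            for w, _word in enumerate(sentence.split(), start=1):
                facts.append(f"wordInSentence({sid}_{w},{sid}).")
    return facts


if __name__ == "__main__":
    for fact in hierarchy_facts("Thank you. Come again. Bye now.", n=2):
        print(fact)
```

With n=2, the three sentences in the usage example fall into two blocks: the first holds two sentences and the second holds one.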

The relationships currently encoded are:

  1. earlySentenceInBlock - sentence occurs within the first third of the block length
  2. earlyWordInSentence - word occurs within the first third of the sentence length
  3. lateSentenceInBlock - sentence occurs after two-thirds of the block length
  4. midWayWordInSentence - word occurs between a third and two-thirds of the sentence length
  5. nextSentenceInBlock - sentence that follows a sentence in a block
  6. nextWordInSentence - word that follows a word in a sentence in a block
  7. sentenceInBlock - sentence occurs in a block
  8. wordInSentence - word occurs in a sentence
  9. wordString - the string contained in the word
  10. partOfSpeech - the part of speech of the word
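To make the positional relations concrete, here is a small sketch of how "early" and "midway" labels could be derived from a word's index. It is an illustration under stated assumptions, not the package's code: the exact boundary handling at one-third and two-thirds, the helper names (position_label, word_facts), the identifier scheme, and the fact syntax are hypothetical.

```python
def position_label(index, length):
    """Return 'early', 'mid', or 'late' for a 0-based index in a sequence."""
    # Boundary handling is an assumption: the first third is 'early',
    # the last third is 'late', everything in between is 'mid'.
    if index < length / 3:
        return "early"
    if index >= 2 * length / 3:
        return "late"
    return "mid"


# Only the word-level positional predicates named in the list are emitted.
WORD_PREDICATES = {"early": "earlyWordInSentence", "mid": "midWayWordInSentence"}


def word_facts(words, sentence_id="s1"):
    # Build wordString, positional, and nextWordInSentence facts for one sentence.
    facts = []
    ids = [f"{sentence_id}_w{i + 1}" for i in range(len(words))]
    for i, (wid, word) in enumerate(zip(ids, words)):
        facts.append(f'wordString({wid},"{word}").')
        predicate = WORD_PREDICATES.get(position_label(i, len(words)))
        if predicate:
            facts.append(f"{predicate}({wid},{sentence_id}).")
        if i + 1 < len(words):
            facts.append(f"nextWordInSentence({wid},{ids[i + 1]}).")
    return facts


if __name__ == "__main__":
    for fact in word_facts(["Thank", "you", "very", "much", "indeed", "sir"]):
        print(fact)
```

The same position_label helper would apply to sentences within a block by passing the sentence index and the block length.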

The repository includes a toy corpus (files/) and an image of a BoostSRL tree for predicting whether a word in a sentence is the word “you”.

https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png

The tree says that if the word string contained in word ‘b’ is “you”, then ‘b’ is the word “you” (which is of course true). A more interesting inference is the False branch: if word ‘b’ is an early word in sentence ‘a’, word ‘anon12035’ is also an early word in sentence ‘a’, and the word string contained in ‘anon12035’ is “Thank”, then ‘b’ has a decent chance of being the word “you”. In other words, the model learned that “you” often occurs in the same sentence as “Thank” when “Thank” appears early in that sentence.
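The two branches described above can be written out as an ordinary function over a handful of facts. This is only a reading aid, not BoostSRL output or rnlp code: the function name predict_you, the dictionary-based fact store, and the returned labels (including the fall-through label, which the description does not cover) are assumptions for illustration.

```python
def predict_you(b, a, word_string, early_words):
    """Follow the two branches described above for word b in sentence a.

    word_string maps a word id to its string (wordString facts).
    early_words maps a sentence id to the set of its early word ids
    (earlyWordInSentence facts).
    """
    # True branch: the word string of b is "you".
    if word_string.get(b) == "you":
        return "you"
    # False branch: b is early in a, and some early word in a has the string "Thank".
    early = early_words.get(a, set())
    if b in early and any(word_string.get(w) == "Thank" for w in early):
        return "probably you"
    # Fall-through label is hypothetical.
    return "probably not you"


if __name__ == "__main__":
    word_string = {"w1": "Thank", "w2": "You", "w3": "all"}
    early_words = {"s1": {"w1", "w2"}}
    print(predict_you("w2", "s1", word_string, early_words))
    print(predict_you("w3", "s1", word_string, early_words))
```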

