pyConTextNLP

A Python implementation of the ConText algorithm

# pyConTextNLP

pyConTextNLP is a Python implementation, extension, and modification of the ConText algorithm described in [CITE](), which is itself a generalization of the NegEx algorithm described in [CITE]().

The package is maintained by Brian Chapman at the University of Utah. Other active and past developers include:

* Wendy W. Chapman
* Glenn Dayton

## Introduction

pyConTextNLP is a partial implementation of the ConText algorithm using Python. The original description of pyConTextNLP was provided in Chapman BE, Lee S, Kang HP, Chapman WW, "Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm." [J Biomed Inform. 2011 Oct;44(5):728-37](http://www.sciencedirect.com/science/article/pii/S1532046411000621)

Other publications/presentations based on pyConText include:
* Wilson RA, et al. "Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports." J Pathol Inform. 2010 Oct 11;1:24.
* Chapman BE, Lee S, Kang HP, Chapman WW. Using ConText to Identify Candidate Pulmonary Embolism Subjects Based on Dictated Radiology Reports. (Presented at AMIA Clinical Research Informatics Summit 2011)
* Wilson RA, Chapman WW, DeFries SJ, Becich MJ, Chapman BE. Identifying History of Ancillary Cancers in Mesothelioma Patients from Free-Text Clinical Reports. (Presented at AMIA 2010).

Note: we changed the package name from pyConText to pyConTextNLP because of a name conflict on PyPI.

## Installation

pyConTextNLP can be downloaded from the Downloads page here on the negex Google Code project. Alternatively, it can be downloaded from the PyPI repository at http://pypi.python.org/pypi/pyConTextNLP. Since pyConTextNLP is registered with PyPI, it can be installed with either easy_install or pip:

    easy_install pyConTextNLP
    pip install pyConTextNLP

The only listed dependency is NetworkX, and easy_install should install it for you if it is not already installed. However, some optional functionality depends on pygraphviz, which I do not yet have worked into the setuptools script.

## Code Structure

The original code used in the JBI paper is in the top-level pyConTextNLP package. A simplification of this original algorithm that uses [NetworkX](http://networkx.lanl.gov/) is in the subpackage pyConTextNLP.pyConTextGraph. pyConTextGraph is what we are currently developing and is what is described here.

The package has three files:

* *itemData.py*. This is where the essential domain knowledge is stored in 4-tuples as described in the paper. For a new application, this is where the user will encapsulate the domain knowledge for their application.
* *pyConTextGraph.py*. This module defines the algorithm.
* *pyConTextSql.py*.

## How to Use

I am working on improving the documentation and (hopefully) adding some testing to the code.

Some preliminary comments:

* pyConTextNLP marks up text sentence by sentence.
* pyConTextNLP facilitates reasoning from multi-sentence documents, but the markup (e.g. negation) is limited to the scope of a single sentence.
* pyConTextNLP assumes the sentence is a string, not a list of words.
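
As a rough illustration of the sentence-as-string assumption, the sketch below splits a report into sentence strings with a naive regular expression. The function name `split_sentences` is hypothetical and is not part of pyConTextNLP; a real application would likely need a more robust splitter that handles abbreviations and similar cases.

```python
import re


def split_sentences(report):
    """Naively split a report into sentence strings (hypothetical helper).

    Splits on whitespace that follows '.', '!' or '?'. Each result is a
    plain string, which is the form the markup step is described as expecting.
    """
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


report = "No evidence of PE. Aortic dissection is present."
for sentence in split_sentences(report):
    print(repr(sentence))
```

Each element of the returned list is a whole sentence as one string, so it can be handed to the markup step directly without tokenizing it into words first.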

### The Skeleton of an Example

To illustrate how to use pyConTextNLP, I've taken some code excerpts from a simple application that was written to identify critical findings in radiology reports.

The first step in building an application is to define _itemData_ objects for your problem. The package contains _itemData_ objects defined in pyConTextNLP.pyConTextGraph.itemData. Common negation terms, conjunctions, pseudo-negations, etc. are defined there. Each item in an itemData instance is a 4-tuple. Here is an excerpt:

~~~~~

probableNegations = itemData(
["can rule out","PROBABLE_NEGATED_EXISTENCE","","forward"],
["cannot be excluded","PROBABLE_NEGATED_EXISTENCE",r"""cannot\sbe\s((entirely|completely)\s)?(excluded|ruled out)""","backward"])
~~~~~~

The four parts are:

1. The _literal_, e.g. "can rule out" or "cannot be excluded".
2. The _category_, e.g. "PROBABLE_NEGATED_EXISTENCE".
3. An optional regular expression used to capture the literal in the text. If no regular expression is provided, one is generated directly from the literal.
4. An optional rule. If the itemData is being used as a modifier, the rule states the direction in which the modifier operates in the sentence. Current valid values are "forward" (the item can modify objects following it in the sentence), "backward" (the item can modify objects preceding it), and "bidirectional" (the item can modify objects both preceding and following it).
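
To make the role of the optional regular expression concrete, here is a minimal standalone sketch using only Python's `re` module. It is not the library's own matching code. The names `ITEMS`, `compile_item`, and `find_spans` are hypothetical, and both the case-insensitive matching and the use of `re.escape` for the fallback are assumptions made for this illustration.

```python
import re

# Plain tuples standing in for the 4-tuples: (literal, category, regex, rule).
ITEMS = [
    ("can rule out", "PROBABLE_NEGATED_EXISTENCE", "", "forward"),
    ("cannot be excluded", "PROBABLE_NEGATED_EXISTENCE",
     r"cannot\sbe\s((entirely|completely)\s)?(excluded|ruled out)", "backward"),
]


def compile_item(item):
    """Use the item's regex if given, otherwise build one from the literal."""
    literal, _category, regex, _rule = item
    pattern = regex if regex else re.escape(literal)
    return re.compile(pattern, re.IGNORECASE)  # case-insensitivity is an assumption


def find_spans(sentence, items):
    """Return sorted (start, end, category, rule) tuples for every match."""
    spans = []
    for item in items:
        for match in compile_item(item).finditer(sentence):
            spans.append((match.start(), match.end(), item[1], item[3]))
    return sorted(spans)


text = "Pulmonary embolism cannot be entirely excluded."
for start, end, category, rule in find_spans(text, ITEMS):
    print(text[start:end], category, rule)
```

The second item shows why an explicit regular expression is useful: the literal "cannot be excluded" alone would miss the variant "cannot be entirely excluded", while the pattern captures both.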

For the criticalFinderGraph.py application, we defined _itemData_ for the critical findings we wanted to identify in the text, for example pulmonary emboli and aortic dissections. These new _itemData_ objects were defined in a file named critfindingItemData.py:

~~~~~
critItems = itemData(
['pulmonary embolism','PULMONARY_EMBOLISM',r'''pulmonary\s(artery )?(embol[a-z]+)''',''],
['pe','PULMONARY_EMBOLISM',r'''\bpe\b''',''],
['embolism','PULMONARY_EMBOLISM',r'''\b(emboli|embolism|embolus)\b''',''],
['aortic dissection','AORTIC_DISSECTION','',''])
~~~~~~

We also added negation terms that were not originally defined in pyConTextNLP:

~~~~
definiteNegations.prepend([["nor","DEFINITE_NEGATED_EXISTENCE","","forward"],])
~~~~~

Once we have all our _itemData_ defined, we're now ready to start processing text.

In our application we need to import the relevant modules from pyConTextNLP as well as our own _itemData_ definitions:

~~~~
import pyConTextNLP.pyConTextGraph.pyConTextGraph as pyConText
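# an "import ... as" alias cannot be used in a later import statement, so
# helpers is imported under the package's current name (pyConText was renamed pyConTextNLP)
import pyConTextNLP.helpers as helpers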
from critfindingItemData import *
~~~~~

Assuming we have read in our documents to process and that the basic document unit is a _report_, we can write a simple function to process the report:

~~~~~
def analyzeReport(report, targets, modifiers):
    """given an individual radiology report, mark up the report based on targets and modifiers"""
    # create the pyConText instance
    context = pyConText.pyConText()

    # split the report into individual sentences. Note this is a very simple sentence splitter.
    # You probably want to write your own or use a sentence splitter from nltk or the like.
    sentences = helpers.sentenceSplitter(report)

    # process each sentence in the report
    for s in sentences:
        context.setTxt(s)
        context.markItems(modifiers, mode="modifier")
        context.markItems(targets, mode="target")

        # some itemData are subsets of larger itemData instances. At this point they will all
        # have been marked. Drop any marked targets and modifiers that are a proper subset of
        # another marked target or modifier
        context.pruneMarks()

        # drop any marks that have the CATEGORY "Exclusion"; these are phrases we want to ignore.
        context.dropMarks('Exclusion')

        # match modifiers to targets
        context.applyModifiers()

        # drop any modifiers that didn't get hooked up with a target
        context.dropInactiveModifiers()

        # put the current markup into an "archive". The archive will later be used to reason
        # across the entire report.

    return context
~~~~~~
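
The pruning step described in the comments above can be pictured with a small standalone sketch. This is only an illustration of dropping marks that are proper subsets of other marks, not the library's `pruneMarks()` implementation. The `(start, end, label)` tuple layout and the name `prune_spans` are assumptions made for the example.

```python
def prune_spans(spans):
    """Drop every (start, end, label) span that lies strictly inside another span."""

    def inside(a, b):
        # a is a proper subset of b: covered by b, but not the identical range
        return b[0] <= a[0] and a[1] <= b[1] and (a[0], a[1]) != (b[0], b[1])

    return [a for a in spans if not any(inside(a, b) for b in spans)]


# "pulmonary embolism" covers characters 0-18 and contains "embolism" (10-18).
marks = [
    (0, 18, "pulmonary embolism"),
    (10, 18, "embolism"),
    (30, 32, "pe"),
]
print(prune_spans(marks))
```

Without this step, the sentence fragment "pulmonary embolism" would yield two overlapping targets for the same finding, one for the full phrase and one for "embolism" alone.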

The markup is stored as a directed graph, so to determine whether a target is, for example, negated, you simply check whether an immediate predecessor of the target node is a negation. This is all done with NetworkX commands.
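
The idea of modifier-to-target edges can be sketched without NetworkX, using a plain dictionary of predecessors. This is a deliberately simplified illustration, not the library's algorithm: it links modifiers to targets purely by span order and rule direction, and ignores anything else the library may do to limit a modifier's reach. The names `link_modifiers` and `is_modified_by` and the tuple layouts are hypothetical.

```python
from collections import defaultdict


def link_modifiers(modifiers, targets):
    """Build {target: [modifier categories]} for one sentence.

    modifiers: (start, end, category, rule) tuples
    targets:   (start, end, category) tuples
    """
    predecessors = defaultdict(list)
    for m_start, m_end, m_cat, rule in modifiers:
        for target in targets:
            t_start, t_end, _t_cat = target
            follows = t_start >= m_end    # target comes after the modifier
            precedes = t_end <= m_start   # target comes before the modifier
            if ((rule == "forward" and follows)
                    or (rule == "backward" and precedes)
                    or (rule == "bidirectional" and (follows or precedes))):
                predecessors[target].append(m_cat)
    return predecessors


def is_modified_by(predecessors, target, categories):
    """True if any predecessor of target has one of the given categories."""
    wanted = {c.lower() for c in categories}
    return any(c.lower() in wanted for c in predecessors.get(target, []))


# "No pulmonary embolism." -> "No" spans 0-2, "pulmonary embolism" spans 3-21
modifiers = [(0, 2, "DEFINITE_NEGATED_EXISTENCE", "forward")]
targets = [(3, 21, "PULMONARY_EMBOLISM")]
graph = link_modifiers(modifiers, targets)
print(is_modified_by(graph, targets[0], ["definite_negated_existence"]))
```

In the real markup the same question is answered by asking the graph for the predecessors of a target node and inspecting their categories.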

To access the underlying graph from the context object, invoke the getCurrentGraph() method:

~~~~
g = context.getCurrentGraph()
~~~~

Here is some code to get a list of all the target nodes in the markup:

~~~~
targets = [n[0] for n in g.nodes(data = True) if n[1].get("category","") == 'target']
~~~~~

Here is a function to test whether a node is modified by any of the categories in a list:

~~~~~

def modifies(g, n, modifiers):
    """Test whether node n is modified by any object whose category is in modifiers.

    g: directed graph representing the ConText markup
    n: a node in g
    modifiers: a list of categories, e.g. ["definite_negated_existence", "probable_existence"]
    """
    # list() so the emptiness check also works when predecessors() returns an iterator
    pred = list(g.predecessors(n))
    if not pred:
        return False
    # use p rather than n so the loop variable does not shadow the function argument
    pcats = [p.getCategory().lower() for p in pred]
    return bool(set(pcats).intersection([m.lower() for m in modifiers]))
~~~~~~

## Release history

| Version | Release date |
|---|---|
| 0.7.0.1 | Feb 4, 2019 |
| 0.7.0.0 | Jan 9, 2019 |
| 0.6.2.0 | Oct 2, 2017 |
| 0.6.1.4 | Apr 13, 2017 |
| 0.6.1.3 | Apr 13, 2017 |
| 0.6.1.2 | Oct 14, 2016 |
| 0.6.1.1 | Jul 23, 2016 |
| 0.6.1.0 | May 8, 2016 |
| 0.6.0.7 | Jan 12, 2016 |
| 0.6.0.5 | Oct 20, 2015 |
| 0.6.0.4 | Oct 20, 2015 |
| 0.6.0.2 | Jul 9, 2015 |
| 0.6.0.0 (this version) | Apr 24, 2015 |
| 0.5.1.9 | Jun 27, 2013 |
| 0.5.1.8 | Jun 15, 2013 |
| 0.5.1.7 | Jun 12, 2013 |
| 0.5.1.6 | Jun 11, 2013 |
| 0.5.1.5 | May 6, 2013 |
| 0.5.1.4 | Apr 18, 2013 |
| 0.5.1.3 | Apr 15, 2013 |
| 0.5.1.2 | Mar 15, 2013 |
| 0.5.1.1 | Feb 27, 2013 |
| 0.5.1 | Oct 8, 2012 |
| 0.5.0 | Oct 5, 2012 |
| 0.3.0 | Dec 15, 2011 |
| 0.2.9.1 | Dec 14, 2011 |
| 0.2.9 | Dec 14, 2011 |

## Download files

| File | Tags | Size | Upload date | Uploaded using Trusted Publishing? |
|---|---|---|---|---|
| pyConTextNLP-0.6.0.0.tar.gz | Source | 20.6 kB | Apr 24, 2015 | No |
| pyConTextNLP-0.6.0.0-py2.7.egg | Egg | 38.0 kB | Apr 24, 2015 | No |
| pyConTextNLP-0.6.0.0-py2-none-any.whl | Python 2 | 23.4 kB | Apr 24, 2015 | No |

### File hashes

| File | Algorithm | Hash digest |
|---|---|---|
| pyConTextNLP-0.6.0.0.tar.gz | SHA256 | `bd61441334e89b5e4f92c7f33bbb63bd8ba135686e6a7330d336dcbed2d12efd` |
| pyConTextNLP-0.6.0.0.tar.gz | MD5 | `c2405a2a174dab0e333a396c2deeba8f` |
| pyConTextNLP-0.6.0.0.tar.gz | BLAKE2b-256 | `7032c12653968f717f1d52cdcecfa4f3358bb9786495a6dbac0bc9b585e606ae` |
| pyConTextNLP-0.6.0.0-py2.7.egg | SHA256 | `ea51a2fd08f0515a0841e8bd77ca0d52c3fa6cc9e08be1bde660bba37e20b332` |
| pyConTextNLP-0.6.0.0-py2.7.egg | MD5 | `efcebae45eb693598514d972161ca8f6` |
| pyConTextNLP-0.6.0.0-py2.7.egg | BLAKE2b-256 | `4646d11a01b0444b783c9c33ba731b3c57a50cdc5cb34433d2b28835460b69df` |
| pyConTextNLP-0.6.0.0-py2-none-any.whl | SHA256 | `fecd4027c1990b82781480a07bbc6d3e6c1adf5de13d4148f040b0a4ea99c11c` |
| pyConTextNLP-0.6.0.0-py2-none-any.whl | MD5 | `ac31d385370c8c051f630faa62b5e0db` |
| pyConTextNLP-0.6.0.0-py2-none-any.whl | BLAKE2b-256 | `4376fb2332c8f459e79c37ff822aca96a0c83f9701f110de434b5a702095e00c` |
