NegBio: a tool for negation and uncertainty detection

NegBio

NegBio is a high-performance NLP tool for negation and uncertainty detection in clinical texts (e.g. radiology reports).

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. The package should install successfully on Linux (and possibly macOS).

Install environment

  1. Copy the project to your local machine.
git clone https://github.com/ncbi-nlp/NegBio.git
  2. Install conda <https://conda.io>.
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
chmod +x Miniconda2-latest-Linux-x86_64.sh
./Miniconda2-latest-Linux-x86_64.sh
conda update conda # By default, version 4.3 is installed
  3. Install or update the conda environment specified in environment2.7.yml by running:
# If the negbio2.7 environment already exists, remove it first
conda env remove --name negbio2.7

# Install the environment
conda env create --file environment2.7.yml
  4. Activate the environment with conda activate negbio2.7 (requires conda 4.4 or later).
  5. Add the code directory to PYTHONPATH.
export PYTHONPATH=.:$PYTHONPATH
  6. Install NLTK data.
python -m nltk.downloader universal_tagset punkt wordnet

Prepare the dataset

The program expects reports in BioC format with finding mentions already annotated. All finding mentions must be specified at the passage level. For example:

 <document>
  <id>00000086</id>
  <passage>
    <offset>0</offset>
    <text>findings: pa and lat cxr at 7:34 p.m.. heart and mediastinum are stable.
          lungs are unchanged. air- filled cystic changes. no pneumothorax. osseous structures
          unchanged scoliosis impression: stable chest. dictating </text>
    <annotation id="24">
      <infon key="term">Pneumothorax</infon>
      <infon key="CUI">C0032326</infon>
      <infon key="annotator">MetaMap</infon>
      <infon key="semtype">dsyn</infon>
      <location length="12" offset="125"/>
      <text>pneumothorax</text>
    </annotation>
  </passage>
</document>

More examples can be found in the examples folder.
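
Before running the tool, it can help to confirm that each annotation's location actually points at the mention text. The following is a minimal sketch using only the Python standard library, not part of NegBio. The sample document and the function name check_annotations are made up for illustration, and the sketch assumes the usual BioC convention that annotation offsets are document-level, so the passage offset is subtracted before slicing.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-passage document, shaped like the example above.
SAMPLE = """<document>
  <id>demo-1</id>
  <passage>
    <offset>0</offset>
    <text>lungs are clear. no pneumothorax.</text>
    <annotation id="1">
      <infon key="term">Pneumothorax</infon>
      <location length="12" offset="20"/>
      <text>pneumothorax</text>
    </annotation>
  </passage>
</document>"""


def check_annotations(xml_string):
    """Return (annotation id, annotated text, text sliced by location) triples."""
    doc = ET.fromstring(xml_string)
    results = []
    for passage in doc.iter("passage"):
        # Offset of the passage within the document (assumed convention).
        base = int(passage.findtext("offset", default="0"))
        # find/findtext with a bare tag name only look at direct children,
        # so this is the passage text, not an annotation's text.
        text = passage.findtext("text", default="")
        for ann in passage.iter("annotation"):
            loc = ann.find("location")
            start = int(loc.get("offset")) - base
            end = start + int(loc.get("length"))
            results.append((ann.get("id"), ann.findtext("text"), text[start:end]))
    return results


if __name__ == "__main__":
    for ann_id, expected, found in check_annotations(SAMPLE):
        status = "ok" if expected == found else "MISMATCH"
        print(ann_id, repr(expected), repr(found), status)
```

A mismatch usually means the passage text was reflowed or re-indented after the annotations were produced, since that shifts the character offsets.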

Run the script

The easiest way is to run

python negbio/main.py --out=examples examples/1.xml examples/2.xml

The script will detect negative and uncertain findings in files examples/1.xml and examples/2.xml. It saves the results (1.neg.xml and 2.neg.xml) in the directory examples.
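
For batch runs, the same command can be assembled from Python. This is a rough sketch rather than a NegBio API: the helper names build_command and expected_output are hypothetical, and the output naming rule (NAME.xml becomes NAME.neg.xml inside the --out directory) is inferred only from the example above.

```python
import subprocess
from pathlib import Path


def build_command(sources, out_dir):
    """Build the argument list for the command shown above."""
    return ["python", "negbio/main.py", f"--out={out_dir}", *sources]


def expected_output(source, out_dir):
    """Guess the result path, assuming NAME.xml -> NAME.neg.xml."""
    return Path(out_dir) / (Path(source).stem + ".neg.xml")


if __name__ == "__main__":
    sources = ["examples/1.xml", "examples/2.xml"]
    # Must be run from the project root so that negbio/main.py resolves.
    subprocess.run(build_command(sources, "examples"), check=True)
    for src in sources:
        out = expected_output(src, "examples")
        print(out, "exists" if out.exists() else "missing")
```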

More detailed usage information can be obtained by running

python negbio/main.py -h
Usage:
    negbio [options] --out=DIRECTORY SOURCE ...

Options:
    --neg-patterns=FILE             negation rules [default: patterns/neg_patterns.txt]
    --uncertainty-patterns=FILE     uncertainty rules [default: patterns/uncertainty_patterns.txt]
    --model=MODEL_DIR               Bllip parser model directory

Alternatively, you can run the pipeline step-by-step.

  1. pipeline/ssplit.py splits text into sentences.
  2. pipeline/parse.py parses each sentence using the Bllip parser.
  3. pipeline/ptb2ud.py converts the parse tree to universal dependencies using the Stanford converter.
  4. pipeline/negdetect.py detects negative and uncertain findings.

Customize patterns

By default, the program uses the negation and uncertainty patterns in the patterns folder, but you are free to create your own. Each pattern is a semgrex-style expression for matching nodes in the dependency graph. Currently, only the < and > operators are supported. A detailed grammar specification (using PLY, Python Lex-Yacc) can be found in ngrex/parser.py.
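
To make the two operators concrete, here is a toy illustration of their meaning in semgrex: A <reln B holds when A is the dependent of a reln edge governed by B, and A >reln B holds when A is the governor. This is not the ngrex implementation and does not use NegBio's pattern syntax. The edge list is a hand-written dependency graph for "no evidence of pneumothorax", and the function names are hypothetical.

```python
# Each edge is (governor, relation, dependent). Hand-written toy graph for
# the phrase "no evidence of pneumothorax"; labels are illustrative only.
EDGES = {
    ("evidence", "neg", "no"),
    ("evidence", "nmod", "pneumothorax"),
    ("pneumothorax", "case", "of"),
}


def is_dependent_of(edges, a, reln, b):
    """Semgrex 'A <reln B': a is the dependent of a reln edge whose governor is b."""
    return (b, reln, a) in edges


def is_governor_of(edges, a, reln, b):
    """Semgrex 'A >reln B': a is the governor of a reln edge whose dependent is b."""
    return (a, reln, b) in edges


def negated_findings(edges):
    """Toy rule: a finding is negated if it hangs off a governor that has a 'neg' child."""
    negated_governors = {gov for gov, reln, _ in edges if reln == "neg"}
    return sorted(
        dep for gov, reln, dep in edges
        if reln == "nmod" and gov in negated_governors
    )


if __name__ == "__main__":
    print(is_dependent_of(EDGES, "pneumothorax", "nmod", "evidence"))
    print(is_governor_of(EDGES, "evidence", "neg", "no"))
    print(negated_findings(EDGES))
```

The real patterns operate on the universal dependencies produced by the pipeline, so relation names and node attributes should be taken from the files in the patterns folder rather than from this sketch.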

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

See LICENSE.txt.

Acknowledgments

This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.
