RadText is a high-performance Python Radiology Text Analysis System.

RadText

Prerequisites

  1. Python >= 3.6
  2. Linux
  3. Java

Get Started

Download radtext

$ git clone https://github.com/bionlplab/radtext.git
$ cd radtext

Once you have a copy of the repository, prepare a virtual environment.

$ python -m venv radtext_env
$ source radtext_env/bin/activate

Then install the required packages:

$ pip install -r requirements.txt

NOTE: If you encounter a "Building wheel for bllipparser (setup.py) ... error" message while installing bllipparser, try installing these two packages first, then restart your environment. The commands below assume a conda environment named radtext:

$ conda install gcc_linux-64
$ conda install gxx_linux-64
$ conda deactivate
$ conda activate radtext

Prepare the dataset.

RadText uses the BioC format as its unified interface. Some examples can be found in the examples folder. You can store your input reports in a .csv file (by default, the 'ID' column stores the report IDs and the 'TEXT' column stores the report text), and then use the following command to convert the .csv file into BioC format.

$ python cmd/csv2bioc.py -i /path/to/csv_file -o /path/to/bioc_file
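As a minimal sketch of preparing such an input file, the snippet below writes report IDs and texts to a .csv file with the default 'ID' and 'TEXT' columns described above. The function name write_reports_csv and the file name in the usage note are hypothetical and not part of RadText; only Python's standard csv module is used.

```python
import csv


def write_reports_csv(reports, csv_path):
    """Write (report_id, report_text) pairs to a CSV file.

    The header uses the default column names described above:
    'ID' for the report id and 'TEXT' for the report text.
    This helper is illustrative and not part of RadText.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp)
        writer.writerow(["ID", "TEXT"])
        for report_id, text in reports:
            # csv.writer quotes fields that contain commas or newlines,
            # so multi-line report text is stored safely.
            writer.writerow([report_id, text])
```

For example, write_reports_csv([("r1", "FINDINGS: ...")], "reports.csv") would produce a file that can then be passed to cmd/csv2bioc.py with the -i option.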

If you have many reports, we recommend splitting them across several BioC files, for example, 5000 reports per BioC file.
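One way to follow this recommendation is to split the input .csv file before conversion and run the conversion command once per chunk. The sketch below does this with the standard library only; split_csv, its parameters, and the output file naming are hypothetical choices for illustration, not part of RadText.

```python
import csv
from pathlib import Path


def _write_chunk(out_dir, stem, index, header, rows):
    """Write one chunk to '<stem>_<index>.csv' and return its path."""
    path = out_dir / f"{stem}_{index:04d}.csv"
    with open(path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(csv_path, out_dir, chunk_size=5000):
    """Split a CSV file into files of at most chunk_size data rows.

    The header row is repeated in every output file so that each chunk
    can be converted to BioC on its own. Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(csv_path).stem
    written = []
    with open(csv_path, newline="", encoding="utf-8") as fp:
        reader = csv.reader(fp)
        header = next(reader, None)
        if header is None:
            # Empty input file: nothing to split.
            return written
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                written.append(_write_chunk(out_dir, stem, len(written), header, chunk))
                chunk = []
        if chunk:
            # Write the final, possibly smaller, chunk.
            written.append(_write_chunk(out_dir, stem, len(written), header, chunk))
    return written
```

Each returned path can then be converted separately with cmd/csv2bioc.py, giving one BioC file per chunk.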

Run radtext

Run RadText to analyze radiology reports. Please refer to the user guide for details.

Import radtext as a Python library and use the API

The following code snippet shows how to use RadText's pipeline to analyze a radiology report.

import radtext

# initialize RadText's pipeline.
nlp = radtext.Pipeline()

# run RadText's pipeline on a sample report.
collection = nlp('FINDINGS: The lungs are clear without consolidation, effusion or edema...')

print(collection)

The annotation results are stored in a Collection instance. The following code snippet shows how to access the detected disease findings and their negation status.

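# walk every annotation in every passage of every document.
for doc in collection.documents:
    for passage in doc.passages:
        for annotation in passage.annotations:
            # bioc annotations keep their metadata in the `infons` dictionary.
            print(annotation.infons['source_concept'], annotation.infons['negation'])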
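If you would rather collect the findings than print them, the same traversal can be wrapped in a small helper. This is only a sketch: collect_findings is a hypothetical name, and it assumes that annotations expose an infons dictionary (as objects from the bioc library do) with the 'source_concept' and 'negation' keys used in the snippet above.

```python
def collect_findings(collection):
    """Return a list of (source_concept, negation) pairs from a collection.

    Assumes the collection has the documents -> passages -> annotations
    layout shown above and that each annotation has an `infons` dict.
    Missing keys yield None instead of raising KeyError.
    """
    findings = []
    for doc in collection.documents:
        for passage in doc.passages:
            for annotation in passage.annotations:
                infons = annotation.infons
                findings.append(
                    (infons.get("source_concept"), infons.get("negation"))
                )
    return findings
```

Using .get here is a deliberate choice so that annotations without a negation entry do not interrupt the loop; the resulting list can be filtered or loaded into a table afterwards.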

RadText's API supports conversion in both directions between the BioC format and the OMOP CDM. The following code snippet shows an example of converting BioC to CDM, and then converting CDM back to BioC.

import bioc
from radtext import BioC2CDM, CDM2BioC

# initialize RadText's BioC2CDM converter.
bioc2cdm = BioC2CDM()
with open('/PATH/TO/BIOC_FILE.xml') as fp:
    collection = bioc.load(fp)

cdm_df = bioc2cdm(collection)

# initialize RadText's CDM2BioC converter.
cdm2bioc = CDM2BioC()
bioc_collection = cdm2bioc(cdm_df)

Documentation

Documentation is available here.

Contributing

Refer to our contribution guide.

Acknowledgment

This work is supported by the National Library of Medicine under Award No. 4R00LM013001 and the NIH Intramural Research Program, National Library of Medicine.
