radtext

RadText is a high-performance Python Radiology Text Analysis System.

Prerequisites

Python >= 3.6
Linux
Java
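You can confirm these prerequisites from Python with a quick check like the one below. It is a minimal sketch: `check_prerequisites` is a hypothetical helper, not part of RadText. It only checks the interpreter version, the operating system, and whether a `java` executable is on the PATH. It does not check which Java version RadText requires.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Report whether the listed prerequisites appear to be satisfied.

    Hypothetical helper, not part of RadText: checks the Python version,
    the operating system, and whether a `java` executable is on the PATH.
    """
    return {
        "python>=3.6": sys.version_info >= (3, 6),
        "linux": platform.system() == "Linux",
        "java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```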

Get Started

Download radtext

$ git clone https://github.com/bionlplab/radtext.git
$ cd radtext

Once you have a copy of the repository, create a virtual environment:

$ python -m venv radtext_env
$ source radtext_env/bin/activate

Then install the required packages:

$ pip install -r requirements.txt

NOTE: If installing bllipparser fails with a 'Building wheel for bllipparser (setup.py) ... error' message, try installing the following two packages first and then restarting your environment. These commands assume a conda environment named radtext rather than the venv created above:

$ conda install gcc_linux-64
$ conda install gxx_linux-64
$ conda deactivate
$ conda activate radtext

Prepare the dataset

RadText uses the BioC format as its unified interface. Examples can be found in the examples folder. You can store your input reports in a .csv file (by default, the 'ID' column holds the report IDs and the 'TEXT' column holds the report text) and then convert the .csv file into BioC format with the following command:

$ python cmd/csv2bioc.py -i /path/to/csv_file -o /path/to/bioc_file
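If your reports are not yet in a .csv file, the sketch below writes one with the default column layout described above ('ID' for report IDs, 'TEXT' for report text), using only Python's standard csv module. The function name `write_reports_csv` and the sample reports are illustrative. The resulting file is what you would pass to cmd/csv2bioc.py via -i.

```python
import csv


def write_reports_csv(reports, path):
    """Write (report_id, report_text) pairs to a CSV with 'ID' and 'TEXT' columns.

    Hypothetical helper: the column names follow the csv2bioc.py defaults
    described above.
    """
    with open(path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.DictWriter(fp, fieldnames=["ID", "TEXT"])
        writer.writeheader()
        for report_id, text in reports:
            # csv quotes fields containing commas or newlines automatically.
            writer.writerow({"ID": report_id, "TEXT": text})


if __name__ == "__main__":
    write_reports_csv(
        [
            ("r1", "FINDINGS: The lungs are clear without consolidation, effusion or edema."),
            ("r2", "IMPRESSION: No acute cardiopulmonary process."),
        ],
        "reports.csv",
    )
```

Because multi-line reports are quoted by the csv writer, newlines inside a report survive the round trip and do not break the one-row-per-report layout.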

If you have many reports, we recommend splitting them across several BioC files, for example, 5,000 reports per BioC file.
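One way to follow this recommendation is to split a large .csv file into several smaller ones before conversion, then run cmd/csv2bioc.py on each part. The sketch below assumes the default 'ID'/'TEXT' layout. `split_reports_csv`, its `chunk_size` parameter, and the `_partNNN` file-name pattern are illustrative choices, not RadText options.

```python
import csv
import os


def split_reports_csv(src_path, out_dir, chunk_size=5000):
    """Split a reports CSV into files of at most `chunk_size` rows each.

    Hypothetical helper: every output file keeps the original header, so each
    part can be converted separately with cmd/csv2bioc.py. Returns the list
    of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    paths = []
    out = None
    writer = None
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        try:
            for i, row in enumerate(reader):
                if i % chunk_size == 0:
                    # Start a new part file every `chunk_size` rows.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"{stem}_part{len(paths):03d}.csv")
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return paths
```

Each resulting part can then be converted with the csv2bioc.py command, giving one BioC file per part.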

Run radtext

Run RadText to analyze radiology reports. Please refer to the User guide for details.

Import radtext as a Python library and use the API

The following code snippet shows how to use RadText's pipeline to analyze a radiology report.

import radtext

# initialize RadText's pipeline.
nlp = radtext.Pipeline()

# run RadText's pipeline on a sample report.
collection = nlp('FINDINGS: The lungs are clear without consolidation, effusion or edema...')

print(collection)

The annotation results are stored in a Collection instance. The following code snippet shows how to access the detected disease findings and their negation status.

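# Walk the collection: documents -> passages -> annotations.
for doc in collection.documents:
    for passage in doc.passages:
        for annotation in passage.annotations:
            # Annotation metadata is stored in the `infons` dictionary.
            print(annotation.infons['source_concept'], annotation.infons['negation'])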
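Building on that loop, the sketch below tallies how often each detected concept appears with each negation status. It assumes that annotations expose their metadata as an `infons` dictionary (the attribute name used by annotation objects in the bioc Python library) with the 'source_concept' and 'negation' keys used above. The exact values stored under 'negation' are not documented here, so they are counted as-is, and a missing key is recorded as None. `count_findings` is a hypothetical helper, not part of RadText.

```python
from collections import Counter


def count_findings(collection):
    """Count (concept, negation status) pairs across all annotations.

    Hypothetical helper: walks documents -> passages -> annotations and
    reads the 'source_concept' and 'negation' infons.
    """
    counts = Counter()
    for doc in collection.documents:
        for passage in doc.passages:
            for annotation in passage.annotations:
                concept = annotation.infons.get("source_concept")
                # Keep the stored negation value unchanged; None if absent.
                negation = annotation.infons.get("negation")
                counts[(concept, negation)] += 1
    return counts
```

For example, `for (concept, negation), n in count_findings(collection).most_common(): print(concept, negation, n)` lists the most frequent concept/negation pairs first.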

RadText's API supports conversion in both directions between the BioC format and the OMOP Common Data Model (CDM). The following code snippet shows an example of converting BioC to CDM and then converting CDM back to BioC.

import bioc
from radtext import BioC2CDM, CDM2BioC

# initialize RadText's BioC2CDM converter.
bioc2cdm = BioC2CDM()
with open('/PATH/TO/BIOC_FILE.xml') as fp:
    collection = bioc.load(fp)

cdm_df = bioc2cdm(collection)

# initialize RadText's CDM2BioC converter.
cdm2bioc = CDM2BioC()
bioc_collection = cdm2bioc(cdm_df)

Documentation

Documentation is available here.

Contributing

Refer to our contribution guide.

Acknowledgment

This work is supported by the National Library of Medicine under Award No. 4R00LM013001 and the NIH Intramural Research Program, National Library of Medicine.

Release history

1.0.dev8 (pre-release), Jul 18, 2022
1.0.dev7 (pre-release), Feb 25, 2022
1.0.dev6 (pre-release), Feb 23, 2022
1.0.dev3 (pre-release), Feb 21, 2022
1.0.dev2 (pre-release, this version), Feb 7, 2022
1.0.dev1 (pre-release), Feb 7, 2022

Download files

Source distribution: radtext-1.0.dev2.tar.gz (2.3 MB), uploaded Feb 7, 2022

Built distribution: radtext-1.0.dev2-py3-none-any.whl (2.3 MB, Python 3), uploaded Feb 7, 2022

Hashes for radtext-1.0.dev2.tar.gz
Algorithm	Hash digest
SHA256	`d062ad13f02e54bed472530d936802dbc55fd867c300d0dd66ed02604c81b806`
MD5	`c6a61f46877cb2f58274c044ad8062cc`
BLAKE2b-256	`1f04ebebc2522027b1b6f3e0576ec8b669885fbed8e2598018b8dc641a696f43`

Hashes for radtext-1.0.dev2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b25fdeab34f451adc73693cee97afee416f5900ca53f238800fa9d9ba44ff2be`
MD5	`bb903aef9d40124378dacc980a0ab728`
BLAKE2b-256	`f3bf9f93de807e40fde7a9203da84741243e6d06b39a9a3bb0b7a4bbbaad09d0`