Medical natural language parsing and utility library
A natural language medical domain parsing library. This library:
- Provides an interface to the UTS (UMLS Terminology Services) RESTful service with data caching (NIH login needed).
- Wraps the MedCAT library by parsing medical and clinical text into first-class Python objects that reflect the structure of the natural language, complete with UMLS entity linking with CUIs and other domain-specific features.
- Combines non-medical features (such as POS and NER tags) and medical features (such as CUIs) in one API, with the result available as a data structure and/or as a Pandas data frame.
- Provides cui2vec as a word embedding model, either for fast indexing and access or for direct use as features in a Zensols Deep NLP embedding layer model.
- Provides access to cTAKES as a dictionary-like Stash abstraction.
- Includes a command line program to access all of these features without having to write any code.
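The caching and dictionary-like access mentioned in the list above can be pictured with a small read-through cache. This is only an illustrative sketch: `CachingStash` and `fake_lookup` are hypothetical names invented here, not classes or functions from this library, and the loader stands in for a remote call such as a UTS request.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator, List


class CachingStash(Mapping):
    """Hypothetical dictionary-like wrapper that loads a value once and
    then serves it from memory (not this library's Stash class)."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        # Call the (possibly expensive) loader only on a cache miss.
        if key not in self._cache:
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        # Iterate only over keys that have already been loaded.
        return iter(self._cache)

    def __len__(self) -> int:
        return len(self._cache)


calls: List[str] = []


def fake_lookup(cui: str) -> Dict[str, str]:
    """Stand-in for a remote lookup; records each call it receives."""
    calls.append(cui)
    return {"cui": cui}


stash = CachingStash(fake_lookup)
first = stash["C0035078"]   # triggers the loader
second = stash["C0035078"]  # served from the cache
print(len(calls))
```

The second access does not reach the loader, which is the property a cache in front of a rate-limited or slow service is meant to provide.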
Documentation
See the full documentation. The API reference is also available.
Obtaining
The easiest way to install the command line program is via the pip installer:
pip3 install --use-deprecated=legacy-resolver zensols.mednlp
Binaries are also available on PyPI.
If the cui2vec functionality is used, the Zensols Deep NLP library is also needed, which is installed with:
pip install --use-deprecated=legacy-resolver zensols.deepnlp
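Since the Deep NLP dependency is optional, a script can check for it before touching cui2vec features. The following is a minimal sketch using only the standard library; `has_module` is a helper name introduced here for illustration, and the module names are assumed to match the package names used in the install commands above.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the named module can be located without fully
    importing it. A missing parent package raises ModuleNotFoundError,
    which is treated as "not installed"."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


# Assumption: the importable module name equals the pip package name.
if has_module("zensols.deepnlp"):
    print("zensols.deepnlp found: cui2vec features can be used")
else:
    print("zensols.deepnlp missing: install it before using cui2vec")
```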
Usage
To parse text, create features, and extract clinical concept identifiers:
>>> from zensols.mednlp import ApplicationFactory
>>> doc_parser = ApplicationFactory.get_doc_parser()
>>> doc = doc_parser('John was diagnosed with kidney failure')
>>> for tok in doc.tokens: print(tok.norm, tok.pos_, tok.tag_, tok.cui_, tok.detected_name_)
John PROPN NNP -<N>- -<N>-
was AUX VBD -<N>- -<N>-
diagnosed VERB VBN -<N>- -<N>-
with ADP IN -<N>- -<N>-
kidney NOUN NN C0035078 kidney~failure
failure NOUN NN C0035078 kidney~failure
>>> print(doc.entities)
(<John>, <kidney failure>)
See the full example, and for other functionality, see the examples.
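A common next step is to collect the tokens that share a concept identifier. The sketch below does this with a stand-in `Token` dataclass so that it runs without the library or its models; the attribute names are taken from the session above, while `Token`, `group_by_cui`, and the assumption that a missing value equals the string `-<N>-` (as it is printed above) are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumption: tokens without a concept carry this marker, as printed above.
NONE_MARKER = "-<N>-"


@dataclass
class Token:
    """Stand-in for a parsed token; not the library's token class."""
    norm: str
    pos_: str
    tag_: str
    cui_: str
    detected_name_: str


def group_by_cui(tokens: Iterable[Token]) -> Dict[str, List[str]]:
    """Map each CUI to the normalized text of the tokens linked to it."""
    concepts: Dict[str, List[str]] = {}
    for tok in tokens:
        if tok.cui_ == NONE_MARKER:
            continue  # skip tokens with no linked concept
        concepts.setdefault(tok.cui_, []).append(tok.norm)
    return concepts


# Token values copied from the example output above.
tokens = [
    Token("John", "PROPN", "NNP", NONE_MARKER, NONE_MARKER),
    Token("was", "AUX", "VBD", NONE_MARKER, NONE_MARKER),
    Token("diagnosed", "VERB", "VBN", NONE_MARKER, NONE_MARKER),
    Token("with", "ADP", "IN", NONE_MARKER, NONE_MARKER),
    Token("kidney", "NOUN", "NN", "C0035078", "kidney~failure"),
    Token("failure", "NOUN", "NN", "C0035078", "kidney~failure"),
]
concepts = group_by_cui(tokens)
print(concepts)  # {'C0035078': ['kidney', 'failure']}
```

With a real parsed document, the same function could presumably be given `doc.tokens`, provided the missing-value assumption holds for the actual attribute values.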
Attribution
This API utilizes the following frameworks:
- MedCAT: used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS.
- cTAKES: a natural language processing system for extraction of information from electronic medical record clinical free-text.
- cui2vec: a new set of embeddings (analogous to word embeddings) for medical concepts, learned using an extremely large collection of multimodal medical data.
- Zensols Deep NLP library: a deep learning utility library for natural language processing that aids in feature engineering and embedding layers.
- ctakes-parser: parses cTAKES output into a Pandas data frame.
Citation
If you use this project in your research, please use the following BibTeX entry:
@article{Landes_DiEugenio_Caragea_2021,
title={DeepZensols: Deep Natural Language Processing Framework},
url={http://arxiv.org/abs/2109.03383},
note={arXiv: 2109.03383},
journal={arXiv:2109.03383 [cs]},
author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
year={2021},
month={Sep}
}
Community
Please star the project and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.
Changelog
An extensive changelog is available here.
License
Copyright (c) 2021 - 2023 Paul Landes