A lightweight clinical entity linking tool using distilled clinical language models from Hugging Face and spaCy/ScispaCy
NLPIE: A Lightweight Entity Linker
A lightweight entity linking tool that uses distilled clinical language models from Hugging Face together with spaCy/ScispaCy to link clinical notes to UMLS codes. Additional mappings from UMLS codes to ICD-10 and SNOMED CT entries are also available.
Repository Structure
.
├── README.md
├── entity_linker.py
├── entity_linker_hf.py
├── mappings-from-UMLS
│ ├── cui_to_icd10_EXACT.json
│ ├── cui_to_icd10_RO.json
│ ├── cui_to_snomed_EXACT.json
│ └── cui_to_snomed_RO.json
├── preprocessor.py
├── query_filter.py
└── requirements.txt
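The files under mappings-from-UMLS can be read with the standard json module. The snippet below is a minimal sketch, not part of the package. It assumes each file is a JSON object keyed by UMLS CUI whose values are a code or a list of codes, and that the _RO files hold related (non-exact) matches to fall back on. The helper names load_mapping and lookup_codes and the sample CUIs and codes are hypothetical.

```python
import json
from pathlib import Path


def load_mapping(path):
    """Load a CUI-to-code mapping from a JSON file.

    Assumed layout (not confirmed by the README): {"<CUI>": ["<code>", ...]}.
    """
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def lookup_codes(cui, exact, related=None):
    """Return the exact codes for a CUI, falling back to related codes if none exist."""
    codes = exact.get(cui, [])
    if not codes and related is not None:
        codes = related.get(cui, [])
    # Normalise a bare string value to a one-element list.
    return [codes] if isinstance(codes, str) else list(codes)


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for the EXACT and RO files.
    exact_demo = {"C0000001": ["X00.0"]}
    related_demo = {"C0000002": ["Y11.1"]}
    print(lookup_codes("C0000001", exact_demo, related_demo))
    print(lookup_codes("C0000002", exact_demo, related_demo))
```

With the real files, the two dictionaries would come from load_mapping("mappings-from-UMLS/cui_to_icd10_EXACT.json") and load_mapping("mappings-from-UMLS/cui_to_icd10_RO.json"), provided the assumed layout holds.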
Installation
Install the required packages by running:
pip install nlpie
Usage
Preprocessor
Use the InputPreprocessor class in preprocessor.py to preprocess an input CSV file:
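from preprocessor import InputPreprocessor

# Path to the raw input CSV and the destination for the preprocessed copy
input_path = 'input.csv'
output_path = 'preprocessed.csv'

# Preprocess the input file and write the result to output_path
preprocessor = InputPreprocessor(input_path)
preprocessed_data = preprocessor.preprocess_input_file(output_path)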
Query Filter
Use the QueryFilter class in query_filter.py to filter a preprocessed CSV file based on a text query:
from query_filter import QueryFilter

query = "pneumonia"
input_csv_path = "preprocessed.csv"

query_filter = QueryFilter()
# Link the text query to UMLS codes
query_umls_codes = query_filter.process_query(query)
# Select rows matching those codes; negated mentions are returned separately
filtered_data, negated_rows = query_filter.filter_rows(query_umls_codes, input_csv_path)
# Save the filtered and negated rows to CSV
query_filter.save_filtered_data_to_csv(filtered_data, negated_rows, query, input_csv_path)
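The two steps above can be chained into a single helper. This is a sketch only: run_pipeline is a hypothetical name, not something the package provides, and it uses only the calls shown in the examples above. The two classes are passed in as arguments so the snippet stays self-contained and can be tried with stand-ins.

```python
def run_pipeline(preprocessor_cls, filter_cls, raw_csv, preprocessed_csv, query):
    """Preprocess raw_csv, then filter the result by a text query.

    preprocessor_cls and filter_cls are expected to behave like
    InputPreprocessor and QueryFilter as used in the examples above.
    """
    # Step 1: preprocess the raw CSV and write it to preprocessed_csv.
    preprocessor_cls(raw_csv).preprocess_input_file(preprocessed_csv)

    # Step 2: link the query to UMLS codes and filter the preprocessed rows.
    query_filter = filter_cls()
    umls_codes = query_filter.process_query(query)
    filtered, negated = query_filter.filter_rows(umls_codes, preprocessed_csv)

    # Step 3: persist the filtered and negated rows.
    query_filter.save_filtered_data_to_csv(filtered, negated, query, preprocessed_csv)
    return filtered, negated
```

Assuming the imports from the examples above, it could be called as run_pipeline(InputPreprocessor, QueryFilter, 'input.csv', 'preprocessed.csv', 'pneumonia').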