Fast, lightweight NLP library for concept and segment extraction with negation/uncertainty detection.
NLPLite
Fast, lightweight NLP for concept extraction with sentence/paragraph segments and negation/uncertainty detection.
Highlights
- Fast string matching: Aho–Corasick, with a pure‑Python fallback when the C extension is not available.
- Whole‑word, case‑insensitive matching: when terms overlap, the longest match wins.
- Negation & uncertainty: each term hit carries an assertion flag: :Y (yes), :N (no, negated), or :U (uncertain).
- Segment text: return the sentence or paragraph containing each hit (or ±N characters around it).
- Code mapping: map terms to codes (ICD, SNOMED, CUIs, etc.).
- Simple CLI: one command to search, extract, convert codes, or get assertion status.
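The highlights name Aho–Corasick as the matching engine. As a rough illustration of that technique (not nlplite's actual code), here is a minimal pure-Python automaton with a whole-word, case-insensitive filter. It assumes ASCII text, so lowercasing does not shift offsets, and treats any adjacent letter or digit as breaking a word boundary; `build_automaton` and `find_all` are hypothetical names.

```python
from collections import deque


def build_automaton(terms):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto = [{}]  # goto[state][char] -> next state; state 0 is the root
    fail = [0]   # failure link for each state
    out = [[]]   # terms recognised when the automaton reaches each state
    for term in terms:
        state = 0
        for ch in term.lower():
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    # Breadth-first traversal: each failure link points to a shallower state
    # whose output list is already complete.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Yield (term, start, end) for each whole-word hit; `end` is inclusive."""
    goto, fail, out = automaton
    lowered = text.lower()  # assumes ASCII, so offsets match `text`
    state = 0
    for i, ch in enumerate(lowered):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            start = i - len(term) + 1
            # Reject hits glued to letters or digits (e.g. "heart" in "heartburn").
            before_ok = start == 0 or not lowered[start - 1].isalnum()
            after_ok = i + 1 == len(lowered) or not lowered[i + 1].isalnum()
            if before_ok and after_ok:
                yield term, start, i


text = "Patient has heart failure. He denies chest pain but reports headache."
automaton = build_automaton(["heart", "heart failure", "headache"])
print(list(find_all(text, automaton)))
# [('heart', 12, 16), ('heart failure', 12, 24), ('headache', 60, 67)]
```

This reports every match, including "heart" inside "heart failure"; picking the longest of overlapping hits is a separate step.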
Install
pip install nlplite
Quick Start
1) Search, locate and extract terms or phrases within a large text file 🕵️
from nlplite import search_terms
text = "Patient has heart failure. He denies chest pain but reports headache."
hits = search_terms(text, ["heart failure", "headache"], window_size="sentence")
print(hits)
# [
# ('heart failure', 12, 24, 'Patient has heart failure.'),
# ('headache', 60, 67, 'He denies chest pain but reports headache.')
# ]
Return shape: (term, start_position, end_position, [context]), where end_position is the index of the term's last character (inclusive).
window_size may be an int (±N chars), "sentence", "paragraph", or None.
Offsets: set include_offsets=False to omit start/end positions from the results.
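The three window_size modes can be pictured with a small helper. This is a minimal sketch, not nlplite's segmenter: it assumes inclusive end offsets (as in the heart failure example, 12 to 24), a naive sentence rule (".", "!" or "?" followed by whitespace) and blank-line paragraph breaks; `context_window` is a hypothetical name.

```python
import re


def context_window(text, start, end, window_size):
    """Return the context for a hit covering text[start:end + 1] (inclusive end).

    window_size: int (±N characters), "sentence", "paragraph", or None.
    """
    if window_size is None:
        return None
    if isinstance(window_size, int) and not isinstance(window_size, bool):
        return text[max(0, start - window_size):end + 1 + window_size]
    if window_size == "sentence":
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        boundary = re.compile(r"(?<=[.!?])\s+")
    elif window_size == "paragraph":
        # Paragraphs are separated by at least one blank line.
        boundary = re.compile(r"\n\s*\n")
    else:
        raise ValueError(f"unsupported window_size: {window_size!r}")
    seg_start, seg_end = 0, len(text)
    for m in boundary.finditer(text):
        if m.end() <= start:
            seg_start = m.end()  # last boundary before the hit
        elif m.start() > end:
            seg_end = m.start()  # first boundary after the hit
            break
    return text[seg_start:seg_end]


text = "Patient has heart failure. He denies chest pain but reports headache."
print(context_window(text, 60, 67, "sentence"))  # He denies chest pain but reports headache.
print(repr(context_window(text, 12, 24, 5)))     # ' has heart failure. He '
```

Abbreviations such as "Dr." would split sentences early under this rule; a production segmenter needs more care.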
2) Translate your text to codes (clinical use case: term to CUI, term to ICD code)
from nlplite import convert_text_to_codes
dictionary = [("diabetes", "E11"), ("hypertension", "I10"), ("stroke", "I63")]
text = "No stroke. Has hypertension and diabetes."
# All occurrences with locations
rows = convert_text_to_codes(text, dictionary, negation_check=True, unique=False)
print(rows)
# [('I63:N', 3, 8), ('I10:Y', 15, 26), ('E11:Y', 32, 39)]
Notes:
- When negation_check=True, each code carries a flag: :Y, :N, or :U.
- If your dictionary is a two-column file with a header (term,code), pass sep="," (or "tab") and leave header=True (the default).
- To omit start/end positions from the results, pass include_offsets=False.
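For reference, a two-column dictionary file maps directly onto the in-memory (term, code) list used in these examples. The sketch below assumes a plain CSV/TSV with the term in the first column and the code in the second; `load_dictionary` is a hypothetical helper, not an nlplite function, and mapping "tab" to a tab character simply mirrors the sep option described in the notes.

```python
import csv
import io


def load_dictionary(lines, sep=",", header=True):
    """Parse a two-column term/code file into [(term, code), ...].

    `lines` is any iterable of text lines, e.g. an open file or io.StringIO.
    """
    delimiter = "\t" if sep == "tab" else sep
    reader = csv.reader(lines, delimiter=delimiter)
    if header:
        next(reader, None)  # skip the "term,code" header row
    # Ignore blank or malformed rows with fewer than two columns.
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


csv_text = "term,code\nheart failure,I50.9\nchest pain,R07.9\nheadache,R51\n"
print(load_dictionary(io.StringIO(csv_text)))
# [('heart failure', 'I50.9'), ('chest pain', 'R07.9'), ('headache', 'R51')]
```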
3) Extract sentences, paragraphs, or text surrounding terms of interest 📚
from nlplite import extract_terms_with_window
# Dictionary can be a path to CSV/TSV or an in‑memory dict/list.
dictionary = [("heart failure", "I50.9"), ("chest pain", "R07.9"), ("headache", "R51")]
text = "Patient has heart failure. He denies chest pain but reports headache."
rows = extract_terms_with_window(
text=text,
dictionary=dictionary, # or "terms.csv"
window_size="sentence", # 'sentence' | 'paragraph' | int | None
include_code=None, # auto-include codes if present
include_offsets=True,
negation_check=True # adds :Y / :N / :U flags
)
print(rows)
# [
# ('heart failure:Y', 'I50.9:Y', 12, 24, 'Patient has heart failure.'),
# ('chest pain:N', 'R07.9:N', 37, 46, 'He denies chest pain but reports headache.'),
# ('headache:Y', 'R51:Y', 60, 67, 'He denies chest pain but reports headache.')
# ]
CLI Quickstart
After installing, use the nlplite command.
Search (inline text) 🔎
nlplite --search \
--terms "heart","heart failure" \
--text "Patient has heart failure. He denies chest pain." \
--window sentence \
--no-offsets \
--format json
# [["heart failure","Patient has heart failure."]]
Extract with dictionary file + negation 🧠
# terms.csv (with header):
# term,code
# heart failure,I50.9
# chest pain,R07.9
# headache,R51
nlplite --extract --dict terms.csv --sep "," \
--text "note.txt" \
--window paragraph \
--negation \
--format text
# Example line:
# Term: chest pain (negated), Code: R07.9, Location: 123-132, Context: "..."
Convert to unique codes only 🔄
nlplite --convert --dict terms.csv --sep "," \
--text "note.txt" \
--unique --format json
# → ["I50.9:Y","R07.9:N","R51:Y"]
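If you post-process the non-unique rows from convert_text_to_codes yourself, an order-preserving de-duplication gives the same kind of list. This is a sketch under assumptions: it expects rows shaped like example 2's (code, start, end) tuples, and the input rows are hypothetical; `unique_codes` is not an nlplite function.

```python
def unique_codes(rows):
    """Collapse (code, start, end) rows into unique codes, keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so this de-duplicates in order.
    return list(dict.fromkeys(row[0] for row in rows))


# Hypothetical rows, e.g. from convert_text_to_codes(..., unique=False).
rows = [("I50.9:Y", 12, 24), ("R07.9:N", 37, 46), ("R51:Y", 60, 67), ("I50.9:Y", 80, 92)]
print(unique_codes(rows))  # ['I50.9:Y', 'R07.9:N', 'R51:Y']
```

Because the flag is part of the key here, I50.9:Y and I50.9:N would stay separate entries; whether nlplite's --unique does the same is not stated above.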
Tips:
- Use --neg-window N to restrict how far a negation/uncertainty cue can reach.
- --format json|csv|text controls the output shape.
- Use --no-header if your dictionary file has no header row.
- --convert does not support --window (by design).
- Use --no-offsets to omit start/end positions from the results.
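The idea of a cue that can only "reach" a limited distance can be sketched in the spirit of NegEx. This is a simplified stand-in, not nlplite's rule set: the cue lists, the terminator words ("but", "however", sentence punctuation) and the reading of neg_window as a count of characters to the left of the hit are all assumptions, and `assertion_status` is a hypothetical name.

```python
import re

# Illustrative cue lists only; NegEx-style systems use far larger, curated lists.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
UNCERTAINTY_CUES = ["possible", "probable", "suspected", "rule out"]
# A cue cannot reach past these, as in "denies X but reports Y".
TERMINATORS = r"[.!?;]|\bbut\b|\bhowever\b"


def has_cue(span, cues):
    """True if any cue appears in `span` as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(cue)}\b", span) for cue in cues)


def assertion_status(text, start, neg_window=30):
    """Return "N", "U" or "Y" for a hit beginning at `start`.

    Only the `neg_window` characters before the hit are inspected, and only
    the part after the last sentence end or terminator word.
    """
    left = text[max(0, start - neg_window):start].lower()
    left = re.split(TERMINATORS, left)[-1]
    if has_cue(left, NEGATION_CUES):
        return "N"
    if has_cue(left, UNCERTAINTY_CUES):
        return "U"
    return "Y"


text = "Patient has heart failure. He denies chest pain but reports headache."
for term, start in [("heart failure", 12), ("chest pain", 37), ("headache", 60)]:
    print(f"{term}:{assertion_status(text, start)}")
# heart failure:Y
# chest pain:N
# headache:Y
```

Shrinking the window limits reach: in "No fever, chills, fatigue or headache." the hit "headache" (start 29) is N with neg_window=30 but Y with neg_window=10.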
Notes
- Matching is case‑insensitive and respects word boundaries; overlapping hits resolve to the longest match first.
- Performance: a C‑accelerated automaton is used when pyahocorasick is installed; a pure‑Python fallback keeps the package portable.
- Segmentation (window_size) can be an integer (±N characters), "sentence", "paragraph", or None.
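The longest-match rule can be illustrated with a greedy selection over candidate hits, consistent with the CLI example where searching for both "heart" and "heart failure" returns only "heart failure". This is one way to implement the rule, assuming (term, start, end) tuples with inclusive ends; nlplite's own tie-breaking is not documented here, and `resolve_overlaps` is a hypothetical name.

```python
def resolve_overlaps(hits):
    """Keep non-overlapping hits, preferring longer spans.

    `hits` are (term, start, end) tuples with an inclusive `end`.
    """
    # Longest span first; ties go to the earlier start.
    ranked = sorted(hits, key=lambda h: (-(h[2] - h[1]), h[1]))
    kept = []
    for term, start, end in ranked:
        # A hit survives only if it does not overlap anything already kept.
        if all(end < s or start > e for _, s, e in kept):
            kept.append((term, start, end))
    return sorted(kept, key=lambda h: h[1])  # back to text order


hits = [("heart", 12, 16), ("heart failure", 12, 24), ("headache", 60, 67)]
print(resolve_overlaps(hits))  # [('heart failure', 12, 24), ('headache', 60, 67)]
```

The pairwise overlap check is quadratic in the number of kept hits, which is fine for short notes; an interval structure would scale better.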
File details
Details for the file nlplite-0.2.0.tar.gz.
File metadata
- Download URL: nlplite-0.2.0.tar.gz
- Upload date:
- Size: 18.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d0d29acc74b09e0cfa9d9509ac3d7ccd7fe56f3ec9bd5555842009d1ad80a27 |
| MD5 | 132d46d6fa4f0da3ba023dd5f2681ac3 |
| BLAKE2b-256 | adec23debc34897c85bd6003ae261227d71240bfc104027c11f9dc552a4d22c1 |
File details
Details for the file nlplite-0.2.0-py3-none-any.whl.
File metadata
- Download URL: nlplite-0.2.0-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e504aae1314376912fc81e0cbb4052e3d3c0fc42d7a7e81b33f5c42a2b42d03a |
| MD5 | 22fc5e0cde97be7ac9eecff8e8e89643 |
| BLAKE2b-256 | c50c7404a9c9ae6447370dd85df2f5873f9d161c741e5f511f83b5e695f1503d |