Fast lightweight NLP library for concept and segment extraction with negation/uncertainty detection.

NLPLite

Fast, lightweight NLP for concept extraction with sentence/paragraph segments and negation/uncertainty detection.

Highlights

  • Fast string matching: Aho–Corasick, with a pure-Python fallback when the C extension (pyahocorasick) is not available.
  • Whole-word, case-insensitive matching: when terms overlap, the longest match wins.
  • Negation & uncertainty: each term hit carries an assertion flag, :Y (affirmed), :N (negated), or :U (uncertain).
  • Segment text: return the sentence or paragraph containing each hit (or ±N chars around the term hit).
  • Code mapping: map terms to codes (ICD, SNOMED, CUIs, etc).
  • Simple CLI: one command to search, extract, convert codes, or get assertion status.
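
To make the whole-word, case-insensitive, longest-match behaviour concrete, here is a minimal sketch that uses the standard re module rather than Aho–Corasick. It is not NLPLite's implementation; find_terms is a hypothetical helper written only for illustration, and it reports an inclusive end offset to mirror the examples in this README.

```python
import re

def find_terms(text, terms):
    """Return (term, start, end_inclusive) for whole-word, case-insensitive hits."""
    # Longest terms first, so the alternation prefers "heart failure" over "heart".
    ordered = sorted(set(terms), key=lambda t: (-len(t), t))
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start(), m.end() - 1) for m in pattern.finditer(text)]

hits = find_terms("Heart failure noted. No heartburn.", ["heart", "heart failure"])
print(hits)
# [('heart failure', 0, 12)]
```

"heartburn" is not reported because the word-boundary lookarounds reject a match that is followed by another word character.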

Install

pip install nlplite

Quick Start

1) Search, locate and extract terms or phrases within a large text file 🕵️

from nlplite import search_terms

text = "Patient has heart failure. He denies chest pain but reports headache."
hits = search_terms(text, ["heart failure", "headache"], window_size="sentence")

print(hits)
# [
#   ('heart failure', 12, 24, 'Patient has heart failure.'),
#   ('headache', 60, 67, 'He denies chest pain but reports headache.')
# ]

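Return shape: (term, start_position, end_position, [context]). In the example above, offsets are 0-based and end_position is the index of the term's last character.
window_size may be an int (±N chars), "sentence", "paragraph", or None.

Offsets: set include_offsets=False to omit the start/end positions from the results.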
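
As a rough illustration of what the window_size options mean, the sketch below cuts a context out of the text for a given hit. It is an assumption-laden toy, not NLPLite's segmenter: context_window is a hypothetical helper, sentences are split naively on ., ! or ? followed by whitespace, and paragraphs on blank lines.

```python
import re

def context_window(text, start, end, window_size):
    """Return the context around text[start:end + 1] (end is inclusive)."""
    if window_size is None:
        return None
    if isinstance(window_size, int):
        # ±N characters around the hit, clipped at the start of the text.
        return text[max(0, start - window_size): end + 1 + window_size]
    if window_size == "sentence":
        splitter = re.compile(r"(?<=[.!?])\s+")
    elif window_size == "paragraph":
        splitter = re.compile(r"\n\s*\n")
    else:
        raise ValueError(f"unsupported window_size: {window_size!r}")
    segment_start = 0
    for boundary in splitter.finditer(text):
        if boundary.end() <= start:
            segment_start = boundary.end()      # boundary lies before the hit
        elif boundary.start() > end:
            return text[segment_start:boundary.start()]  # first boundary after the hit
    return text[segment_start:]

text = "Patient has heart failure. He denies chest pain but reports headache."
print(context_window(text, 60, 67, "sentence"))
# He denies chest pain but reports headache.
print(repr(context_window(text, 12, 24, 5)))
# ' has heart failure. He '
```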

2) Translate your text to codes (clinical use case: term-to-CUI, term-to-ICD code)

from nlplite import convert_text_to_codes

dictionary = [("diabetes", "E11"), ("hypertension", "I10"), ("stroke", "I63")]
text = "No stroke. Has hypertension and diabetes."

# All occurrences with locations
rows = convert_text_to_codes(text, dictionary, negation_check=True, unique=False)
print(rows)
# [('I63:N', 3, 8), ('I10:Y', 15, 26), ('E11:Y', 32, 39)]

Notes:

  • When negation_check=True, each code carries an assertion flag: :Y (affirmed), :N (negated), or :U (uncertain).

  • If your dictionary is a two-column file with a header row (term,code), pass sep="," (or "tab") and leave header=True (the default).

  • To omit start/end offsets from the results, pass include_offsets=False.
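
If you would rather build the in-memory list yourself, the two-column layout described above can be read with the standard csv module. This is only a sketch of the file format, not NLPLite's own loader; load_term_codes is a hypothetical helper, and io.StringIO stands in for an open file.

```python
import csv
import io

def load_term_codes(handle, sep=",", header=True):
    """Read (term, code) pairs from a two-column delimited file object."""
    reader = csv.reader(handle, delimiter=sep)
    if header:
        next(reader, None)  # skip the "term,code" header row
    # Ignore blank or malformed rows that do not have both columns.
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]

sample = io.StringIO("term,code\ndiabetes,E11\nhypertension,I10\nstroke,I63\n")
dictionary = load_term_codes(sample)
print(dictionary)
# [('diabetes', 'E11'), ('hypertension', 'I10'), ('stroke', 'I63')]
```

The resulting list has the same shape as the in-memory dictionary used in the example above.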

3) Extract sentences, paragraphs or string surrounding terms of interest 📚

from nlplite import extract_terms_with_window

# Dictionary can be a path to CSV/TSV or an in‑memory dict/list.
dictionary = [("heart failure", "I50.9"), ("chest pain", "R07.9"), ("headache", "R51")]

text = "Patient has heart failure. He denies chest pain but reports headache."
rows = extract_terms_with_window(
    text=text,
    dictionary=dictionary,      # or "terms.csv"
    window_size="sentence",     # 'sentence' | 'paragraph' | int | None
    include_code=None,          # auto-include codes if present
    include_offsets=True,
    negation_check=True         # adds :Y / :N / :U flags
)

print(rows)
# [
#   ('heart failure:Y', 'I50.9:Y', 12, 24, 'Patient has heart failure.'),
#   ('chest pain:N',    'R07.9:N', 37, 46, 'He denies chest pain but reports headache.'),
#   ('headache:Y',      'R51:Y',   60, 67, 'He denies chest pain but reports headache.')
# ]
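
Because the flag is appended to the term and code strings, downstream code usually needs to split it off again. The following sketch groups codes by assertion status; split_flag is a hypothetical helper, and the rows are typed in by hand in the shape shown above rather than produced by calling the library.

```python
def split_flag(field):
    """Split 'I50.9:N' into ('I50.9', 'N'); return (field, None) if no flag is present."""
    value, sep, flag = field.rpartition(":")
    if sep and flag in {"Y", "N", "U"}:
        return value, flag
    return field, None

rows = [
    ("heart failure:Y", "I50.9:Y", 12, 24, "Patient has heart failure."),
    ("chest pain:N", "R07.9:N", 37, 46, "He denies chest pain but reports headache."),
    ("headache:Y", "R51:Y", 60, 67, "He denies chest pain but reports headache."),
]

by_status = {"Y": [], "N": [], "U": []}
for _term, code, *_rest in rows:
    value, flag = split_flag(code)
    if flag is not None:
        by_status[flag].append(value)

print(by_status)
# {'Y': ['I50.9', 'R51'], 'N': ['R07.9'], 'U': []}
```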

CLI Quickstart

After installing, use the nlplite command.

Search (inline text) 🔎

nlplite --search \
  --terms "heart","heart failure" \
  --text "Patient has heart failure. He denies chest pain." \
  --window sentence \
  --format json
# [["heart failure",12,24,"Patient has heart failure."]]

Extract with dictionary file + negation 🧠

# terms.csv (with header):
# term,code
# heart failure,I50.9
# chest pain,R07.9
# headache,R51

nlplite --extract --dict terms.csv --sep "," \
  --text "note.txt" \
  --window paragraph \
  --negation \
  --format text
# Example line:
# Term: chest pain (negated), Code: R07.9, Location: 123-132, Context: "..."

Convert to unique codes only 🔄

nlplite --convert --dict terms.csv --sep "," \
  --text "note.txt" \
  --unique --format json
# → ["I50.9:Y","R07.9:N","R51:Y"]

Tips:

  • Use --neg-window N to restrict how far a negation/uncertainty cue can reach.
  • --format json|csv|text controls output shape.
  • --no-header if your dictionary file has no header row.
  • --convert does not support --window (by design).
  • --no-offsets to skip start/end locations from results.
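
To see why limiting the reach of a cue matters, consider the toy check below. It is not NLPLite's negation algorithm: the cue lists, the assertion_flag helper, and the look-back-only rule are all assumptions made for illustration. It scans the neg_window characters before a hit for a cue word.

```python
import re

# Tiny, illustrative cue lists (assumed, far from complete).
NEGATION_CUES = {"no", "not", "denies", "without"}
UNCERTAINTY_CUES = {"possible", "possibly", "suspected", "may"}

def assertion_flag(text, start, neg_window=40):
    """Return 'N', 'U' or 'Y' from cue words in the neg_window chars before start."""
    preceding = text[max(0, start - neg_window):start].lower()
    words = set(re.findall(r"[a-z]+", preceding))
    if words & NEGATION_CUES:
        return "N"
    if words & UNCERTAINTY_CUES:
        return "U"
    return "Y"

text = "He denies chest pain but reports headache."
print(assertion_flag(text, 10, neg_window=10))  # chest pain -> N
print(assertion_flag(text, 33, neg_window=40))  # headache   -> N (cue reaches too far)
print(assertion_flag(text, 33, neg_window=10))  # headache   -> Y
```

With a wide window, "denies" wrongly negates "headache"; a narrower window keeps the cue local to "chest pain".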

Notes

  • Matching is case‑insensitive and respects word boundaries; overlapping hits resolve to the longest match first.
  • For speed, matching uses a C-accelerated automaton when pyahocorasick is installed; a pure-Python fallback keeps the library portable.
  • Segmentation (window_size) can be an integer (±N characters), "sentence", "paragraph", or None.
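
The optional-accelerator pattern mentioned above can be sketched as follows. This is a guess at the general shape, not NLPLite's source: raw_matches is a hypothetical helper that returns every case-insensitive substring hit, using pyahocorasick (imported as ahocorasick) when available and plain str.find otherwise. Whole-word filtering and longest-match resolution would be separate steps.

```python
try:
    import ahocorasick  # provided by the optional pyahocorasick package
except ImportError:
    ahocorasick = None

def raw_matches(text, terms):
    """Return sorted (start, end_inclusive, term) for every substring hit."""
    # Note: lower() can change string length for a few Unicode characters.
    haystack = text.lower()
    needles = sorted({t.lower() for t in terms if t})
    if not needles:
        return []
    hits = []
    if ahocorasick is not None:
        automaton = ahocorasick.Automaton()
        for term in needles:
            automaton.add_word(term, term)
        automaton.make_automaton()
        # iter() yields (index of the last matched character, stored value).
        for end, term in automaton.iter(haystack):
            hits.append((end - len(term) + 1, end, term))
    else:
        for term in needles:
            pos = haystack.find(term)
            while pos != -1:
                hits.append((pos, pos + len(term) - 1, term))
                pos = haystack.find(term, pos + 1)
    return sorted(hits)

print(raw_matches("Heart failure and heartburn", ["heart", "heart failure"]))
# [(0, 4, 'heart'), (0, 12, 'heart failure'), (18, 22, 'heart')]
```

Both code paths return the same hits; only the speed differs on large term lists.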
