HPO concept recognition and phenotype extraction tool
txt2hpo
txt2hpo is a Python library for extracting HPO-encoded phenotypes from text.
txt2hpo recognizes differences in inflection (e.g. hypotonic vs. hypotonia), handles negation and comes with a built-in medical spellchecker.
Installation
Install using pip
pip install txt2hpo
Install from GitHub
git clone https://github.com/GeneDx/txt2hpo.git
cd txt2hpo
python setup.py install
Library usage
from txt2hpo.extract import Extractor
extract = Extractor()
result = extract.hpo("patient with developmental delay and hypotonia")
print(result.hpids)
["HP:0001263", "HP:0001290"]
txt2hpo will attempt to correct spelling errors by default, at the cost of slower processing.
This feature can be turned off by setting the correct_spelling flag to False.
from txt2hpo.extract import Extractor
extract = Extractor(correct_spelling = False)
result = extract.hpo("patient with devlopental delay and hyptonia")
print(result.hpids)
[]
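To make the idea of spelling correction before matching concrete, here is a minimal sketch that uses only the standard library. It is not txt2hpo's built-in spellchecker: the vocabulary, the function name `correct_tokens`, and the similarity cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical mini-vocabulary for illustration only. txt2hpo ships its own
# medical spellchecker; this toy does not reproduce how it works.
VOCABULARY = ["patient", "with", "and", "developmental", "delay", "hypotonia"]


def correct_tokens(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each whitespace-separated token with its closest vocabulary word.

    Tokens with no vocabulary word above the similarity cutoff are kept as-is.
    """
    corrected = []
    for token in text.split():
        candidates = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else token)
    return " ".join(corrected)


fixed = correct_tokens("patient with devlopental delay and hyptonia")
print(fixed)
```

A pre-corrected sentence like this could then be passed to an extractor created with correct_spelling=False, which trades the library's own correction step for one you control.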
txt2hpo handles negation using negspaCy. To exclude negated phenotypes from the results, set the remove_negated flag to True.
Both the extracted terms (hpids) and the negated terms (negated_hpids) can be retrieved from the result.
from txt2hpo.extract import Extractor
extract = Extractor(remove_negated=True)
result = extract.hpo("patient has developmental delay but no hypotonia")
print(result.hpids)
["HP:0001263"]
print(result.negated_hpids)
["HP:0001252"]
When matched phenotypes overlap, txt2hpo keeps only the longest match by default. To keep every overlapping match instead, set the remove_overlapping flag to False.
from txt2hpo.extract import Extractor
extract = Extractor(remove_overlapping=False)
result = extract.hpo("patient with polycystic kidney disease")
print(result.hpids)
["HP:0000113", "HP:0000112"]
extract = Extractor(remove_overlapping=True)
result = extract.hpo("patient with polycystic kidney disease")
print(result.hpids)
["HP:0000113"]
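The longest-match behaviour can be pictured as a greedy filter over character spans. The sketch below is an illustration of that idea under stated assumptions, not txt2hpo's actual implementation: the helper names are hypothetical, and the pairing of each HPID with a phrase is inferred from the outputs above.

```python
def keep_longest_spans(matches):
    """Greedy filter: a longer match wins over any shorter match it overlaps."""
    kept = []
    # Visit matches from longest to shortest so longer spans are accepted first.
    for match in sorted(matches, key=lambda m: m["end"] - m["start"], reverse=True):
        overlaps = any(
            match["start"] < k["end"] and k["start"] < match["end"] for k in kept
        )
        if not overlaps:
            kept.append(match)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda m: m["start"])


text = "patient with polycystic kidney disease"


def span_of(phrase, hpid):
    """Build a match record by locating the phrase in the example text."""
    start = text.find(phrase)
    return {"hpid": hpid, "start": start, "end": start + len(phrase), "matched": phrase}


# Assumed pairing of phrases and HPIDs, for illustration only.
matches = [
    span_of("polycystic kidney disease", "HP:0000113"),
    span_of("kidney disease", "HP:0000112"),
]

filtered = keep_longest_spans(matches)
print([m["hpid"] for m in filtered])
```

Because "kidney disease" lies entirely inside "polycystic kidney disease", only the longer span survives the filter.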
txt2hpo can also return its results as a valid JSON string, which lists each extracted HPID together with its character span and the matched text.
from txt2hpo.extract import Extractor
extract = Extractor()
result = extract.hpo("patient with developmental delay and hypotonia")
print(result.json)
'[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"},
{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]'
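The JSON string can be consumed with the standard json module. The following sketch parses the output shown above and uses each "index" pair to slice the source text; the function name `parse_matches` and the record layout are assumptions made for this example, not part of txt2hpo.

```python
import json

text = "patient with developmental delay and hypotonia"

# The JSON string from the example above, copied verbatim.
raw = (
    '[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, '
    '{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]'
)


def parse_matches(raw_json, source_text):
    """Turn the JSON string into a list of records sorted by start offset."""
    records = []
    for entry in json.loads(raw_json):
        start, end = entry["index"]
        records.append(
            {
                "hpids": entry["hpid"],
                "start": start,
                "end": end,
                "matched": entry["matched"],
                # Slice the original text to confirm the span lines up.
                "span_text": source_text[start:end],
            }
        )
    return sorted(records, key=lambda r: r["start"])


matches = parse_matches(raw, text)
for m in matches:
    print(m["hpids"], m["start"], m["end"], m["span_text"])
```

In this example the "index" values behave as half-open character offsets, so text[start:end] reproduces the matched string.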
Download files
File details
Details for the file txt2hpo-2021.0.tar.gz.
File metadata
- Download URL: txt2hpo-2021.0.tar.gz
- Upload date:
- Size: 45.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.62.3 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8c4062b4fd9e1ee29cc1940c0d42203455aeaa2f6a3cc911acb92798e1fa03d |
| MD5 | 7892a499b6834af64559f5fe262926dd |
| BLAKE2b-256 | 1c27d59cd0e36defecc9d53a791d79a3f885b2d6ef08cab8978aba9e7ebe4bd9 |
File details
Details for the file txt2hpo-2021.0-py3-none-any.whl.
File metadata
- Download URL: txt2hpo-2021.0-py3-none-any.whl
- Upload date:
- Size: 45.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.62.3 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f89ff0cd9ec7e80a214ada8eb434cc9cafd43a64a82f1f73ed08f1a5d83a53a |
| MD5 | 95751e1ccc6e35726326dbf9928b1809 |
| BLAKE2b-256 | 8b75a50423b9ee7448017b2fb3b08aabaac4fca13e6c999a06534ed4c6aab9a2 |