Phenotype extraction using Named Entity Recognition
txt2hpo
txt2hpo is a Python package for extracting Human Phenotype Ontology (HPO) encoded phenotypes from text. It accounts for inflection, can parse complex multi-word phenotypes, and comes with a built-in medical spellchecker.
Installation
Install from GitHub
git clone https://github.com/GeneDx/txt2hpo.git
cd txt2hpo
python setup.py install
Library usage
from txt2hpo.extract import hpo
hpos = hpo("patient with developmental delay and hypotonia")
print(hpos)
[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"},
{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
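The result above can be post-processed with plain Python. The following is a minimal sketch, not part of txt2hpo: the helper names normalize_matches, unique_hpo_ids and spans_agree are hypothetical, and the result is copied from the example output rather than produced by calling the library. Whether hpo() returns a list of dicts or a JSON string is an assumption here, so the sketch accepts both.

```python
import json


def normalize_matches(result):
    """Return matches as a list of dicts sorted by start offset.

    Accepts a list of dicts or a JSON string encoding one; which of the
    two hpo() returns is an assumption, so both are handled.
    """
    matches = json.loads(result) if isinstance(result, str) else list(result)
    return sorted(matches, key=lambda m: m["index"][0])


def unique_hpo_ids(matches):
    """Collect HPO IDs in order of first appearance, without duplicates."""
    seen = []
    for match in matches:
        for hpid in match["hpid"]:
            if hpid not in seen:
                seen.append(hpid)
    return seen


def spans_agree(text, matches):
    """Check that each [start, end) index pair slices out the matched phrase."""
    return all(
        text[m["index"][0]:m["index"][1]] == m["matched"] for m in matches
    )


text = "patient with developmental delay and hypotonia"
# Copied from the example output above, not produced by calling txt2hpo.
result = [
    {"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"},
    {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"},
]

matches = normalize_matches(result)
print(unique_hpo_ids(matches))     # ['HP:0001263', 'HP:0001290']
print(spans_agree(text, matches))  # True
```

In this example the index values are character offsets into the input text, with the end offset exclusive, so slicing the text with them reproduces the matched phrase.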
By default, txt2hpo attempts to correct spelling errors, at the cost of slower processing. To turn this feature off, set the correct_spelling flag to False.
from txt2hpo.extract import hpo
# correct_spelling=True is the default; it is passed explicitly here to show the flag
hpos = hpo("patient with devlopental delay and hyptonia", correct_spelling=True)
print(hpos)
[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"},
{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
Source distribution: txt2hpo-0.1.0.tar.gz (2.0 MB)