Phenotype extraction using Named Entity Recognition

txt2hpo

txt2hpo is a Python package for extracting phenotypes from free text and encoding them as Human Phenotype Ontology (HPO) terms. It accounts for inflection, can parse complex multi-word phenotypes, and comes with a built-in medical spellchecker.

Installation

Install from GitHub

git clone https://github.com/GeneDx/txt2hpo.git
cd txt2hpo
python setup.py install

Library usage

from txt2hpo.extract import hpo

hpos = hpo("patient with developmental delay and hypotonia")
print(hpos)

[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, 
 {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
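
The result above can be post-processed with the standard library alone. The following is a minimal sketch, not part of txt2hpo: it assumes the printed output is available as JSON text (check whether your version of hpo() returns a string or a list, and skip json.loads in the latter case). The variable names raw, matches and hpo_ids are illustrative.

```python
import json

# The input text and the sample output from the example above,
# treated here as JSON text (an assumption about the return type).
text = "patient with developmental delay and hypotonia"
raw = """[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"},
 {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]"""

matches = json.loads(raw)

# Order the matches by where they start in the input text.
matches.sort(key=lambda m: m["index"][0])

# "index" holds [start, end) character offsets into the input text,
# so slicing recovers the span that was matched.
for m in matches:
    start, end = m["index"]
    print(text[start:end], m["hpid"])

# Collect the distinct HPO identifiers across all matches.
hpo_ids = sorted({hpid for m in matches for hpid in m["hpid"]})
print(hpo_ids)
```

In this example the offsets slice back to exactly the text in the "matched" field.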
    
    

By default, txt2hpo attempts to correct spelling errors, which slows processing. To turn this feature off, set the correct_spelling flag to False. The example below leaves it enabled, so the misspelled terms are still matched.

from txt2hpo.extract import hpo

hpos = hpo("patient with devlopental delay and hyptonia", correct_spelling=True)
print(hpos)

[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, 
 {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
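
To see how much spelling correction costs on your own text, the two settings can be timed side by side. This is only a rough sketch: time_call is a hypothetical helper, and fake_hpo is a stand-in with the same call shape as hpo() so that the snippet runs without txt2hpo installed. It does not reproduce the library's behaviour or speed.

```python
import time

def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in extractor (hypothetical); swap in the real function to measure it.
def fake_hpo(text, correct_spelling=True):
    return "[]"

text = "patient with devlopental delay and hyptonia"
with_spellcheck = time_call(fake_hpo, text, correct_spelling=True)
without_spellcheck = time_call(fake_hpo, text, correct_spelling=False)
print(f"spellcheck on:  {with_spellcheck:.6f} s")
print(f"spellcheck off: {without_spellcheck:.6f} s")
```

With txt2hpo installed, replace fake_hpo with the hpo function imported from txt2hpo.extract and run it on a representative sample of your documents.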
    
