
Phenotype extraction using Named Entity Recognition


txt2hpo

txt2hpo is a Python package for extracting phenotypes from free text and encoding them as Human Phenotype Ontology (HPO) terms. It accounts for inflection, parses complex multi-word phenotypes, and comes with a built-in medical spellchecker.

Installation

Install from GitHub

git clone https://github.com/GeneDx/txt2hpo.git
cd txt2hpo
python setup.py install

Library usage

from txt2hpo.extract import hpo

hpos = hpo("patient with developmental delay and hypotonia")
print(hpos)

[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, 
 {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
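
The printed result above is a list of matches, each with HPO IDs, a character span, and the matched phrase. A minimal sketch of how such a result could be consumed is shown here. It assumes the result is either a JSON string (the double quotes in the output suggest this) or an already-parsed list of dicts, and that "index" holds start and end offsets into the input text. The helper name summarize_matches is hypothetical and not part of txt2hpo.

```python
import json


def summarize_matches(text, matches):
    """Return (hpo_id, span_text) pairs ordered by position in `text`.

    `matches` may be a JSON string or a list of dicts shaped like the
    output shown above (keys: "hpid", "index", "matched"). This shape is
    an assumption based on the printed example.
    """
    if isinstance(matches, str):
        matches = json.loads(matches)
    rows = []
    # Sort by start offset so results follow the order of the text.
    for match in sorted(matches, key=lambda m: m["index"][0]):
        start, end = match["index"]
        span = text[start:end]  # slice the original text by the offsets
        for hpid in match["hpid"]:
            rows.append((hpid, span))
    return rows


text = "patient with developmental delay and hypotonia"
# Sample copied from the printed output above, not from a live call.
sample = (
    '[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, '
    '{"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]'
)

for hpid, span in summarize_matches(text, sample):
    print(hpid, span)
```

Slicing the input by the reported offsets, rather than trusting the "matched" field alone, is a quick way to check that the spans line up with the text you passed in.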
    
    

txt2hpo attempts to correct spelling errors by default, at the cost of slower processing. This feature can be turned off by setting the correct_spelling flag to False. The example below passes correct_spelling=True explicitly, which matches the default behavior.

from txt2hpo.extract import hpo

hpos = hpo("patient with devlopental delay and hyptonia", correct_spelling=True)
print(hpos)

[{"hpid": ["HP:0001290"], "index": [37, 46], "matched": "hypotonia"}, 
 {"hpid": ["HP:0001263"], "index": [13, 32], "matched": "developmental delay"}]
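
Since spelling correction trades speed for recall, it may be worth measuring the difference on your own text before disabling it. The following is a rough sketch, not a benchmark from the project: the helper mean_seconds is hypothetical, and the only txt2hpo call used is hpo(text, correct_spelling=...) as shown above. The measurement is skipped if txt2hpo is not installed.

```python
import time


def mean_seconds(fn, *args, repeat=3, **kwargs):
    """Average wall-clock seconds per call of fn(*args, **kwargs)."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    try:
        from txt2hpo.extract import hpo
    except ImportError:
        hpo = None  # txt2hpo is not installed, so there is nothing to time

    if hpo is not None:
        text = "patient with developmental delay and hypotonia"
        for flag in (True, False):
            secs = mean_seconds(hpo, text, correct_spelling=flag)
            print(f"correct_spelling={flag}: {secs:.3f} s per call")
```

Timings depend on text length and hardware, so treat the numbers as relative rather than absolute.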
    


Download files


Files for txt2hpo, version 0.1.0
Filename: txt2hpo-0.1.0.tar.gz (2.0 MB)
File type: Source
Python version: None
