Phenotype comparison scoring by semantic similarity.

These details have been verified by PyPI

Maintainers

arvkevi Carlos.Borroto smcgee_genedx vdustach vlad.gainullin

These details have not been verified by PyPI

Project links

Project description

phenopy

phenopy is a Python package, developed using Python 3.9, that performs phenotype similarity scoring by semantic similarity. It is a lightweight but highly optimized command line tool and library for efficient semantic similarity scoring on generic entities with phenotype annotations from the Human Phenotype Ontology (HPO).

Phenotype Similarity Clustering

Installation

Install using pip:

pip install phenopy

Install from GitHub:

git clone https://github.com/GeneDx/phenopy.git
cd phenopy
pipx install poetry
poetry install

Command Line Usage

score

phenopy is primarily used as a command line tool. An entity, as described here, is presented as a sample, gene, or disease, but could be any concept that warrants annotation of phenotype terms.

Use phenopy score to perform semantic similarity scoring in various formats. Write the results of any command to a file using --output-file=/path/to/output_file.txt.

Score similarity of entities defined by the HPO terms from an input file against all the OMIM diseases in .phenopy/data/phenotype.hpoa. We provide a test input file in the repo. The default summarization method is --summarization-method=BMWA, which weights each disease's phenotypes by how frequently each phenotype is seen in that particular disease.
```
phenopy score tests/data/test.score.txt
```
Output:
```
#query	entity_id	score
118200  210100  0.0
118200  615779  0.0
118200  613266  0.0052
...
```
Score similarity of entities defined by the HPO terms from an input file against all the OMIM diseases in .phenopy/data/phenotype.hpoa using the non-weighted summarization method. Pass --summarization-method=BMA to use a traditional best-match average of the semantic similarity scores when comparing terms from record a with terms from record b.
```
phenopy score tests/data/test.score.txt --summarization-method=BMA
```
Output:
```
#query	entity_id	score
118200  210100  0.0
118200  615779  0.0
118200  613266  0.0052
...
```
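As a rough illustration of what a best-match average does, the sketch below computes it for a small matrix of pairwise term similarities. It follows one common formulation (the mean of the row-wise and the column-wise best matches) and is not phenopy's own implementation; the function name and the example values are made up for illustration.

```python
def best_match_average(sim):
    """Best-match average over a similarity matrix.

    sim[i][j] is the similarity between term i of record a
    and term j of record b.
    """
    if not sim or not sim[0]:
        raise ValueError("similarity matrix must be non-empty")
    # Best match in record b for every term of record a.
    row_best = [max(row) for row in sim]
    # Best match in record a for every term of record b.
    col_best = [max(col) for col in zip(*sim)]
    mean_a = sum(row_best) / len(row_best)
    mean_b = sum(col_best) / len(col_best)
    return (mean_a + mean_b) / 2


# Illustrative values only: 3 terms in record a, 2 terms in record b.
sim = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 0.25],
]
bma = best_match_average(sim)
print(round(bma, 4))
```

A weighted variant such as BMWA would replace the plain means with weighted means over the same best-match values.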
Score similarity of entities defined by the HPO terms from an input file against a custom list of entities with HPO annotations, referred to as the --records-file. Both files are in the same format.
```
phenopy score tests/data/test.score-short.txt --records-file tests/data/test.score-long.txt
```
Output:
```
#query  entity_id       score
118200  118200  0.0169
118200  300905  0.0156
118200  601098  0.0171
...
```
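The three-column output shown above is easy to post-process. The following is a minimal sketch, assuming only the `#query`, `entity_id`, `score` layout from the examples; the helper name `top_matches` is hypothetical and not part of phenopy.

```python
from collections import defaultdict


def top_matches(lines, n=1):
    """Return the n highest-scoring (entity_id, score) pairs per query."""
    by_query = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines, the "#query" header and any "..." elision.
        if not line or line.startswith("#") or line == "...":
            continue
        query, entity_id, score = line.split()
        by_query[query].append((entity_id, float(score)))
    return {
        query: sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]
        for query, hits in by_query.items()
    }


example = """#query  entity_id       score
118200  118200  0.0169
118200  300905  0.0156
118200  601098  0.0171
"""
best = top_matches(example.splitlines(), n=2)
print(best)
```

In practice the lines would come from the file written with --output-file rather than from an inline string.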

Score pairwise similarity of entities defined by the HPO terms from an input file using --self.

phenopy score tests/data/test.score-long.txt --threads 4 --self

Output:

#query  entity_id       score
118200  118200  0.2284
118200  118210  0.1302
118200  118211  0.1302
118210  118210  0.2048
118210  118211  0.2048
118211  118211  0.2048

Score age-adjusted pairwise similarity of entities defined in the input file, using the phenotype mean age and standard deviation defined in the --ages_distribution_file. Select best-match weighted average as the scoring summarization method with --summarization-method BMWA.
```
phenopy score tests/data/test.score-short.txt --ages_distribution_file tests/data/phenotype_age.tsv --summarization-method BMWA --threads 4 --self
```
Output:
```
#query  entity_id       score
118200  210100  0.0
118200  177650  0.0127
118200  241520  0.0
...
```
The phenotype age file contains hpo-id, mean, and sd as tab-separated text, as follows:

HP:0001251	6.0	3.0
HP:0001263	1.0	1.0
HP:0001290	1.0	1.0
HP:0004322	10.0	3.0
HP:0001249	6.0	3.0

If no phenotype ages file is provided, --summarization-method=BMWA can still be selected; phenopy then uses default, open-access, literature-derived phenotype ages (approximately 1,400 age-weighted phenotypes).

phenopy score tests/data/test.score-short.txt --summarization-method BMWA --threads 4

Parameters

For a full list of command arguments use phenopy [subcommand] --help:

phenopy score --help

Output:

    --output_file=OUTPUT_FILE
        File path where to store the results. [default: - (stdout)]
    --records_file=RECORDS_FILE
        An entity-to-phenotype annotation file in the same format as "input_file". This file, if provided, is used to score entries in the "input_file" against entries here. [default: None]
    --annotations_file=ANNOTATIONS_FILE
        An entity-to-phenotype annotation file in the same format as "input_file". This file, if provided, is used to add information content to the network. [default: None]
    --ages_distribution_file=AGES_DISTRIBUTION_FILE
        Phenotypes age summary stats file containing phenotype HPO id, mean_age, and std. [default: None]
    --self=SELF
        Score entries in the "input_file" against itself.
    --summarization_method=SUMMARIZATION_METHOD
        The method used to summarize the HRSS matrix. Supported Values are best match average (BMA), best match weighted average (BMWA), and maximum (maximum). [default: BMWA]
    --threads=THREADS
        Number of parallel processes to use. [default: 1]
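When phenopy is driven from a script, it can help to assemble the command from the options listed above. The sketch below only builds the argument list and uses the option names exactly as they appear in the help output; the function `build_score_command` is a hypothetical helper, not something phenopy provides.

```python
import shlex


def build_score_command(input_file, records_file=None,
                        ages_distribution_file=None,
                        summarization_method=None, threads=None,
                        self_score=False, output_file=None):
    """Build the argument list for a `phenopy score` invocation."""
    cmd = ["phenopy", "score", input_file]
    if records_file:
        cmd += ["--records_file", records_file]
    if ages_distribution_file:
        cmd += ["--ages_distribution_file", ages_distribution_file]
    if summarization_method:
        cmd += ["--summarization_method", summarization_method]
    if threads:
        cmd += ["--threads", str(threads)]
    if self_score:
        cmd.append("--self")
    if output_file:
        cmd += ["--output_file", output_file]
    return cmd


cmd = build_score_command("tests/data/test.score-long.txt",
                          threads=4, self_score=True)
print(shlex.join(cmd))
```

With phenopy installed, the list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with file paths.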

Library Usage

The phenopy library can be used as a Python module, allowing more control for advanced users.

score

Generate the hpo network and supporting objects:

import os
from phenopy.build_hpo import generate_annotated_hpo_network
from phenopy.score import Scorer

# data directory
phenopy_data_directory = os.path.join(os.getenv('HOME'), '.phenopy/data')

# files used in building the annotated HPO network
obo_file = os.path.join(phenopy_data_directory, 'hp.obo')
disease_to_phenotype_file = os.path.join(phenopy_data_directory, 'phenotype.hpoa')

# if you have a custom ages_distribution_file, you can set it here.
ages_distribution_file = os.path.join(phenopy_data_directory, 'xa_age_stats_oct052019.tsv')

hpo_network, alt2prim, disease_records = \
    generate_annotated_hpo_network(obo_file,
                                   disease_to_phenotype_file,
                                   ages_distribution_file=ages_distribution_file
                                   )

Then, instantiate the Scorer class and score hpo term lists.

scorer = Scorer(hpo_network)

terms_a = ['HP:0001263', 'HP:0011839']
terms_b = ['HP:0001263', 'HP:0000252']

print(scorer.score_term_sets_basic(terms_a, terms_b))

Output:

0.11213185474495047
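To score many entities against each other from Python, in the spirit of the --self option, a scoring function can be applied to every unordered pair. The sketch below is an assumption-laden illustration: `pairwise_scores` is a hypothetical helper, and a simple Jaccard overlap stands in for the scorer so that the example runs without phenopy or its data files.

```python
from itertools import combinations_with_replacement


def pairwise_scores(records, score_fn):
    """Score every unordered pair of records, including self-pairs."""
    return [
        (id_a, id_b, score_fn(records[id_a], records[id_b]))
        for id_a, id_b in combinations_with_replacement(sorted(records), 2)
    ]


def jaccard(terms_a, terms_b):
    """Stand-in scorer (NOT semantic similarity), used only for the demo."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b)


# Entity names are made up; the HPO ids are the ones used above.
records = {
    "entity_1": ["HP:0001263", "HP:0011839"],
    "entity_2": ["HP:0001263", "HP:0000252"],
}
results = pairwise_scores(records, jaccard)
for row in results:
    print(*row, sep="\t")
```

With the objects built earlier, passing `scorer.score_term_sets_basic` as `score_fn` would produce semantic similarity scores instead of the stand-in overlap.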

miscellaneous

The library can be used to prune parent phenotypes from phenotype.hpoa and store the pruned annotations as a file:

from phenopy.util import export_phenotype_hpoa_with_no_parents
# saves a new file of phenotype disease annotations with parent HPO terms removed from phenotype lists.
disease_to_phenotype_no_parents_file = os.path.join(phenopy_data_directory, 'phenotype.noparents.hpoa')
export_phenotype_hpoa_with_no_parents(disease_to_phenotype_file, disease_to_phenotype_no_parents_file, hpo_network)
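To make the idea of pruning parent terms concrete, the sketch below drops every term that is an ancestor of another term in the same list. It uses a toy child-to-parents map with made-up identifiers and is not how phenopy implements the export; both function names are hypothetical.

```python
def ancestors(term, parents):
    """Collect every ancestor of a term from a child -> parents map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def remove_parent_terms(terms, parents):
    """Keep only terms that are not ancestors of another listed term."""
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents)
    return [term for term in terms if term not in redundant]


# Toy hierarchy: TERM:3 -> TERM:2 -> TERM:1, and TERM:4 -> TERM:1.
parents = {
    "TERM:3": ["TERM:2"],
    "TERM:2": ["TERM:1"],
    "TERM:4": ["TERM:1"],
}
pruned = remove_parent_terms(["TERM:1", "TERM:2", "TERM:3", "TERM:4"], parents)
print(pruned)
```

Only the most specific terms survive, which is the effect described for the pruned annotation file.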

Initial setup

phenopy is designed to run with minimal setup from the user. To run phenopy with default parameters (recommended), no further setup is needed; see the Command Line Usage section.

This section provides details about where phenopy stores data resources and config files. The following occurs when you run phenopy for the first time.

phenopy creates a .phenopy/ directory in your home folder and downloads external resources from HPO into the $HOME/.phenopy/data/ directory.
phenopy creates a $HOME/.phenopy/phenopy.ini config file where users can set variables for phenopy to use at runtime.

Config

While we recommend using the default settings for most users, the config file can be modified: $HOME/.phenopy/phenopy.ini.

To run phenopy with a different version of hp.obo, set the path of obo_file in $HOME/.phenopy/phenopy.ini.
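The config file can also be edited programmatically. The sketch below uses the standard-library `configparser` and writes to a temporary file so that it runs anywhere. The section name `hpo` is an assumption made for illustration; check your own phenopy.ini for the section that actually holds `obo_file`. The helper `set_obo_file` is hypothetical.

```python
import configparser
import os
import tempfile


def set_obo_file(config_path, obo_path, section="hpo"):
    """Set obo_file in an ini file. The section name is an assumption."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently ignored
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, "obo_file", obo_path)
    with open(config_path, "w") as handle:
        config.write(handle)


# Demonstrate on a throwaway file instead of $HOME/.phenopy/phenopy.ini.
with tempfile.TemporaryDirectory() as tmp:
    ini_path = os.path.join(tmp, "phenopy.ini")
    set_obo_file(ini_path, "/path/to/custom/hp.obo")
    check = configparser.ConfigParser()
    check.read(ini_path)
    saved = check.get("hpo", "obo_file")
print(saved)
```

Note that `configparser` drops comments when it rewrites a file, so editing phenopy.ini by hand may be preferable if the file is annotated.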

Contributing

We welcome contributions from the community. Please follow these steps to set up a local development environment.

pipenv install --dev

To run tests locally:

pipenv shell
coverage run --source=. -m unittest discover --start-directory tests/
coverage report -m

References

The underlying algorithm which determines the semantic similarity for any two HPO terms is based on an implementation of HRSS, published here.

Citing Phenopy

Please use the following BibTeX entry to cite this software.

@software{arvai_phenopy_2019,
    title = {Phenopy},
    rights = {Attribution-NonCommercial-ShareAlike 4.0 International},
    url = {https://github.com/GeneDx/phenopy},
    abstract = {Phenopy is a Python package to perform phenotype similarity scoring by semantic similarity.
        Phenopy is a lightweight but highly optimized command line tool and library to efficiently perform semantic
        similarity scoring on generic entities with phenotype annotations from the Human Phenotype Ontology (HPO).},
    version = {0.3.0},
    author = {Arvai, Kevin and Borroto, Carlos and Gainullin, Vladimir and Retterer, Kyle},
    date = {2019-11-05},
    year = {2019},
    doi = {10.5281/zenodo.3529569}
}

Project details

Release history

0.6.0 (this version): Jun 17, 2023
0.5.4: Jun 7, 2023
0.5.3: Mar 6, 2021
0.5.2: Jan 5, 2021
0.5.1: Dec 28, 2020
0.5.0: Dec 28, 2020
0.4.2: Jun 18, 2020
0.4.1: Jun 5, 2020
0.4.0: Mar 4, 2020
0.3.2: Dec 9, 2019
0.3.0: Nov 5, 2019
0.2.1: Sep 10, 2019

Download files


Source Distribution

phenopy-0.6.0.tar.gz (13.0 MB)

Uploaded Jun 17, 2023 Source

Built Distribution

phenopy-0.6.0-py3-none-any.whl (13.0 MB)

Uploaded Jun 17, 2023 Python 3

File details

Details for the file phenopy-0.6.0.tar.gz.

File metadata

Download URL: phenopy-0.6.0.tar.gz
Upload date: Jun 17, 2023
Size: 13.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for phenopy-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`9f2975484c75346cd45b457ee5a45de846bb8d2da45ce4d92788cedb00f7e221`
MD5	`491e8228ae246071a7fccab214d5b249`
BLAKE2b-256	`e876bc19daded003696e8e7b127bcccd914d78049adeef593db0e09de42e07aa`
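
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; the helper names are hypothetical, and the demonstration hashes a small temporary file rather than the real distribution.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a throwaway file; for a real check, pass the path of the
# downloaded archive and the SHA256 value from the table above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"phenopy")
    expected = hashlib.sha256(b"phenopy").hexdigest()
    demo_ok = matches_published_hash(demo_path, expected)
    demo_bad = matches_published_hash(demo_path, "0" * 64)
print(demo_ok, demo_bad)
```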


File details

Details for the file phenopy-0.6.0-py3-none-any.whl.

File metadata

Download URL: phenopy-0.6.0-py3-none-any.whl
Upload date: Jun 17, 2023
Size: 13.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for phenopy-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`14c68a5592d5b77a310e38b0ab1ff847c593bd71e38708e550b75d471d941b6d`
MD5	`e9078e5a5c7a7e434130f05abfd991da`
BLAKE2b-256	`db20aa90141aeafc20ebffb85092ea9f402d831e6498cd7188f8d9f12bbf603e`

