A Python tool to identify oligogenic combinations of genes with rare variants
Pythonic version of RareComb
RareComb is a tool to find oligogenic combinations of genes with rare variants that are enriched in individuals with a specific phenotype. RareComb was originally developed in R (https://github.com/girirajanlab/RareComb). Here we provide a Pythonic version of RareComb with some additional utilities.
Installation
$ pip install pyrarecomb
User interface
The Pythonic version of RareComb currently has three user-facing functions:
- compare_enrichment: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but not in controls.
- compare_enrichment_depletion: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but depleted in controls.
- compare_enrichment_modifiers: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but not in controls, where at least one of the items in a combination must belong to a user-defined set of genes.
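As a rough illustration of what "enriched" can mean here, the sketch below compares how often a gene pair is observed together with how often it would be expected together if the genes were mutated independently, and scores the difference with a one-sided binomial test. This is an assumption made for illustration only: the helper names `binomial_upper_tail` and `combo_enrichment` are hypothetical, the data are toy values, and pyrarecomb's actual statistics may differ.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def combo_enrichment(rows, genes):
    """Hypothetical helper: observed count, expected probability and p-value.

    rows  : list of dicts holding 0/1 values keyed by column name
    genes : column names that make up the combination
    """
    n = len(rows)
    # Expected co-occurrence probability, assuming the genes are independent.
    expected = 1.0
    for gene in genes:
        expected *= sum(row[gene] for row in rows) / n
    # Number of individuals that carry every gene in the combination.
    observed = sum(all(row[gene] == 1 for gene in genes) for row in rows)
    return observed, expected, binomial_upper_tail(observed, n, expected)


def make_rows(pairs):
    return [{"Input_GeneA": a, "Input_GeneB": b} for a, b in pairs]


# Toy data: both groups have GeneA and GeneB in 3 of 10 individuals,
# but only the cases carry the two genes together more often than expected.
cases = make_rows([(1, 1)] * 3 + [(0, 0)] * 7)
controls = make_rows([(1, 1)] + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 5)

combo = ["Input_GeneA", "Input_GeneB"]
obs_cases, exp_cases, p_cases = combo_enrichment(cases, combo)
obs_controls, exp_controls, p_controls = combo_enrichment(controls, combo)
print(f"cases:    observed={obs_cases}, expected prob={exp_cases:.2f}, p={p_cases:.3f}")
print(f"controls: observed={obs_controls}, expected prob={exp_controls:.2f}, p={p_controls:.3f}")
```

In this toy cohort the pair co-occurs in three cases against an expected probability of 0.09, while the controls show a single co-occurrence, so the combination looks enriched in cases only.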
All these functions have the following required arguments:
- boolean_input_df: A dataframe with one row per sample. Its columns are the sample ID (the column must be named "Sample_Name"), one-hot encoded genotype columns (presence or absence of a rare deleterious mutation within a gene; these column names must start with the prefix "Input_") and a one-hot encoded phenotype column (presence or absence of the phenotype; this column name must start with the prefix "Output_"). An example dataframe is as follows:
Sample_Name | Input_GeneA | Input_GeneB | Input_GeneC | ... | Output_phenotype |
---|---|---|---|---|---|
Sample_1111 | 0 | 1 | 1 | ... | 1 |
Sample_2198 | 0 | 1 | 0 | ... | 0 |
... | ... | ... | ... | ... | ... |
Sample_N | 0 | 0 | 1 | ... | 0 |
- combo_length: The number of items to mine within a combination.
- min_indv_threshold: The minimum number of individuals that must carry a combination before it is checked for enrichment.
- max_freq_threshold: The maximum fraction of the cohort that may carry a combination (used to filter out highly frequent combinations).
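The column conventions described for boolean_input_df can be checked before running an analysis. The sketch below is a hypothetical helper (`check_input_layout` is not part of pyrarecomb) that works on plain row dictionaries, such as those returned by pandas' `DataFrame.to_dict(orient="records")`. It assumes that every row has the same keys and that all "Input_" and "Output_" values are encoded as 0 or 1.

```python
def check_input_layout(rows):
    """Hypothetical check of the boolean_input_df conventions.

    rows: list of dicts, one per sample, keyed by column name.
    Returns (input_columns, output_columns) or raises ValueError.
    """
    if not rows:
        raise ValueError("no samples provided")
    columns = list(rows[0])
    if "Sample_Name" not in columns:
        raise ValueError('missing the "Sample_Name" column')
    input_cols = [c for c in columns if c.startswith("Input_")]
    output_cols = [c for c in columns if c.startswith("Output_")]
    if not input_cols:
        raise ValueError('no genotype columns with the "Input_" prefix')
    if not output_cols:
        raise ValueError('no phenotype column with the "Output_" prefix')
    # One-hot encoding: every genotype and phenotype value must be 0 or 1.
    for row in rows:
        for col in input_cols + output_cols:
            if row[col] not in (0, 1):
                raise ValueError(f"{row['Sample_Name']}: {col} is not 0 or 1")
    return input_cols, output_cols


rows = [
    {"Sample_Name": "Sample_1111", "Input_GeneA": 0, "Input_GeneB": 1,
     "Input_GeneC": 1, "Output_phenotype": 1},
    {"Sample_Name": "Sample_2198", "Input_GeneA": 0, "Input_GeneB": 1,
     "Input_GeneC": 0, "Output_phenotype": 0},
]
input_cols, output_cols = check_input_layout(rows)
print(input_cols, output_cols)
```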
In addition to the required arguments above, compare_enrichment_modifiers has one more required argument:
- primary_input_entities: A list of genes, at least one of which must be part of each enriched combination.
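To make the roles of combo_length, min_indv_threshold, max_freq_threshold and primary_input_entities concrete, the sketch below enumerates gene combinations by brute force and applies the filters. It is not the package's implementation: `candidate_combos` is a hypothetical helper, the thresholds are assumed to be inclusive, the primary genes are assumed to be given with their "Input_" prefix, and exhaustive enumeration is only practical for small gene sets.

```python
from itertools import combinations


def candidate_combos(rows, combo_length, min_indv_threshold,
                     max_freq_threshold, primary_input_entities=None):
    """Hypothetical brute-force filter; returns {combination: carrier count}."""
    genes = sorted(c for c in rows[0] if c.startswith("Input_"))
    primary = set(primary_input_entities or [])
    n_samples = len(rows)
    kept = {}
    for combo in combinations(genes, combo_length):
        # Modifier mode: require at least one gene from the primary set.
        if primary and not primary & set(combo):
            continue
        # Individuals that carry a variant in every gene of the combination.
        carriers = sum(all(row[g] == 1 for g in combo) for row in rows)
        if carriers < min_indv_threshold:
            continue  # too few individuals to test
        if carriers / n_samples > max_freq_threshold:
            continue  # too frequent in the cohort
        kept[combo] = carriers
    return kept


# Toy cohort of six samples; tuples are (GeneA, GeneB, GeneC).
genotypes = [(1, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
rows = [
    {"Sample_Name": f"Sample_{i}", "Input_GeneA": a, "Input_GeneB": b, "Input_GeneC": c}
    for i, (a, b, c) in enumerate(genotypes, start=1)
]

pairs = candidate_combos(rows, combo_length=2, min_indv_threshold=2,
                         max_freq_threshold=0.4)
modifier_pairs = candidate_combos(rows, combo_length=2, min_indv_threshold=1,
                                  max_freq_threshold=0.5,
                                  primary_input_entities=["Input_GeneA"])
print(pairs)
print(modifier_pairs)
```

In the first call, the GeneA/GeneB pair is dropped because half of the cohort carries it and the GeneA/GeneC pair is dropped because only one individual carries it. In the second call, only pairs that contain GeneA are kept.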
Usage examples
Please refer to the notebooks directory in the repository.
Possible modifications for v0.1.0
- The step that refines control frequencies may not be required
- After filtering, raise a ValueError if no data remain
- Create a function for getting the expected and observed probabilities of combinations
- Create a function for calculating p-values
- Discuss the nominal significance filtration strategy
- Create a multiple testing function
- Rounding adjusted p-values to 3 digits is not a good idea
- compare_enrichment_modifiers: why are primary entities checked only as consequents?
Internal use
Package creation
$ python3 -m pip install --upgrade pip
$ python3 -m pip install --upgrade build
$ python3 -m pip install --upgrade twine
$ python3 -m build
$ python3 -m twine upload dist/*