A Python tool to identify oligogenic combinations of genes with rare variants

Project description

Pythonic version of RareComb

RareComb is a tool to find oligogenic combinations of genes with rare variants that are enriched in individuals with a specific phenotype. RareComb was originally developed in R (https://github.com/girirajanlab/RareComb). Here we provide a pythonic version of RareComb with some additional utilities.

Installation

$ pip install pyrarecomb

User interface

The pythonic version of RareComb currently has three user-facing functions:

  1. compare_enrichment: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but not in controls.

  2. compare_enrichment_depletion: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but depleted in controls.

  3. compare_enrichment_modifiers: Checks for oligogenic combinations of rare genetic variants that are enriched in cases but not in controls, where one of the items in a combination must be within a user-defined set of genes.
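
As a rough illustration of what "enriched" can mean here, the sketch below compares the observed number of individuals carrying a gene pair with the number expected if the genes were independent, using a one-sided binomial test. This is an assumption about the general approach and is not taken from the pyrarecomb source; the function names and the numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cooccurrence_pvalue(n_samples, gene_counts, combo_count):
    """Expected co-occurrence probability under independence and its p-value.

    gene_counts: carriers of each gene taken alone (hypothetical inputs).
    combo_count: individuals carrying all genes in the combination.
    """
    expected_prob = 1.0
    for count in gene_counts:
        expected_prob *= count / n_samples
    p_value = binomial_upper_tail(combo_count, n_samples, expected_prob)
    return expected_prob, p_value


# Hypothetical cohort: 1000 cases, gene A in 50, gene B in 40, both in 10.
expected_prob, p_value = cooccurrence_pvalue(1000, [50, 40], 10)
print(expected_prob, p_value)
```

In this reading, a combination would be reported when such a test is significant in cases but not in controls.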

All these functions have the following required arguments:

  • boolean_input_df: A dataframe with one row per sample. Columns include the sample ID (column name: "Sample_Name"), one-hot encoded genotype information (presence or absence of a rare deleterious mutation within a gene; these columns should start with the prefix "Input_"), and the phenotype (presence or absence of a phenotype; this column should start with the prefix "Output_"). An example dataframe is as follows:
Sample_Name   Input_GeneA   Input_GeneB   Input_GeneC   ...   Output_phenotype
Sample_1111   0             1             1             ...   1
Sample_2198   0             1             0             ...   0
...
Sample_N      0             0             1             ...   0
  • combo_length: The number of items to mine within a combination.
  • min_indv_threshold: The minimum number of individuals that must possess a combination before it is checked for enrichment.
  • max_freq_threshold: The maximum fraction of the cohort that may possess a combination (to filter out highly frequent combinations).
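
The required arguments above can be illustrated with a small, standard-library-only sketch. It builds rows in the expected layout from hypothetical per-sample data and then counts combinations by brute force, applying the two thresholds. The brute-force counting only stands in for the frequent itemset mining step, and treating both thresholds as inclusive is an assumption; the helper name frequent_combos is hypothetical.

```python
from itertools import combinations

# Hypothetical data: genes with a rare deleterious variant, and phenotype status.
samples = {
    "Sample_1": ({"GeneA", "GeneB"}, 1),
    "Sample_2": ({"GeneA", "GeneB"}, 1),
    "Sample_3": ({"GeneB", "GeneC"}, 0),
    "Sample_4": ({"GeneA", "GeneB", "GeneC"}, 1),
    "Sample_5": (set(), 0),
}
genes = sorted(set().union(*(carried for carried, _ in samples.values())))

# One row per sample: Sample_Name, Input_* columns, then the Output_* column.
rows = []
for name, (carried, phenotype) in samples.items():
    row = {"Sample_Name": name}
    row.update({f"Input_{gene}": int(gene in carried) for gene in genes})
    row["Output_phenotype"] = phenotype
    rows.append(row)
# pandas.DataFrame(rows) would turn these rows into a dataframe.


def frequent_combos(rows, combo_length, min_indv_threshold, max_freq_threshold):
    """Count carriers of every combination and keep those within both thresholds."""
    input_cols = [col for col in rows[0] if col.startswith("Input_")]
    kept = {}
    for combo in combinations(input_cols, combo_length):
        count = sum(all(row[col] == 1 for col in combo) for row in rows)
        if count >= min_indv_threshold and count / len(rows) <= max_freq_threshold:
            kept[combo] = count
    return kept


kept = frequent_combos(rows, combo_length=2, min_indv_threshold=2, max_freq_threshold=0.5)
print(kept)  # {('Input_GeneB', 'Input_GeneC'): 2}
```

Here the GeneA/GeneB pair is dropped because 3 of 5 samples carry it (above the 0.5 frequency cap), and the GeneA/GeneC pair is dropped because only one sample carries it.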

Along with the other required arguments, compare_enrichment_modifiers has an additional required argument:

  • primary_input_entities: List of genes, at least one of which must be part of each enriched combination.
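
A simplified reading of this constraint is sketched below: keep only the combinations that contain at least one gene from the user-defined set. The package may apply the constraint differently internally, so treat this as an illustration only; the helper name and gene names are hypothetical.

```python
def combos_with_primary(combos, primary_input_entities):
    """Keep combinations that include at least one primary gene."""
    primary = set(primary_input_entities)
    return [combo for combo in combos if primary.intersection(combo)]


# Hypothetical candidate combinations and primary gene set.
candidates = [
    ("Input_GeneA", "Input_GeneB"),
    ("Input_GeneB", "Input_GeneC"),
    ("Input_GeneC", "Input_GeneD"),
]
selected = combos_with_primary(candidates, ["Input_GeneA", "Input_GeneD"])
print(selected)
```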

All these functions have the following optional arguments:

  • input_format: The prefix of the input columns in the boolean matrix; default="Input_"
  • output_format: The prefix of the output column in the boolean matrix; default="Output_"
  • pval_filter_threshold: The p-value significance threshold that the combinations must satisfy; default=0.05
  • adj_pval_type: The p-value adjustment method used for multiple testing correction, one of "bonferroni"/"BH"; default="BH"
  • min_power_threshold: The minimum power threshold that the significant combinations must satisfy; default=0.7
  • sample_names_ind: Whether to list the samples that possess each combination, one of "Y"/"N"; default="Y"
  • method: The frequent itemset mining method, one of "fpgrowth"/"apriori"; default="fpgrowth"
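
For readers unfamiliar with the two adj_pval_type options, the following sketch shows the textbook Bonferroni and Benjamini-Hochberg (BH) adjustments in plain Python. It is not the package's own implementation, and the function name adjust_pvalues is hypothetical.

```python
def adjust_pvalues(pvalues, method="BH"):
    """Return adjusted p-values in the same order as the input."""
    m = len(pvalues)
    if method == "bonferroni":
        # Multiply each p-value by the number of tests, capped at 1.
        return [min(1.0, p * m) for p in pvalues]
    if method != "BH":
        raise ValueError("method must be 'bonferroni' or 'BH'")
    # BH: walk from the largest p-value down, scaling by m / rank and
    # enforcing monotonicity with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
bh = adjust_pvalues(pvals, method="BH")
bonf = adjust_pvalues(pvals, method="bonferroni")
print(bh, bonf)
```

Bonferroni is the more conservative choice; BH controls the false discovery rate and typically retains more combinations at the same threshold.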

Usage examples

Please refer to the notebooks directory in the repository.

Citation

  1. Pounraja VK, Girirajan S. A general framework for identifying oligogenic combinations of rare variants in complex disorders. Genome Res. 2022 May;32(5):904-915. doi: 10.1101/gr.276348.121. Epub 2022 Mar 17. PMID: 35301265; PMCID: PMC9104696.

Modifications in v0.1.0

Major

  1. Options between apriori and fpgrowth algorithms for frequent itemsets mining
  2. The control-frequency refinement step is now correctly applied before multiple testing

Minor

  1. A ValueError is now raised if no data remain after filtering
  2. Optional arguments bug fixed
  3. Better logging using a log file
  4. Method verbose during tree generation
  5. Pandas applymap changed to map due to deprecation warning
  6. Get counts helper function with pandas query fixed for hyphenated gene names
  7. Statistical values are no longer rounded to three decimal places

Possible modifications for v0.2.0

  1. Refining control frequencies step may not be required
  2. Create function for getting exp and obs prob for combos
  3. Create function for calculating p values
  4. Discuss the nominal significance filtration strategy
  5. Create multiple testing function
  6. Rounding adjusted p-values to 3 digits is not a good idea
  7. compare_enrichment_modifiers: why are we checking for primary entities only as consequents?

Internal use

Package creation

$ python3 -m pip install --upgrade pip
$ python3 -m pip install --upgrade build
$ python3 -m pip install --upgrade twine
$ python3 -m build
$ python3 -m twine upload --skip-existing dist/*
