
A CLI tool that identifies IBD sharing within networks across a locus of interest at biobank scale and then tests for phenotypic enrichment within these networks.

Code style: black

IBDCluster v1.2.1:

Documentation:


This README is a more technical description of the project, covering the class structures and their relationships. More practical documentation on installing and using the program can be found in the IBDCluster documentation (still a work in progress).

Purpose of the project:


This project is a CLI tool that clusters shared IBD segments within biobanks around a gene of interest. These networks are then analyzed to determine how many individuals within each network are affected by a phenotype of interest.
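
As a rough illustration of the clustering idea, the sketch below groups individuals into networks by treating each IBD pair as an edge and collecting connected components. This is an assumption about the general approach, not IBDCluster's actual algorithm; the function names `build_networks` and `count_affected` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_networks(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group individuals into networks (connected components of the pair graph).

    Hypothetical helper: assumes every pair already overlaps the gene of interest.
    """
    # Each IBD pair is treated as an undirected edge between two individuals.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for id1, id2 in pairs:
        neighbors[id1].add(id2)
        neighbors[id2].add(id1)

    seen: Set[str] = set()
    networks: List[Set[str]] = []
    for seed in list(neighbors):
        if seed in seen:
            continue
        # Expand outward from the seed until no new individuals are reached.
        network: Set[str] = set()
        stack = [seed]
        while stack:
            ind = stack.pop()
            if ind in network:
                continue
            network.add(ind)
            stack.extend(neighbors[ind] - network)
        seen |= network
        networks.append(network)
    return networks


def count_affected(network: Set[str], affected: Set[str]) -> int:
    """Count how many individuals in a network carry the phenotype of interest."""
    return len(network & affected)
```

For example, the pairs `("A", "B")`, `("B", "C")` and `("D", "E")` would form two networks, one with three individuals and one with two.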

General Pipeline:


flowchart LR
    A(IBD Information) --> B(Identified Networks) --> C(Binomial test for enrichment of Phenotypes)
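
The last step of the pipeline can be sketched as a one-sided exact binomial test. The example assumes the null hypothesis is that each network member is affected independently with probability equal to the population prevalence; the README does not say which alternative or which library IBDCluster uses, so the function name and the one-sided choice are assumptions.

```python
from math import comb


def binomial_enrichment_pvalue(
    n_affected: int, network_size: int, population_prevalence: float
) -> float:
    """Return P(X >= n_affected) for X ~ Binomial(network_size, population_prevalence).

    Hypothetical helper: a one-sided (upper-tail) exact test for enrichment.
    """
    if not 0.0 <= population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be between 0 and 1")
    if not 0 <= n_affected <= network_size:
        raise ValueError("n_affected must be between 0 and network_size")

    p = population_prevalence
    n = network_size
    # Sum the binomial probability mass from the observed count up to n.
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n_affected, n + 1)
    )
```

A small p-value would suggest that the network contains more affected individuals than expected from the population prevalence alone.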

Installing:


Cloning from GitHub and modifying permissions:

  1. Clone the project into the appropriate directory using git clone.
  2. cd into the IBDCluster directory:
cd IBDCluster
  3. Run the following command to set the right permissions on the IBDCluster.py file:
chmod +x IBDCluster/IBDCluster.py

Installing dependencies: Next install all the necessary dependencies. The steps for this vary depending on what package manager you are using.

If using conda:

  1. There is an environment.yml file in the main IBDCluster directory. Run the following command to create an environment called IBDCluster:
conda env create -f environment.yml
  2. You can now activate the environment by calling:
conda activate IBDCluster

If using mamba:

  1. This is the same as the conda section, except that you create the environment with the command:
mamba env create -f environment.yml
  2. You can activate this environment using:
conda activate IBDCluster

If using Poetry:

  1. The requirements for a Poetry project are also in the IBDCluster directory. Ideally, activate a virtual environment first. This can be either a conda environment or a virtualenv. Once the environment is activated, call:
poetry install
  2. At this point all necessary dependencies should be installed.

Adding IBDCluster to the user's $PATH: To run the IBDCluster program without being in the source code directory, add the directory containing the IBDCluster.py file to your PATH.

  1. In your .bashrc or .zshrc file, add the line:
export PATH="{Path to the directory that the program was cloned into}/IBDCluster/IBDCluster:$PATH"
  2. Reload the file you edited by running:
source ~/.bashrc

or

source ~/.zshrc

This will allow you to run the code by just typing IBDCluster.py from any directory.
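
To confirm that the PATH change took effect, a quick check like the one below can help. It only uses the standard library's `shutil.which`; the helper name `find_on_path` is hypothetical and not part of IBDCluster.

```python
import shutil
from typing import Optional


def find_on_path(command: str = "IBDCluster.py") -> Optional[str]:
    """Return the full path of `command` if it is an executable on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("IBDCluster.py was not found on PATH; check the export line and the chmod step.")
    else:
        print(f"IBDCluster.py found at {location}")
```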

Running IBDCluster

  • You can find all the optional parameters by running:
IBDCluster.py --help

Running the code:


Reporting Issues:


All issues can be reported using the templates in the .github/ folder. There are templates for bug_reports and for feature_request.

Technical Details of the project:


  • This part is mainly for keeping track of the directory structure.

Project Structure:


├── IBDCluster
│   ├── analysis
│   │   ├── main.py
│   │   ├── percentages.py
│   ├── callbacks
│   │   ├── check_inputs.py
│   ├── models
│   │   ├── cluster_class.py
│   │   ├── indices.py
│   │   ├── pairs.py
│   │   ├── writers.py
│   ├── log
│   │   ├── logger.py
│   ├── cluster
│   │   ├── main.py
│   ├── IBDCluster.py
├── .env
├── environment.yml
├── .gitignore
├── poetry.lock
├── pyproject.toml
├── README.md
├── requirements.txt
│   ├── tests
│   │   ├── test_data
│   │   ├── test_integration

Comments about models:


  • Classes for the cluster_class.py:
classDiagram
    class Cluster {
        ibd_file: str
        ibd_program: str
        indices: models.FileInfo
        count: int=0
        ibd_df: pd.DataFrame=pd.DataFrame
        network_id: str=1
        inds_in_network: Set[str]=set
        network_list: List[Network]=list
    }
    class Network {
        gene_name: str
        gene_chr: str
        network_id: int
        pairs: List[Pairs]=list
        iids: Set[str]=set
        haplotypes: Set[str]=set
        +filter_for_seed(ibd_df: pd.DataFrame, ind_seed: List[str], indices: FileInfo, exclusion: Set[str]=None) -> pd.DataFrame
        #determine_pairs(ibd_row: pd.Series, indices: FileInfo) -> Pairs
        +gather_grids(dataframe: pd.DataFrame, pair_1_indx: int, pair_2_indx: int) -> Set[str]
        +update(ibd_df: pd.DataFrame, indices: FileInfo) -> None
    }
    class FileInfo {
        <<interface>>
        id1_indx: int
        ind1_with_phase: int
        id2_indx: int
        ind2_with_phase: int
        chr_indx: int
        str_indx: int
        end_indx: int
        +set_program_indices(program_name: str) -> None
    }
    Cluster o-- Network
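
The defaults written as `=list` and `=set` in the diagram read like dataclass default factories. The sketch below shows how such classes might be declared under that assumption. It is not the project's source: the pandas-backed `ibd_df` attribute, the `indices` attribute, and all methods are left out, and `pairs` is typed loosely to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class Network:
    """Sketch of the Network attributes listed in the class diagram."""

    gene_name: str
    gene_chr: str
    network_id: int
    # default_factory gives every Network its own containers.
    pairs: List[Any] = field(default_factory=list)  # would hold Pairs objects
    iids: Set[str] = field(default_factory=set)
    haplotypes: Set[str] = field(default_factory=set)


@dataclass
class Cluster:
    """Sketch of the Cluster attributes, which aggregates Network objects."""

    ibd_file: str
    ibd_program: str
    count: int = 0
    inds_in_network: Set[str] = field(default_factory=set)
    network_list: List[Network] = field(default_factory=list)
```

Using `field(default_factory=...)` rather than a shared mutable default means that adding an ID to one network's `iids` does not leak into another network.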

Entity relationships:


erDiagram
    NETWORK }|--|{ PAIRS : contains
    NETWORK {
        string gene_name
        string chromosome
        int network_id
    }
    NETWORK }|--|{ IIDS : contains
    NETWORK }|--|{ HAPLOTYPES : contains
    PAIRS {
       string pair_1_id
       string pair_1_phase 
       string pair_2_id
       string pair_2_phase 
       int chromosome_number
       int segment_start 
       int segment_end
       float length 
       series affected_statuses 
    }
    IIDS {
        string Individual-ids
    }
    HAPLOTYPES {
        string haplotype-phase
    }
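
To make the PAIRS entity concrete, here is a minimal sketch that builds one record from a single row of an IBD file using column indices like those in FileInfo. The column positions in `ExampleIndices` are hypothetical and do not describe any particular IBD program's output. The `length` and `affected_statuses` fields are omitted because the README does not say where they come from.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExampleIndices:
    """Hypothetical column positions, named after the FileInfo attributes."""

    id1_indx: int = 0
    ind1_with_phase: int = 1
    id2_indx: int = 2
    ind2_with_phase: int = 3
    chr_indx: int = 4
    str_indx: int = 5
    end_indx: int = 6


@dataclass
class Pairs:
    """Subset of the PAIRS fields from the entity diagram."""

    pair_1_id: str
    pair_1_phase: str
    pair_2_id: str
    pair_2_phase: str
    chromosome_number: int
    segment_start: int
    segment_end: int


def determine_pairs(ibd_row: List[str], indices: ExampleIndices) -> Pairs:
    """Build a Pairs record from one already-split row of an IBD file.

    Assumes the chromosome column holds a plain integer.
    """
    return Pairs(
        pair_1_id=ibd_row[indices.id1_indx],
        pair_1_phase=ibd_row[indices.ind1_with_phase],
        pair_2_id=ibd_row[indices.id2_indx],
        pair_2_phase=ibd_row[indices.ind2_with_phase],
        chromosome_number=int(ibd_row[indices.chr_indx]),
        segment_start=int(ibd_row[indices.str_indx]),
        segment_end=int(ibd_row[indices.end_indx]),
    )
```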

Plugins: (all the plugins are classes)


NetworkWriter

classDiagram
    class NetworkWriter {
        gene_name: str
        chromosome: str
        carrier_cols: List[str]
        #_form_header() -> str
        #_find_min_phecode(analysis_dict: Dict) -> Tuple[str, str]
        #_form_analysis_string(analysis_dict: Dict) -> str
        +write(**kwargs) -> None

    }
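
A writer plugin with this shape could look roughly like the sketch below. It rests on several assumptions that the README does not confirm: `analysis_dict` maps each phecode to a p-value, `carrier_cols` lists the phecodes to report, and `write` receives an open text stream and a mapping of network IDs to analysis dictionaries through the hypothetical keyword arguments `output` and `networks`. The column names in the header are also invented for illustration.

```python
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO, Tuple


@dataclass
class ExampleNetworkWriter:
    """Illustrative writer plugin; not the project's NetworkWriter."""

    gene_name: str
    chromosome: str
    carrier_cols: List[str]

    def _form_header(self) -> str:
        # Fixed columns (hypothetical names) followed by one column per phecode.
        fixed = ["network_id", "gene", "chromosome", "min_phecode", "min_pvalue"]
        return "\t".join(fixed + self.carrier_cols) + "\n"

    @staticmethod
    def _find_min_phecode(analysis_dict: Dict[str, float]) -> Tuple[str, str]:
        # Pick the phecode with the smallest p-value.
        phecode = min(analysis_dict, key=analysis_dict.get)
        return phecode, str(analysis_dict[phecode])

    def _form_analysis_string(self, analysis_dict: Dict[str, float]) -> str:
        # One p-value per reported phecode, in header order.
        return "\t".join(str(analysis_dict[col]) for col in self.carrier_cols)

    def write(self, **kwargs) -> None:
        output: TextIO = kwargs["output"]
        networks: Dict[int, Dict[str, float]] = kwargs["networks"]
        output.write(self._form_header())
        for network_id, analysis_dict in networks.items():
            phecode, pvalue = self._find_min_phecode(analysis_dict)
            row = [str(network_id), self.gene_name, self.chromosome, phecode, pvalue]
            output.write(
                "\t".join(row) + "\t" + self._form_analysis_string(analysis_dict) + "\n"
            )


if __name__ == "__main__":
    writer = ExampleNetworkWriter("GENE1", "1", ["250.2", "401.1"])
    buffer = io.StringIO()
    writer.write(output=buffer, networks={1: {"250.2": 0.04, "401.1": 0.5}})
    print(buffer.getvalue())
```

Passing everything through `**kwargs` keeps the `write` signature identical across plugins, at the cost of each plugin having to document which keys it expects.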

Work in Progress:

