A CLI tool to help identify IBD sharing within networks across a locus of interest at biobank scale and then test for phenotypic enrichment within these networks.
IBDCluster v1.2.1:
Documentation:
This README is a more technical description of the project, covering the class structures and their relationships. More practical documentation on installing and using the program can be found here: IBDCluster documentation (still a work in progress).
Purpose of the project:
This project is a CLI tool that clusters shared IBD segments within biobanks around a gene of interest. These networks are then analyzed to determine how many individuals within each network are affected by a phenotype of interest.
General Pipeline:
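The clustering step can be pictured as a graph problem: individuals are nodes and each IBD segment shared at the locus is an edge, so a network is a connected component grown outward from a seed individual. The following is a minimal sketch under stated assumptions: segments are reduced to plain (id1, id2) tuples that have already been filtered to the locus, and `build_adjacency` and `find_networks` are hypothetical names. It is not IBDCluster's actual implementation, which (per the class diagram below) works on pandas DataFrames through `Network.filter_for_seed` and `Network.update`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input format: each IBD segment reduced to the two individuals sharing it.
# Assumption: segments were already filtered to those overlapping the locus of interest.
Pair = Tuple[str, str]


def build_adjacency(pairs: List[Pair]) -> Dict[str, Set[str]]:
    """Map each individual to everyone they share an IBD segment with."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for id1, id2 in pairs:
        graph[id1].add(id2)
        graph[id2].add(id1)
    return graph


def find_networks(pairs: List[Pair]) -> List[Set[str]]:
    """Group individuals into networks (connected components) by seed expansion."""
    graph = build_adjacency(pairs)
    assigned: Set[str] = set()
    networks: List[Set[str]] = []
    for seed in sorted(graph):  # sorted so the network order is deterministic
        if seed in assigned:
            continue
        network = {seed}
        frontier = [seed]
        while frontier:  # keep pulling in relatives of current members until none are new
            current = frontier.pop()
            for neighbor in graph[current]:
                if neighbor not in network:
                    network.add(neighbor)
                    frontier.append(neighbor)
        assigned |= network
        networks.append(network)
    return networks
```

For example, the pairs `[("A", "B"), ("B", "C"), ("D", "E")]` form two networks, `{A, B, C}` and `{D, E}`, because A and C are linked through B even though they share no segment directly.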
flowchart LR
A(IBD Information) --> B(Identified Networks) --> C(Binomial test for enrichment of Phenotypes)
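The last step compares the number of affected individuals in a network with what the phenotype's prevalence would predict for a network of that size. Below is a minimal sketch of a one-sided exact binomial test, assuming the cohort-wide prevalence is used as the success probability; `binomial_enrichment_pvalue` is a hypothetical name, and IBDCluster's own test may be parameterized differently.

```python
from math import comb


def binomial_enrichment_pvalue(n_in_network: int, n_affected: int, prevalence: float) -> float:
    """One-sided P(X >= n_affected) for X ~ Binomial(n_in_network, prevalence)."""
    if not 0 <= n_affected <= n_in_network:
        raise ValueError("n_affected must be between 0 and n_in_network")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a probability")
    # Sum the upper tail of the binomial distribution: k affected out of n, for k >= n_affected.
    return sum(
        comb(n_in_network, k) * prevalence**k * (1 - prevalence) ** (n_in_network - k)
        for k in range(n_affected, n_in_network + 1)
    )
```

The sketch uses only `math.comb` to stay dependency-free; with SciPy installed, `scipy.stats.binomtest(n_affected, n_in_network, prevalence, alternative="greater").pvalue` computes the same one-sided p-value.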
Installing:
Cloning from GitHub and modifying permissions:
- Clone the project into the appropriate directory using git clone.
- cd into the IBDCluster directory:
cd IBDCluster
- Run the following command to make the IBDCluster.py file executable:
chmod +x IBDCluster/IBDCluster.py
Installing dependencies: Next, install all the necessary dependencies. The steps vary depending on which package manager you use.
If using conda:
- There is an environment.yml file in the main IBDCluster directory. Run the following command to create an environment called IBDCluster:
conda env create -f environment.yml
- You can now activate the environment by calling:
conda activate IBDCluster
If using mamba:
- The steps are the same as for conda, except that you use the command:
mamba env create -f environment.yml
- You can activate this environment using:
conda activate IBDCluster
If using Poetry:
- The Poetry project files (pyproject.toml and poetry.lock) are also in the IBDCluster directory. Ideally, activate a virtual environment first; this can be either a conda environment or a virtualenv. Once the environment is activated, run:
poetry install
- At this point, all necessary dependencies should be installed.
- If you want more information about Poetry, its documentation is here: https://python-poetry.org/
Adding IBDCluster to the user's $PATH: To run IBDCluster without being in the source code directory, add the directory containing IBDCluster.py to your PATH.
- Add the following line to your .bashrc or .zshrc file:
export PATH="{Path to the directory that the program was cloned into}/IBDCluster/IBDCluster:$PATH"
- Then reload the shell configuration by running:
source ~/.bashrc
or
source ~/.zshrc
This will allow you to run the code by just typing IBDCluster.py from any directory.
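To confirm that the PATH change took effect, a quick check from Python can help. This is a small sketch, not part of IBDCluster; `locate_ibdcluster` is a hypothetical helper built on the standard library's `shutil.which`, which by default only reports files that exist and are executable, so it also catches a skipped `chmod +x` step.

```python
import shutil
from typing import Optional


def locate_ibdcluster(executable: str = "IBDCluster.py") -> Optional[str]:
    """Return the full path of the executable if it is found on $PATH, else None."""
    # shutil.which checks existence and the execute bit for each $PATH entry.
    return shutil.which(executable)


if __name__ == "__main__":
    location = locate_ibdcluster()
    if location is None:
        print("IBDCluster.py not found on $PATH; check the export line and re-source your shell file.")
    else:
        print(f"Found IBDCluster.py at {location}")
```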
Running IBDCluster:
- You can find all the optional parameters by running:
IBDCluster.py --help
Running the code:
Reporting Issues:
All issues can be reported using the templates in the .github/ folder. Templates are available for bug reports (bug_reports) and feature requests (feature_request).
Technical Details of the project:
- This part is mainly for keeping track of the directory structure.
Project Structure:
├── IBDCluster
│ ├── analysis
│ │ ├── main.py
│ │ ├── percentages.py
│ ├── callbacks
│ │ ├── check_inputs.py
│ ├── models
│ │ ├── cluster_class.py
│ │ ├── indices.py
│ │ ├── pairs.py
│ │ ├── writers.py
│ ├── log
│ │ ├── logger.py
│ ├── cluster
│ │ ├── main.py
│ ├── IBDCluster.py
├── .env
├── environment.yml
├── .gitignore
├── poetry.lock
├── pyproject.toml
├── README.md
├── requirements.txt
├── tests
│ ├── test_data
│ ├── test_integration
Comments about models:
- Classes in cluster_class.py:
classDiagram
class Cluster {
ibd_file: str
ibd_program: str
indices: models.FileInfo
count: int=0
ibd_df: pd.DataFrame=pd.DataFrame
network_id: int=1
inds_in_network: Set[str]=set
network_list: List[Network]=list
}
class Network {
gene_name: str
gene_chr: str
network_id: int
pairs: List[Pairs]=list
iids: Set[str]=set
haplotypes: Set[str]=set
+filter_for_seed(ibd_df: pd.DataFrame, ind_seed: List[str], indices: FileInfo, exclusion: Set[str]=None) -> pd.DataFrame
#determine_pairs(ibd_row: pd.Series, indices: FileInfo) -> Pairs
+gather_grids(dataframe: pd.DataFrame, pair_1_indx: int, pair_2_indx: int) -> Set[str]
+update(ibd_df: pd.DataFrame, indices: FileInfo) -> None
}
class FileInfo {
<<interface>>
id1_indx: int
ind1_with_phase: int
id2_indx: int
ind2_with_phase: int
chr_indx: int
str_indx: int
end_indx: int
+set_program_indices(program_name: str) -> None
}
Cluster o-- Network
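`FileInfo` exists because different IBD programs put the same fields in different columns, so `set_program_indices` records where each field lives for the chosen program. Here is a minimal sketch, assuming hap-IBD output with its documented column order (sample 1, haplotype 1, sample 2, haplotype 2, chromosome, start bp, end bp, length in cM). The `"hapibd"` program string and the mapping of `ind1_with_phase`/`ind2_with_phase` to the haplotype-index columns are hypothetical, and the diagram marks `FileInfo` as an interface whereas this sketch uses a concrete dataclass.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Column positions of the fields read from an IBD file (hypothetical sketch)."""

    id1_indx: int = -1
    ind1_with_phase: int = -1
    id2_indx: int = -1
    ind2_with_phase: int = -1
    chr_indx: int = -1
    str_indx: int = -1
    end_indx: int = -1

    def set_program_indices(self, program_name: str) -> None:
        """Fill in the column indices for the program that produced the IBD file."""
        if program_name.lower() == "hapibd":  # hypothetical program identifier
            self.id1_indx = 0
            self.ind1_with_phase = 1  # assumed: haplotype index column for sample 1
            self.id2_indx = 2
            self.ind2_with_phase = 3  # assumed: haplotype index column for sample 2
            self.chr_indx = 4
            self.str_indx = 5
            self.end_indx = 6
        else:
            raise ValueError(f"unsupported IBD program: {program_name}")
```

Keeping indices in one object lets the rest of the code read rows by position without branching on the program name every time.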
Entity relationships:
erDiagram
NETWORK }|--|{ PAIRS : contains
NETWORK {
string gene_name
string chromosome
int network_id
}
NETWORK }|--|{ IIDS : contains
NETWORK }|--|{ HAPLOTYPES : contains
PAIRS {
string pair_1_id
string pair_1_phase
string pair_2_id
string pair_2_phase
int chromosome_number
int segment_start
int segment_end
float length
series affected_statuses
}
IIDS {
string Individual-ids
}
HAPLOTYPES {
string haplotype-phase
}
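The entity relationships above can be mirrored with plain dataclasses. This is a minimal sketch under stated assumptions: `add_pair` is a hypothetical method (the class diagram lists `update(ibd_df, indices)` instead), the `affected_statuses` series is omitted to avoid a pandas dependency, and haplotypes are stored as hypothetical `"<id>.<phase>"` labels so that phase 1 of one individual is not merged with phase 1 of another.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Pairs:
    """One shared IBD segment between two haplotypes (mirrors the PAIRS entity)."""

    pair_1_id: str
    pair_1_phase: str
    pair_2_id: str
    pair_2_phase: str
    chromosome_number: int
    segment_start: int
    segment_end: int
    length: float


@dataclass
class Network:
    """A network owns its pairs plus the IIDs and haplotypes those pairs involve."""

    gene_name: str
    gene_chr: str
    network_id: int
    pairs: List[Pairs] = field(default_factory=list)
    iids: Set[str] = field(default_factory=set)
    haplotypes: Set[str] = field(default_factory=set)

    def add_pair(self, pair: Pairs) -> None:
        """Record a pair and keep the IIDS and HAPLOTYPES sets in sync with it."""
        self.pairs.append(pair)
        self.iids.update({pair.pair_1_id, pair.pair_2_id})
        # Hypothetical label format "<id>.<phase>" keeps haplotypes unique per individual.
        self.haplotypes.update(
            {f"{pair.pair_1_id}.{pair.pair_1_phase}", f"{pair.pair_2_id}.{pair.pair_2_phase}"}
        )
```

`field(default_factory=...)` gives every `Network` its own list and sets; a shared mutable default would leak members between networks.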
Plugins (all plugins are implemented as classes):
NetworkWriter
classDiagram
class NetworkWriter {
gene_name: str
chromosome: str
carrier_cols: List[str]
#_form_header() -> str
#_find_min_phecode(analysis_dict: Dict) -> Tuple[str, str]
#_form_analysis_string(analysis_dict: Dict) -> str
+write(**kwargs) -> None
}
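The `NetworkWriter` diagram suggests a header builder, a helper that picks the most significant phecode, and a per-network analysis string. A minimal sketch of how those pieces could fit together follows; the output columns, the assumption that `analysis_dict` maps each carrier column (phecode) to a p-value, and the `write(out, networks)` signature are all hypothetical, since the real method takes `**kwargs` and its file format is not documented here.

```python
from typing import Dict, List, TextIO, Tuple


class NetworkWriter:
    """Hypothetical writer producing one tab-delimited row per network."""

    def __init__(self, gene_name: str, chromosome: str, carrier_cols: List[str]) -> None:
        self.gene_name = gene_name
        self.chromosome = chromosome
        self.carrier_cols = carrier_cols

    def _form_header(self) -> str:
        # Assumed column names; the real output format may differ.
        base = ["network_id", "gene", "chromosome", "min_phecode", "min_pvalue"]
        return "\t".join(base + self.carrier_cols) + "\n"

    def _find_min_phecode(self, analysis_dict: Dict[str, float]) -> Tuple[str, str]:
        # Return the phecode with the smallest p-value, or N/A when nothing was tested.
        if not analysis_dict:
            return "N/A", "N/A"
        phecode = min(analysis_dict, key=lambda code: analysis_dict[code])
        return phecode, str(analysis_dict[phecode])

    def _form_analysis_string(self, analysis_dict: Dict[str, float]) -> str:
        # One value per carrier column in header order; untested columns become N/A.
        return "\t".join(str(analysis_dict.get(col, "N/A")) for col in self.carrier_cols)

    def write(self, out: TextIO, networks: Dict[int, Dict[str, float]]) -> None:
        """Write the header, then one row per network ID in ascending order."""
        out.write(self._form_header())
        for network_id in sorted(networks):
            analysis = networks[network_id]
            phecode, pvalue = self._find_min_phecode(analysis)
            fields = [str(network_id), self.gene_name, self.chromosome, phecode, pvalue]
            analysis_str = self._form_analysis_string(analysis)
            row = "\t".join(fields) + ("\t" + analysis_str if analysis_str else "")
            out.write(row + "\n")
```

Writing to any text stream (an open file or `io.StringIO`) keeps the writer easy to test without touching the filesystem.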
Work in Progress: