sci-RegulatoryClusteringModel
Python version
This page covers the Python version of our package. If you are interested in the R version, please go here: https://github.com/ArianeMora/SiRCleR
If you want to read more about how SiRCle works, please check out our paper: https://www.biorxiv.org/content/10.1101/2022.07.02.498058v1
Install
Optionally create a new conda env.
conda create --name scircm python=3.8
conda activate scircm
pip install scircm
Note on Mac M1
Unfortunately, tensorflow has some issues on the Mac M1, so we have not yet been able to get our package working on an M1 Mac. If you get it working, please let us know your solution, as others may benefit!
Note on Windows
We have tested our code on Windows 10 and Mac (Pro) computers. We have not tested it on Windows 7, so if you run into problems there, please post an issue.
Note on libraries/dependencies
If you find that things don't install, please let us know! We've done our best to make the package reproducible, but different environments may break things. We're happy to help you debug, just post an issue on the GitHub repository.
Note that we expect Python 3.8, so if things don't work the first time, check that you're running Python 3.8 and then try again :)
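A quick way to confirm which interpreter is active before installing is a check like the one below. This is a minimal sketch that uses only the standard library; the helper name `is_supported_python` is hypothetical and is not part of scircm.

```python
import sys


def is_supported_python(version_info=sys.version_info, expected=(3, 8)):
    """Return True when the major.minor version matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


# Warn (rather than fail) when the active interpreter is not Python 3.8.
if not is_supported_python():
    print(
        f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "scircm expects 3.8, so consider switching environments."
    )
```

Running this inside the activated `scircm` conda environment should print nothing if the environment was created with `python=3.8` as shown above.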
Run
See the examples folder for a proper tutorial with data included that you can run!
If you have any trouble running the tutorial on a Windows machine, please let us know and we can help sort out any issues.
Quick version
from scircm import SciRCM
# Input files must be in CSV format :)
prot_file = 'path to the output from protein differential abundance file'
rna_file = 'path to the output from differential expression analysis file'
meth_file = 'path to the output from methylation DCpG analysis file'
# Note: we assume each methylation CpG maps to a single gene. If yours don't, see the section below.
# logFC_rna = column name in your RNA file that has your RNA logFC (same for the protein and CpG)
# padj_rna = column name in your RNA file that has your padj value (same for protein and CpG)
# NOTE: these column names need to be unique across files since we merge the datasets. If they aren't,
# you need to update your csv files.
# Lastly: ensembl_gene_id is the gene ID column. All files must use the same identifier, and the column
# must be labelled the same in each file. If it isn't, update your column names before running.
rcm = SciRCM(meth_file, rna_file, prot_file,
             "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
             "ensembl_gene_id", sep=',',
             rna_padj_cutoff=0.05,
             prot_padj_cutoff=0.05,
             meth_padj_cutoff=0.05,
             rna_logfc_cutoff=1.0,
             prot_logfc_cutoff=0.5,
             meth_diff_cutoff=0.1,
             output_dir='',
             non_coding_genes=['None'],
             output_filename='RCM_Output.csv',
             bg_type='(P&M)|(P&R)|(M&R)'
             )
rcm.run()
df = rcm.get_df()
# That DF now has your rcm clustering results, how easy was that :D
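The comments in the quick version ask for unique logFC and padj column names per dataset and one shared gene ID column. The sketch below shows one way to get there with pandas before writing each CSV. The helper `prepare_table`, the raw column names (`gene`, `log2FoldChange`, `padj`) and the toy values are all hypothetical; only the target names match the quick version above.

```python
import pandas as pd


def prepare_table(df, gene_col, effect_col, padj_col, effect_name, padj_name,
                  gene_id="ensembl_gene_id"):
    """Rename the gene, effect-size and padj columns and keep only those three."""
    renamed = df.rename(columns={
        gene_col: gene_id,
        effect_col: effect_name,
        padj_col: padj_name,
    })
    return renamed[[gene_id, effect_name, padj_name]]


# Toy RNA result table with generic column names (made-up values).
rna_raw = pd.DataFrame({
    "gene": ["ENSG01", "ENSG02"],
    "log2FoldChange": [1.8, -0.2],
    "padj": [0.001, 0.6],
})
rna_df = prepare_table(rna_raw, "gene", "log2FoldChange", "padj",
                       effect_name="logFC_rna", padj_name="padj_rna")

# Write the result out as CSV and pass that path as rna_file, for example:
# rna_df.to_csv("rna_for_scircm.csv", index=False)
```

The same helper could be reused for the protein and methylation tables with `logFC_protein`/`padj_protein` and `CpG_Beta_diff`/`padj_meth`. Dropping the other columns is a choice made here to avoid name clashes during the merge; keep any extra columns you need, as long as their names are unique.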
Making your CpGs map to a single gene
import pandas as pd
from scircm import filter_methylation_data_by_genes
meth_df = pd.read_csv('path to the output from methylation DCpG analysis file')
# Note: you need to pass it:
# 1) the gene ID column, here it is 'ensembl_gene_id'
# 2) the padj column: here it is 'padj_meth'
# 3) the logFC or test statistic column: here it is 'CpG_Beta_diff'
filtered_meth_df = filter_methylation_data_by_genes(meth_df, 'ensembl_gene_id', 'padj_meth', 'CpG_Beta_diff')
Now you can run the quick version above using the filtered methylation data :)
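To illustrate what "one CpG per gene" means, here is a small pandas sketch on toy data. It is not the implementation of `filter_methylation_data_by_genes`; the selection rule (keep the CpG with the largest absolute difference, breaking ties by smaller padj), the helper name `keep_one_cpg_per_gene` and the values are assumptions made only for this example.

```python
import pandas as pd


def keep_one_cpg_per_gene(df, gene_col, padj_col, diff_col):
    """Keep, for each gene, the row with the largest absolute methylation difference.

    Ties are broken by the smaller adjusted p-value. Illustrative rule only.
    """
    ranked = df.assign(_abs_diff=df[diff_col].abs())
    ranked = ranked.sort_values(["_abs_diff", padj_col], ascending=[False, True])
    return (
        ranked.drop_duplicates(subset=gene_col, keep="first")
        .drop(columns="_abs_diff")
        .sort_values(gene_col)
        .reset_index(drop=True)
    )


# Toy table: gene ENSG01 has two CpGs, gene ENSG02 has one (made-up values).
toy = pd.DataFrame({
    "ensembl_gene_id": ["ENSG01", "ENSG01", "ENSG02"],
    "padj_meth": [0.01, 0.03, 0.20],
    "CpG_Beta_diff": [0.15, -0.40, 0.05],
})
one_per_gene = keep_one_cpg_per_gene(toy, "ensembl_gene_id", "padj_meth", "CpG_Beta_diff")
```

After this step every gene ID appears exactly once, which is the shape the quick version assumes for the methylation input.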
R version
First install Rtools if you haven't already. There are different versions for each platform (Windows: https://cran.r-project.org/bin/windows/Rtools/, macOS: https://cran.r-project.org/bin/macosx/tools/).
If you don't have conda, follow the steps below. First, make sure you have reticulate installed.
install.packages('reticulate')
Create a new environment and install scircm.
library(reticulate)  # provides virtualenv_create
virtualenv_create(
  envname = "ml",
  python = NULL,
  packages = "scircm",
  system_site_packages = getOption("reticulate.virtualenv.system_site_packages",
                                   default = FALSE)
)
Then run the following script!
library(tidyverse) # install these if you don't have them
library(dplyr)
library(reticulate)
# If things fail here, it's because you need to do the steps above
use_condaenv("ml", required = TRUE) # OR use_virtualenv("ml", required = TRUE) # depending on how you installed it!
scircm <<- import("scircm") # Make global
prot_file <- 'path to the output from protein differential abundence file'
rna_file <- 'path to the output from differential expression analysis file'
meth_file <- 'path to the output from methylation DCpG analysis file'
rcm <- scircm$SciRCM(meth_file, rna_file, prot_file,
                     "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
                     "ensembl_gene_id", sep=',',
                     rna_padj_cutoff=0.05,
                     prot_padj_cutoff=0.05,
                     meth_padj_cutoff=0.05,
                     rna_logfc_cutoff=1.0,
                     prot_logfc_cutoff=0.5,
                     meth_diff_cutoff=0.1,
                     output_dir='',
                     non_coding_genes=list('None'),  # an R list is passed to Python as a list
                     output_filename='RCM_Output.csv',
                     bg_type='(P&M)|(P&R)|(M&R)'
)
rcm$run()
df <- rcm$get_df()
Regulatory clustering model
The table below shows how we define the regulatory clusters.
Methylation | RNAseq | Proteomics | Regulation driver_1 | Regulation driver_2 | Regulation_Grouping1 | Regulation_Grouping2 | Regulation_Grouping3 |
---|---|---|---|---|---|---|---|
Hypermethylation | DOWN | DOWN | Methylation increase (MDS) | None | MDS | MDS | MDS |
Hypermethylation | UP | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
Hypermethylation | UP | UP | mRNA increase (TPDE) | None | TPDE | TPDE | TPDE |
Hypermethylation | DOWN | UP | Methylation increase (MDS) | Protein increase (TMDE) | MDS+TMDE | TMDE | TMDE |
Hypermethylation | No Change | UP | mRNA increase (TPDE) | Protein increase (TMDE) | TPDE+TMDE | TMDE | TMDE |
Hypermethylation | No Change | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TMDS | TMDS |
Hypermethylation | UP | No Change | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
Hypermethylation | DOWN | No Change | Methylation increase (MDS) | Protein increase (TMDE) | MDS+TMDE | MDS+TMDE | TMDE |
Hypermethylation | No Change | No Change | Methylation increase (ncRNA) | None | MDS-ncRNA | MDS_ncRNA | MDS_ncRNA |
Hypomethylation | DOWN | DOWN | mRNA decrease (TPDS) | None | TPDS | TPDS | TPDS |
Hypomethylation | UP | DOWN | Methylation decrease (MDE) | Protein decrease (TMDS) | MDE+TMDS | TMDS | TMDS |
Hypomethylation | UP | UP | Methylation decrease (MDE) | None | MDE | MDE | MDE |
Hypomethylation | DOWN | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
Hypomethylation | No Change | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TMDE | TMDE |
Hypomethylation | No Change | DOWN | mRNA decrease (TPDS) | Protein decrease (TMDS) | TPDS+TMDS | TMDS | TMDS |
Hypomethylation | UP | No Change | Methylation decrease (MDE) | Protein decrease (TMDS) | MDE+TMDS | MDE+TMDS | TMDS |
Hypomethylation | DOWN | No Change | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
Hypomethylation | No Change | No Change | Methylation decrease (ncRNA) | None | MDE+ncRNA | MDE_ncRNA | MDE_ncRNA |
No Change | DOWN | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
No Change | UP | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
No Change | DOWN | DOWN | mRNA decrease (TPDS) | None | TPDS | TPDS | TPDS |
No Change | UP | UP | mRNA increase (TPDE) | None | TPDE | TPDE | TPDE |
No Change | No Change | UP | Protein increase (TMDE) | None | TMDE | TMDE | TMDE |
No Change | No Change | DOWN | Protein decrease (TMDS) | None | TMDS | TMDS | TMDS |
No Change | UP | No Change | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
No Change | DOWN | No Change | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
No Change | No Change | No Change | NoChange | NoChange | NoChange | NoChange | NoChange |
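The table can be read as a lookup from the three per-layer states to a cluster label. The sketch below illustrates that reading for a handful of rows, using the cutoffs from the quick version (padj 0.05, RNA logFC 1.0, protein logFC 0.5, methylation difference 0.1). The function names, the use of non-strict inequalities at the cutoffs and the toy values are assumptions; this is not the package's internal code, and only a few rows of the table are included.

```python
def call_direction(effect, padj, effect_cutoff, padj_cutoff, up="UP", down="DOWN"):
    """Label a measurement as up, down or 'No Change' from its effect size and padj."""
    if padj < padj_cutoff and effect >= effect_cutoff:
        return up
    if padj < padj_cutoff and effect <= -effect_cutoff:
        return down
    return "No Change"


# A few rows copied from the table: (Methylation, RNAseq, Proteomics) -> Regulation_Grouping1
GROUPING1 = {
    ("Hypermethylation", "DOWN", "DOWN"): "MDS",
    ("Hypomethylation", "UP", "UP"): "MDE",
    ("No Change", "UP", "UP"): "TPDE",
    ("No Change", "DOWN", "DOWN"): "TPDS",
    ("No Change", "No Change", "UP"): "TMDE",
    ("No Change", "No Change", "DOWN"): "TMDS",
    ("No Change", "No Change", "No Change"): "NoChange",
}


def label_gene(meth_diff, meth_padj, rna_logfc, rna_padj, prot_logfc, prot_padj):
    """Look up the Regulation_Grouping1 label for one gene (excerpt of the table only)."""
    key = (
        call_direction(meth_diff, meth_padj, 0.1, 0.05,
                       up="Hypermethylation", down="Hypomethylation"),
        call_direction(rna_logfc, rna_padj, 1.0, 0.05),
        call_direction(prot_logfc, prot_padj, 0.5, 0.05),
    )
    return GROUPING1.get(key, "not in this excerpt")


# Hypermethylated, RNA down, protein down (made-up values).
example = label_gene(0.3, 0.01, -1.5, 0.001, -0.8, 0.02)
```

Here `example` resolves to the first row of the table (Hypermethylation, DOWN, DOWN). A gene whose changes all miss the padj cutoff falls through to the last row.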
Please post questions and issues related to sci-rcm on the Issues section of the GitHub repository: https://github.com/ArianeMora/scircm/issues