sci-RegulatoryClusteringModel Mac M1 version

If you want to read more about how SiRCle works, please check out our paper: https://www.biorxiv.org/content/10.1101/2022.07.02.498058v1

Note: this version does not include the VAE-based patient stratification, because tensorflow doesn't work on the M1 chip.

We plan to look into how to make this work in the future.

Install

Optionally create a new conda env.

conda create --name scircle python=3.8
conda activate scircle
pip install sircle

Note on Windows

We have tested our code on Windows (10) and Mac (pro) computers. We have not tested it on Windows 7, so if you run into problems there, please post an issue.

Note on libraries/dependencies

If you find that things don't install, please let us know! We've done our best to make the install reproducible, but different environments can break things. We're happy to help you debug, just post an issue on the GitHub repository.

Note that we expect Python 3.8. If things don't work the first time, check that you're running Python 3.8 and then try again :)
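
The version check mentioned above can be done from inside the environment itself. The snippet below is a minimal sketch using only the standard library; the helper name `is_expected_python` is made up for this example and is not part of sircle.

```python
import sys

# The README states that Python 3.8 is expected.
EXPECTED = (3, 8)


def is_expected_python(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the major.minor version matches the expected one."""
    return tuple(version_info[:2]) == expected


found = ".".join(str(part) for part in sys.version_info[:3])
if is_expected_python():
    print(f"Python {found}: matches the expected 3.8")
else:
    print(f"Python {found}: expected 3.8, consider creating a fresh conda env")
```

Run it with the same interpreter you used for `pip install sircle`, so that you are checking the environment the package actually lives in.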

Run

See the examples folder for a proper tutorial with data included that you can run!

If you have any trouble running the tutorial on a Windows machine, please let us know and we can help sort out any issues.

Quick version

from sircle import SciRCM
# FORMAT must be csv :)
prot_file = 'path to the output from protein differential abundance file'
rna_file = 'path to the output from differential expression analysis file'
meth_file = 'path to the output from methylation DCpG analysis file'

# Note: we assume each methylation CpG maps to a single gene. If yours don't, see the section below.
# logFC_rna = column name in your RNA file that holds the RNA logFC (same idea for protein and CpG)
# padj_rna = column name in your RNA file that holds the padj value (same idea for protein and CpG)
# NOTE: these column names must be unique across files, since we merge the datasets. If they
# aren't, update your csv files.
# Lastly: ensembl_gene_id is the gene ID column. All files must use the same identifier, and the
# column must have the same name in each file. If it doesn't, rename it before running.

rcm = SciRCM(meth_file, rna_file, prot_file, 
             "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
             "ensembl_gene_id", sep=',',
             rna_padj_cutoff=0.05, 
             prot_padj_cutoff=0.05, 
             meth_padj_cutoff=0.05,
             rna_logfc_cutoff=1.0, 
             prot_logfc_cutoff=0.5, 
             meth_diff_cutoff=0.1, 
             output_dir='',
             non_coding_genes=['None'],
             output_filename='RCM_Output.csv',
             bg_type = '(P&M)|(P&R)|(M&R)'
         )
rcm.run()
df = rcm.get_df()
# That DF now has your rcm clustering results, how easy was that :D
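
The comments above set two requirements on the input files: the value columns must have different names in each file, and the gene ID column must have the same name in all of them. A quick pre-flight check along those lines could look like the sketch below. It is not part of sircle; the function name `find_column_problems` and the example column lists are assumptions made for illustration.

```python
def find_column_problems(columns_by_file, gene_id_col):
    """Return a list of human-readable problems (empty when the headers look fine)."""
    problems = []
    first_seen = {}  # column name -> label of the first file that used it
    for label, columns in columns_by_file.items():
        if gene_id_col not in columns:
            problems.append(f"{label}: missing gene ID column '{gene_id_col}'")
        for col in columns:
            if col == gene_id_col:
                continue  # the gene ID column is meant to be shared
            if col in first_seen:
                problems.append(f"'{col}' is used in both {first_seen[col]} and {label}")
            else:
                first_seen[col] = label
    return problems


# Hypothetical headers that follow the naming used in the quick version.
good = {
    "rna": ["ensembl_gene_id", "logFC_rna", "padj_rna"],
    "protein": ["ensembl_gene_id", "logFC_protein", "padj_protein"],
    "methylation": ["ensembl_gene_id", "CpG_Beta_diff", "padj_meth"],
}
# Hypothetical headers with a clashing 'logFC' column and a differently named gene ID column.
bad = {
    "rna": ["ensembl_gene_id", "logFC", "padj_rna"],
    "protein": ["gene", "logFC", "padj_protein"],
    "methylation": ["ensembl_gene_id", "CpG_Beta_diff", "padj_meth"],
}

print(find_column_problems(good, "ensembl_gene_id"))
for problem in find_column_problems(bad, "ensembl_gene_id"):
    print(problem)
```

If you load your files with pandas, `list(df.columns)` gives the header list for each file.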

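Making your CpGs map to a single gene

import pandas as pd
# Note: the quick version imports from `sircle`; use whichever module name your install provides.
from scircm import filter_methylation_data_by_genes
meth_df = pd.read_csv('path to the output from methylation DCpG analysis file')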
# Note: you need to pass it: 
# 1) the gene ID column, here it is 'ensembl_gene_id'
# 2) the padj column: here it is 'padj_meth'
# 3) the logFC or test statistic column: here it is 'CpG_Beta_diff'
filtered_meth_df = filter_methylation_data_by_genes(meth_df, 'ensembl_gene_id', 'padj_meth', 'CpG_Beta_diff')

Now you can run the first version :)
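
To give a feel for what "one CpG per gene" means, here is a simplified stand-in written with plain pandas. It is not the implementation of `filter_methylation_data_by_genes`, whose selection rule may differ. The sketch assumes that, for each gene, you keep the significant CpG with the largest absolute methylation difference. The function name `pick_one_cpg_per_gene`, the `cpg_id` column, and the example values are all made up.

```python
import pandas as pd


def pick_one_cpg_per_gene(meth_df, gene_col, padj_col, diff_col, padj_cutoff=0.05):
    """Keep, per gene, the significant CpG with the largest absolute difference.

    Assumption for illustration only: this is one plausible selection rule,
    not necessarily the rule the package applies.
    """
    significant = meth_df[meth_df[padj_col] < padj_cutoff]
    # Index label of the row with the largest |difference| within each gene.
    best_rows = significant[diff_col].abs().groupby(significant[gene_col]).idxmax()
    return significant.loc[best_rows].reset_index(drop=True)


# Hypothetical methylation results: gene A has three CpGs, gene B has one.
example = pd.DataFrame({
    "cpg_id": ["cg1", "cg2", "cg3", "cg4"],
    "ensembl_gene_id": ["A", "A", "A", "B"],
    "CpG_Beta_diff": [0.05, -0.30, 0.40, 0.15],
    "padj_meth": [0.01, 0.001, 0.5, 0.04],
})

filtered = pick_one_cpg_per_gene(example, "ensembl_gene_id", "padj_meth", "CpG_Beta_diff")
print(filtered)
```

In this example cg3 has the largest difference for gene A but is not significant, so cg2 is kept instead.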

R version

First install Rtools if you haven't already. There are different versions for each platform (Windows: https://cran.r-project.org/bin/windows/Rtools/, macOS: https://cran.r-project.org/bin/macosx/tools/).

If you don't have conda, you'll need to do the following. First, make sure you have reticulate installed.

#install.packages('BiocManager')
#BiocManager::install('basilisk')
library(basilisk)
# Set this to be the path to the example data we downloaded
data_dir <- '../data_example/'

protFile <- paste0(data_dir, 'prot_DE_Stage IV_sircle.csv')
rnaFile <- paste0(data_dir, 'rna_DE_Stage IV_sircle_renamed-cols.csv')
methFile <- paste0(data_dir, 'filtered_cpg_DE_Stage IV_sircle.csv')
# Note if you use gene names here you would need to change this to be the column that has the gene ID in it
geneId <- 'ensembl_gene_id'

sircleFileName <- paste0(data_dir, "SircleR-RCM.csv")

# Use basilisk to create an environment we can use
bas_scircm <- BasiliskEnvironment(envname="simple_sircle",
                                  pkgname="sircle",
                                  packages=c("numpy==1.20"),
                                  pip=c("sircle"))

# logFC_rna = column name in your RNA file that has your RNA logFC (same for the protein and CpG)
# padj_rna = column name in your RNA file that has your padj value (same for protein and CpG)
# NOTE: these need to be unique from one another since we merge the datasets. If they aren't, you need
# to update your csv files.
# Lastly: ensembl_gene_id is the gene ID column. All files must use the same identifier, and it must be
# labelled the same in each file. If it isn't, update your column names before running.
res <- basiliskRun(env=bas_scircm, fun=function(args) {
    rcm <- sircleRCM(rnaFile, methFile, protFile, geneId,
                     "logFC_rna", "padj_rna",
                     "CpG_Beta_diff", "padj_meth",
                     "logFC_protein", "padj_protein",
                     outputFileName = sircleFileName,
                     envName="simple_sircle")
    # Do something with pandas
    return(rcm)
})


# Plot the sircle function
sirclePlot(sircleFileName, regLabels="Regulation_Grouping_2") 

# Note you need to have the entrez gene ID added to your csv file
# Run ORA on the groups
sircleORAHuman(sircleFileName, "entrezgene_id", "Regulation_Grouping_2")

Regulatory clustering model

The table below shows how we define the regulatory clusters.

================  =========  ==========  ============================  =======================  ====================  ====================  ====================
Methylation       RNAseq     Proteomics  Regulation driver_1           Regulation driver_2      Regulation_Grouping1  Regulation_Grouping2  Regulation_Grouping3
================  =========  ==========  ============================  =======================  ====================  ====================  ====================
Hypermethylation  DOWN       DOWN        Methylation increase (MDS)    None                     MDS                   MDS                   MDS
Hypermethylation  UP         DOWN        mRNA increase (TPDE)          Protein decrease (TMDS)  TPDE+TMDS             TPDE+TMDS             TMDS
Hypermethylation  UP         UP          mRNA increase (TPDE)          None                     TPDE                  TPDE                  TPDE
Hypermethylation  DOWN       UP          Methylation increase (MDS)    Protein increase (TMDE)  MDS+TMDE              TMDE                  TMDE
Hypermethylation  No Change  UP          mRNA increase (TPDE)          Protein increase (TMDE)  TPDE+TMDE             TMDE                  TMDE
Hypermethylation  No Change  DOWN        mRNA increase (TPDE)          Protein decrease (TMDS)  TPDE+TMDS             TMDS                  TMDS
Hypermethylation  UP         No Change   mRNA increase (TPDE)          Protein decrease (TMDS)  TPDE+TMDS             TPDE+TMDS             TMDS
Hypermethylation  DOWN       No Change   Methylation increase (MDS)    Protein increase (TMDE)  MDS+TMDE              MDS+TMDE              TMDE
Hypermethylation  No Change  No Change   Methylation increase (ncRNA)  None                     MDS-ncRNA             MDS_ncRNA             MDS_ncRNA
Hypomethylation   DOWN       DOWN        mRNA decrease (TPDS)          None                     TPDS                  TPDS                  TPDS
Hypomethylation   UP         DOWN        Methylation decrease (MDE)    Protein decrease (TMDS)  MDE+TMDS              TMDS                  TMDS
Hypomethylation   UP         UP          Methylation decrease (MDE)    None                     MDE                   MDE                   MDE
Hypomethylation   DOWN       UP          mRNA decrease (TPDS)          Protein increase (TMDE)  TPDS+TMDE             TPDS+TMDE             TMDE
Hypomethylation   No Change  UP          mRNA decrease (TPDS)          Protein increase (TMDE)  TPDS+TMDE             TMDE                  TMDE
Hypomethylation   No Change  DOWN        mRNA decrease (TPDS)          Protein decrease (TMDS)  TPDS+TMDS             TMDS                  TMDS
Hypomethylation   UP         No Change   Methylation decrease (MDE)    Protein decrease (TMDS)  MDE+TMDS              MDE+TMDS              TMDS
Hypomethylation   DOWN       No Change   mRNA decrease (TPDS)          Protein increase (TMDE)  TPDS+TMDE             TPDS+TMDE             TMDE
Hypomethylation   No Change  No Change   Methylation decrease (ncRNA)  None                     MDE+ncRNA             MDE_ncRNA             MDE_ncRNA
No Change         DOWN       UP          mRNA decrease (TPDS)          Protein increase (TMDE)  TPDS+TMDE             TPDS+TMDE             TMDE
No Change         UP         DOWN        mRNA increase (TPDE)          Protein decrease (TMDS)  TPDE+TMDS             TPDE+TMDS             TMDS
No Change         DOWN       DOWN        mRNA decrease (TPDS)          None                     TPDS                  TPDS                  TPDS
No Change         UP         UP          mRNA increase (TPDE)          None                     TPDE                  TPDE                  TPDE
No Change         No Change  UP          Protein increase (TMDE)       None                     TMDE                  TMDE                  TMDE
No Change         No Change  DOWN        Protein decrease (TMDS)       None                     TMDS                  TMDS                  TMDS
No Change         UP         No Change   mRNA increase (TPDE)          Protein decrease (TMDS)  TPDE+TMDS             TPDE+TMDS             TMDS
No Change         DOWN       No Change   mRNA decrease (TPDS)          Protein increase (TMDE)  TPDS+TMDE             TPDS+TMDE             TMDE
No Change         No Change  No Change   NoChange                      NoChange                 NoChange              NoChange              NoChange
================  =========  ==========  ============================  =======================  ====================  ====================  ====================
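
To make the table concrete, the sketch below turns one gene's statistics into the three states (methylation, RNA, protein) and looks up its Regulation_Grouping2 label. It is an illustration, not the package's code. The cutoffs are the ones passed in the quick version. The exact comparison rules (strict or inclusive inequalities) are an assumption, and the lookup holds only a handful of rows copied from the table.

```python
def state(change, padj, change_cutoff, padj_cutoff=0.05, up="UP", down="DOWN"):
    """Classify one measurement as up, down, or 'No Change'.

    Assumption: a change counts only if padj is below the cutoff and the
    absolute change reaches the cutoff. The package may compare differently.
    """
    if padj >= padj_cutoff or abs(change) < change_cutoff:
        return "No Change"
    return up if change > 0 else down


# A few (Methylation, RNAseq, Proteomics) rows copied from the table above,
# mapped to their Regulation_Grouping2 label. This is not the full table.
GROUPING2_EXCERPT = {
    ("Hypermethylation", "DOWN", "DOWN"): "MDS",
    ("Hypomethylation", "UP", "UP"): "MDE",
    ("No Change", "UP", "UP"): "TPDE",
    ("No Change", "DOWN", "DOWN"): "TPDS",
    ("No Change", "No Change", "UP"): "TMDE",
    ("No Change", "No Change", "DOWN"): "TMDS",
    ("No Change", "No Change", "No Change"): "NoChange",
}


def regulation_label(meth_diff, meth_padj, rna_logfc, rna_padj, prot_logfc, prot_padj):
    """Return the Regulation_Grouping2 label, or None if the row is not in the excerpt."""
    key = (
        state(meth_diff, meth_padj, 0.1, up="Hypermethylation", down="Hypomethylation"),
        state(rna_logfc, rna_padj, 1.0),
        state(prot_logfc, prot_padj, 0.5),
    )
    return GROUPING2_EXCERPT.get(key)


# Hypothetical gene: methylation up, RNA down, protein down.
label = regulation_label(0.25, 0.01, -1.8, 0.001, -0.9, 0.02)
print(label)
```

The hypothetical gene is hypermethylated with decreased RNA and protein, which corresponds to the first row of the table (MDS).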

Please post questions and issues related to sci-rcm in the Issues section of the GitHub repository: https://github.com/ArianeMora/scircm/issues
