


sci-RegulatoryClusteringModel Mac M1 version


If you want to read more about how SiRCle works, please check out our paper: https://www.biorxiv.org/content/10.1101/2022.07.02.498058v1

Note: this version does not include the patient stratification using the VAE, because TensorFlow doesn't work on the M1 chip.

We will be looking into how we can make this work in the future.

Install

Optionally, create a new conda environment:

conda create --name scircle python=3.8
conda activate scircle
pip install sircle

Note on Windows

We have tested our code on Windows 10 and Mac (Pro) computers. We're not sure how it behaves on Windows 7, so if you run into problems there, please post an issue.

Note on libraries/dependencies

If you find that things don't install, please let us know! We've done our best to make the installation reproducible, but different environments may break things, so we're happy to help you debug; just post an issue on GitHub.

Note: we expect Python 3.8, so if things don't work the first time, check that you're running Python 3.8 and then try again :)
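If you're not sure which interpreter your environment is using, a quick check like the one below can help. This is a minimal sketch: `check_python_version` is our own helper for illustration, not part of SiRCle.

```python
import sys


def check_python_version(expected=(3, 8)):
    """Return True if the running interpreter's major.minor version matches `expected`."""
    return sys.version_info[:2] == expected


if __name__ == "__main__":
    if check_python_version():
        print("Python 3.8 detected, good to go.")
    else:
        major, minor = sys.version_info[:2]
        print(f"Running Python {major}.{minor}; SiRCle expects Python 3.8.")
```

Run it inside the activated `scircle` environment so you check the interpreter that pip installed sircle into.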

Run

See the examples folder for a proper tutorial with data included that you can run!

If you have any trouble running the tutorial on a Windows machine, please let us know and we can help sort out any issues.

Quick version

from sircle import SciRCM
# FORMAT must be csv :)
prot_file = 'path to the output from protein differential abundance file'
rna_file = 'path to the output from differential expression analysis file'
meth_file = 'path to the output from methylation DCpG analysis file'

# Note: we assume your methylation CpGs map to a single gene; if they don't, see the section below.
# logFC_rna = column name in your RNA file that has your RNA logFC (same for the protein and CpG)
# padj_rna = column name in your RNA file that has your padj value (same for protein and CpG)
# NOTE: these column names need to be unique from one another since we merge the datasets; if they
# aren't, you need to update your csv files.
# Lastly: ensembl_gene_id is the gene ID column. All files must use the same identifier, and this column
# must be labelled the same in each file; if it isn't, update your column names before running.

rcm = SciRCM(meth_file, rna_file, prot_file, 
             "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
             "ensembl_gene_id", sep=',',
             rna_padj_cutoff=0.05, 
             prot_padj_cutoff=0.05, 
             meth_padj_cutoff=0.05,
             rna_logfc_cutoff=1.0, 
             prot_logfc_cutoff=0.5, 
             meth_diff_cutoff=0.1, 
             output_dir='',
             non_coding_genes=['None'],
             output_filename='RCM_Output.csv',
             bg_type = '(P&M)|(P&R)|(M&R)'
         )
rcm.run()
df = rcm.get_df()
# That DF now has your rcm clustering results, how easy was that :D
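Before running, it can help to confirm that your csv files meet the requirements in the comments above: every file has the same gene ID column, and the value column names are unique across datasets. The sketch below is a hypothetical pre-flight check written with pandas; `check_sircle_inputs` is our own illustration and is not part of the sircle package.

```python
import pandas as pd


def check_sircle_inputs(rna_df, meth_df, prot_df, gene_id, rna_cols, meth_cols, prot_cols):
    """Hypothetical pre-flight check; returns a list of human-readable problems (empty if none)."""
    problems = []
    datasets = {
        "RNA": (rna_df, rna_cols),
        "methylation": (meth_df, meth_cols),
        "protein": (prot_df, prot_cols),
    }
    # Every file must contain the shared gene ID column and its own value columns.
    for label, (df, cols) in datasets.items():
        for col in [gene_id, *cols]:
            if col not in df.columns:
                problems.append(f"{label} file is missing column '{col}'")
    # Value column names must not collide, because the datasets are merged.
    all_value_cols = [*rna_cols, *meth_cols, *prot_cols]
    duplicates = sorted({c for c in all_value_cols if all_value_cols.count(c) > 1})
    for col in duplicates:
        problems.append(f"column name '{col}' is used by more than one dataset")
    return problems
```

You could load your three files with `pd.read_csv` and pass the same column names you give to `SciRCM` (for example `["logFC_rna", "padj_rna"]` for RNA). An empty list only means these particular checks passed, not that the run is guaranteed to succeed.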

Making your CpGs map to a single gene version

import pandas as pd
from scircm import filter_methylation_data_by_genes

meth_df = pd.read_csv('path to the output from methylation DCpG analysis file')
# Note: you need to pass it: 
# 1) the gene ID column, here it is 'ensembl_gene_id'
# 2) the padj column: here it is 'padj_meth'
# 3) the logFC or test statistic column: here it is 'CpG_Beta_diff'
filtered_meth_df = filter_methylation_data_by_genes(meth_df, 'ensembl_gene_id', 'padj_meth', 'CpG_Beta_diff')

Now save filtered_meth_df as a csv file and use it as meth_file in the quick version above :)
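The exact rule `filter_methylation_data_by_genes` uses to pick one CpG per gene isn't described here. As an illustration only, the pandas sketch below assumes one simple rule (keep the most significant CpG per gene and break ties by the largest absolute beta difference). `keep_top_cpg_per_gene` is a hypothetical helper and may not match what SiRCle actually does.

```python
import pandas as pd


def keep_top_cpg_per_gene(meth_df, gene_col, padj_col, value_col):
    """ASSUMED rule: per gene, keep the CpG with the smallest padj;
    ties are broken by the largest absolute value in `value_col`."""
    ranked = meth_df.assign(_abs_value=meth_df[value_col].abs())
    # Most significant first; among equal padj values, largest |effect| first.
    ranked = ranked.sort_values([padj_col, "_abs_value"], ascending=[True, False])
    # The first row seen for each gene is its top-ranked CpG.
    top = ranked.drop_duplicates(subset=gene_col, keep="first")
    return top.drop(columns="_abs_value").reset_index(drop=True)
```

For example, `keep_top_cpg_per_gene(meth_df, 'ensembl_gene_id', 'padj_meth', 'CpG_Beta_diff')` uses the same column names as the example above.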

R version

First, install Rtools if you haven't done so yet. There are different versions for each operating system (Windows: https://cran.r-project.org/bin/windows/Rtools/, macOS: https://cran.r-project.org/bin/macosx/tools/).

If you don't have conda, follow the steps below. First, make sure you have reticulate installed.

#install.packages('BiocManager')
#BiocManager::install('basilisk')
library(basilisk)
# Set this to the path of the example data you downloaded
data_dir <- '../data_example/'

protFile <- paste0(data_dir, 'prot_DE_Stage IV_sircle.csv')
rnaFile <- paste0(data_dir, 'rna_DE_Stage IV_sircle_renamed-cols.csv')
methFile <- paste0(data_dir, 'filtered_cpg_DE_Stage IV_sircle.csv')
# Note if you use gene names here you would need to change this to be the column that has the gene ID in it
geneId <- 'ensembl_gene_id'

sircleFileName <- paste0(data_dir, "SircleR-RCM.csv")

# Use basilisk to create an environment we can use
bas_scircm <- BasiliskEnvironment(envname="simple_sircle",
                                   pkgname="sircle",
                                   packages=c("numpy==1.20"),
                                   pip=c("sircle")
)

# logFC_rna = column name in your RNA file that has your RNA logFC (same for the protein and CpG)
# padj_rna = column name in your RNA file that has your padj value (same for protein and CpG)
# NOTE: these column names need to be unique from one another since we merge the datasets; if they
# aren't, you need to update your csv files.
# Lastly: ensembl_gene_id is the gene ID column. All files must use the same identifier, and this column
# must be labelled the same in each file; if it isn't, update your column names before running.
res <- basiliskRun(env=bas_scircm, fun=function(args) {
    rcm <- sircleRCM(rnaFile, methFile, protFile, geneId,  "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
                 outputFileName = sircleFileName, 
                 envName="simple_sircle")
    # Optionally post-process the result here before returning it
    return(rcm)
})


# Plot the SiRCle results
sirclePlot(sircleFileName, regLabels="Regulation_Grouping_2") 

# Note: your csv file needs an Entrez gene ID column (here 'entrezgene_id')
# Run ORA on the groups
sircleORAHuman(sircleFileName, "entrezgene_id", "Regulation_Grouping_2")
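If you ran the Python version instead, the output csv (e.g. `RCM_Output.csv`) can also be explored directly with pandas, for example to check group sizes or export gene lists per group. The sketch below assumes the output has `Regulation_Grouping_2` and `ensembl_gene_id` columns, as used in the plotting and ORA calls above; `summarise_groups` is a hypothetical helper, not part of SiRCle.

```python
import pandas as pd


def summarise_groups(rcm_df, group_col="Regulation_Grouping_2", gene_col="ensembl_gene_id"):
    """Return ({group: sorted unique gene IDs}, {group: number of genes}).

    Note: pandas groupby skips rows whose group value is missing (NaN)."""
    genes = {
        str(group): sorted(sub[gene_col].astype(str).unique().tolist())
        for group, sub in rcm_df.groupby(group_col)
    }
    counts = {group: len(ids) for group, ids in genes.items()}
    return genes, counts


if __name__ == "__main__":
    # Assumes the output file name used in the quick version above.
    genes, counts = summarise_groups(pd.read_csv("RCM_Output.csv"))
    print(counts)
```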

Regulatory clustering model

The general table of how we define regulatory clusters.

| Methylation | RNAseq | Proteomics | Regulation driver_1 | Regulation driver_2 | Regulation_Grouping1 | Regulation_Grouping2 | Regulation_Grouping3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Hypermethylation | DOWN | DOWN | Methylation increase (MDS) | None | MDS | MDS | MDS |
| Hypermethylation | UP | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
| Hypermethylation | UP | UP | mRNA increase (TPDE) | None | TPDE | TPDE | TPDE |
| Hypermethylation | DOWN | UP | Methylation increase (MDS) | Protein increase (TMDE) | MDS+TMDE | TMDE | TMDE |
| Hypermethylation | No Change | UP | mRNA increase (TPDE) | Protein increase (TMDE) | TPDE+TMDE | TMDE | TMDE |
| Hypermethylation | No Change | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TMDS | TMDS |
| Hypermethylation | UP | No Change | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
| Hypermethylation | DOWN | No Change | Methylation increase (MDS) | Protein increase (TMDE) | MDS+TMDE | MDS+TMDE | TMDE |
| Hypermethylation | No Change | No Change | Methylation increase (ncRNA) | None | MDS-ncRNA | MDS_ncRNA | MDS_ncRNA |
| Hypomethylation | DOWN | DOWN | mRNA decrease (TPDS) | None | TPDS | TPDS | TPDS |
| Hypomethylation | UP | DOWN | Methylation decrease (MDE) | Protein decrease (TMDS) | MDE+TMDS | TMDS | TMDS |
| Hypomethylation | UP | UP | Methylation decrease (MDE) | None | MDE | MDE | MDE |
| Hypomethylation | DOWN | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
| Hypomethylation | No Change | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TMDE | TMDE |
| Hypomethylation | No Change | DOWN | mRNA decrease (TPDS) | Protein decrease (TMDS) | TPDS+TMDS | TMDS | TMDS |
| Hypomethylation | UP | No Change | Methylation decrease (MDE) | Protein decrease (TMDS) | MDE+TMDS | MDE+TMDS | TMDS |
| Hypomethylation | DOWN | No Change | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
| Hypomethylation | No Change | No Change | Methylation decrease (ncRNA) | None | MDE+ncRNA | MDE_ncRNA | MDE_ncRNA |
| No Change | DOWN | UP | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
| No Change | UP | DOWN | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
| No Change | DOWN | DOWN | mRNA decrease (TPDS) | None | TPDS | TPDS | TPDS |
| No Change | UP | UP | mRNA increase (TPDE) | None | TPDE | TPDE | TPDE |
| No Change | No Change | UP | Protein increase (TMDE) | None | TMDE | TMDE | TMDE |
| No Change | No Change | DOWN | Protein decrease (TMDS) | None | TMDS | TMDS | TMDS |
| No Change | UP | No Change | mRNA increase (TPDE) | Protein decrease (TMDS) | TPDE+TMDS | TPDE+TMDS | TMDS |
| No Change | DOWN | No Change | mRNA decrease (TPDS) | Protein increase (TMDE) | TPDS+TMDE | TPDS+TMDE | TMDE |
| No Change | No Change | No Change | NoChange | NoChange | NoChange | NoChange | NoChange |
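To make the table concrete, the sketch below encodes its Regulation_Grouping2 column as a Python lookup keyed by the (methylation, RNAseq, proteomics) state. The `call_state` helper turns a padj/effect-size pair into UP, DOWN or No Change using the cutoffs from the quick version. It is an assumption about how the thresholds are applied (for example, whether comparisons are inclusive) and may differ from SiRCle's own implementation. Both helper names are hypothetical.

```python
# Regulation_Grouping2 column of the table above, keyed by
# (methylation state, RNAseq state, proteomics state).
GROUPING_2 = {
    ("Hypermethylation", "DOWN", "DOWN"): "MDS",
    ("Hypermethylation", "UP", "DOWN"): "TPDE+TMDS",
    ("Hypermethylation", "UP", "UP"): "TPDE",
    ("Hypermethylation", "DOWN", "UP"): "TMDE",
    ("Hypermethylation", "No Change", "UP"): "TMDE",
    ("Hypermethylation", "No Change", "DOWN"): "TMDS",
    ("Hypermethylation", "UP", "No Change"): "TPDE+TMDS",
    ("Hypermethylation", "DOWN", "No Change"): "MDS+TMDE",
    ("Hypermethylation", "No Change", "No Change"): "MDS_ncRNA",
    ("Hypomethylation", "DOWN", "DOWN"): "TPDS",
    ("Hypomethylation", "UP", "DOWN"): "TMDS",
    ("Hypomethylation", "UP", "UP"): "MDE",
    ("Hypomethylation", "DOWN", "UP"): "TPDS+TMDE",
    ("Hypomethylation", "No Change", "UP"): "TMDE",
    ("Hypomethylation", "No Change", "DOWN"): "TMDS",
    ("Hypomethylation", "UP", "No Change"): "MDE+TMDS",
    ("Hypomethylation", "DOWN", "No Change"): "TPDS+TMDE",
    ("Hypomethylation", "No Change", "No Change"): "MDE_ncRNA",
    ("No Change", "DOWN", "UP"): "TPDS+TMDE",
    ("No Change", "UP", "DOWN"): "TPDE+TMDS",
    ("No Change", "DOWN", "DOWN"): "TPDS",
    ("No Change", "UP", "UP"): "TPDE",
    ("No Change", "No Change", "UP"): "TMDE",
    ("No Change", "No Change", "DOWN"): "TMDS",
    ("No Change", "UP", "No Change"): "TPDE+TMDS",
    ("No Change", "DOWN", "No Change"): "TPDS+TMDE",
    ("No Change", "No Change", "No Change"): "NoChange",
}


def call_state(padj, value, padj_cutoff, value_cutoff):
    """ASSUMPTION: significant (padj <= cutoff) and value beyond +/- cutoff gives UP/DOWN."""
    if padj <= padj_cutoff and value >= value_cutoff:
        return "UP"
    if padj <= padj_cutoff and value <= -value_cutoff:
        return "DOWN"
    return "No Change"


def regulation_grouping_2(meth_diff, meth_padj, rna_logfc, rna_padj, prot_logfc, prot_padj,
                          meth_cutoffs=(0.05, 0.1), rna_cutoffs=(0.05, 1.0),
                          prot_cutoffs=(0.05, 0.5)):
    """Return the (methylation, RNA, protein) states and the matching Regulation_Grouping2 label.

    Each *_cutoffs pair is (padj cutoff, effect-size cutoff); defaults mirror the quick version."""
    meth_call = call_state(meth_padj, meth_diff, *meth_cutoffs)
    meth = {"UP": "Hypermethylation", "DOWN": "Hypomethylation"}.get(meth_call, "No Change")
    rna = call_state(rna_padj, rna_logfc, *rna_cutoffs)
    prot = call_state(prot_padj, prot_logfc, *prot_cutoffs)
    states = (meth, rna, prot)
    return states, GROUPING_2[states]
```

For example, a gene with a significant beta difference of 0.2, an RNA logFC of 1.5 and a protein logFC of -0.8 (all padj 0.01) lands in the Hypermethylation / UP / DOWN row, i.e. TPDE+TMDS.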

Please post questions and issues related to sci-rcm in the Issues section of the GitHub repository: https://github.com/ArianeMora/scircm/issues



Download files


Source Distribution

sircle-0.0.5.tar.gz (24.9 kB)

Uploaded Source

Built Distribution


sircle-0.0.5-py3-none-any.whl (74.5 kB)

Uploaded Python 3

File details

Details for the file sircle-0.0.5.tar.gz.

File metadata

  • Download URL: sircle-0.0.5.tar.gz
  • Upload date:
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for sircle-0.0.5.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d5fa22e14db3b2399129c82f0c515a989d001c54cf3aad341f375a8faa942384 |
| MD5 | 68922a348dce230307944eecbd4706b8 |
| BLAKE2b-256 | b1bb276eee7dfa40a186b85c62aea820f1a6c86a06f8d4912975bcd026610817 |


File details

Details for the file sircle-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: sircle-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 74.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for sircle-0.0.5-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c217b29d95b9ae3ea1746f8d4845f0ec53b28f86e83739230c27886a0e783018 |
| MD5 | 33827832a198e0fad15d6b409e4a8725 |
| BLAKE2b-256 | 0a2f89111a79c0064daccbd00eb5a509bae3c4b4fa03c6d0410231ca58df6b20 |

