sci-RegulatoryClusteringModel

Install

Optionally create a new conda env.

conda create --name scircm python=3.8
conda activate scircm
pip install scircm

Run

See the examples folder for a full tutorial, with data included, that you can run.

If you have any trouble running the tutorial on a Windows machine, please let us know and we can help sort out any issues.

Issues

Please let us know if you have any issues, either by email (ariane.n.mora@gmail.com) or via the issues tab (preferred).

Quick version

from scircm import SciRCM
# All three input files must be in csv format :)
prot_file = 'path to the output from protein differential abundance file'
rna_file = 'path to the output from differential expression analysis file'
meth_file = 'path to the output from methylation DCpG analysis file'

# Note: we assume your methylation CpGs map to a single gene. If they don't, see the section below.
# logFC_rna = column name in your RNA file that holds the RNA logFC (same idea for the protein and CpG files)
# padj_rna = column name in your RNA file that holds the padj value (same idea for the protein and CpG files)
# NOTE: these column names must be unique across the files because we merge the datasets. If they aren't,
# update your csv files.
# Lastly, ensembl_gene_id is the gene ID column. All files must use the same identifier, and the column
# must have the same name in each file. If it doesn't, update your column names before running.

rcm = SciRCM(meth_file, rna_file, prot_file, 
             "logFC_rna", "padj_rna", "CpG_Beta_diff", "padj_meth", "logFC_protein", "padj_protein",
             "ensembl_gene_id", sep=',',
             rna_padj_cutoff=0.05, 
             prot_padj_cutoff=0.05, 
             meth_padj_cutoff=0.05,
             rna_logfc_cutoff=1.0, 
             prot_logfc_cutoff=0.5, 
             meth_diff_cutoff=0.1, 
             output_dir='',
             non_coding_genes=['None'],
             output_filename='RCM_Output.csv',
             bg_type = '(P&M)|(P&R)|(M&R)'
         )
rcm.run()
df = rcm.get_df()
# That DF now has your rcm clustering results, how easy was that :D
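
The column requirements described in the comments above (a shared gene ID column, and logFC/padj column names that are unique across files) can be checked before running. The following is a minimal sketch in plain Python, not part of scircm: `read_header` and `check_inputs` are hypothetical helper names, and the in-memory CSV contents are made up for illustration.

```python
import csv
import io


def read_header(handle):
    """Return the column names from the first row of a CSV file handle."""
    return next(csv.reader(handle))


def check_inputs(headers, gene_id, value_columns):
    """Collect problems with the column layout of the three input files.

    headers: dict mapping a layer name to its list of column names.
    value_columns: dict mapping a layer name to its (change, padj) columns.
    """
    problems = []
    for layer, columns in headers.items():
        if gene_id not in columns:
            problems.append(f"{layer}: missing gene ID column '{gene_id}'")
        for name in value_columns[layer]:
            if name not in columns:
                problems.append(f"{layer}: missing column '{name}'")
    # Value columns must be unique across files because the files are merged.
    all_values = [name for pair in value_columns.values() for name in pair]
    duplicates = sorted({name for name in all_values if all_values.count(name) > 1})
    for name in duplicates:
        problems.append(f"column '{name}' is used in more than one file")
    return problems


# Tiny in-memory stand-ins for the three csv files (made-up values).
rna_csv = io.StringIO("ensembl_gene_id,logFC_rna,padj_rna\nG1,1.5,0.01\n")
prot_csv = io.StringIO("ensembl_gene_id,logFC_protein,padj_protein\nG1,0.7,0.03\n")
meth_csv = io.StringIO("gene,CpG_Beta_diff,padj_meth\nG1,-0.2,0.02\n")

headers = {
    "rna": read_header(rna_csv),
    "protein": read_header(prot_csv),
    "methylation": read_header(meth_csv),
}
value_columns = {
    "rna": ("logFC_rna", "padj_rna"),
    "protein": ("logFC_protein", "padj_protein"),
    "methylation": ("CpG_Beta_diff", "padj_meth"),
}
problems = check_inputs(headers, "ensembl_gene_id", value_columns)
for problem in problems:
    print(problem)
```

Here the methylation file uses `gene` instead of `ensembl_gene_id`, so the check reports that its gene ID column is missing. With real files, you would pass `open(path, newline='')` handles to `read_header` instead of the `io.StringIO` objects.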

Making your CpGs map to a single gene version

import pandas as pd
from scircm import filter_methylation_data_by_genes

meth_df = pd.read_csv('path to the output from methylation DCpG analysis file')
# Note: you need to pass it: 
# 1) the gene ID column, here it is 'ensembl_gene_id'
# 2) the padj column: here it is 'padj_meth'
# 3) the logFC or test statistic column: here it is 'CpG_Beta_diff'
filtered_meth_df = filter_methylation_data_by_genes(meth_df, 'ensembl_gene_id', 'padj_meth', 'CpG_Beta_diff')
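
To make the idea of "one CpG per gene" concrete, here is a minimal sketch in plain Python. It does not describe how `filter_methylation_data_by_genes` chooses a CpG. The selection rule (smallest padj, ties broken by the largest absolute difference), the helper name `keep_one_cpg_per_gene` and the example rows are all assumptions made for illustration.

```python
def keep_one_cpg_per_gene(rows, gene_col, padj_col, diff_col):
    """Keep a single row per gene: smallest padj, ties broken by largest |diff|.

    This selection rule is an assumption made for illustration only.
    """
    best = {}
    for row in rows:
        # Lower key is better: small padj first, then large absolute difference.
        key = (float(row[padj_col]), -abs(float(row[diff_col])))
        gene = row[gene_col]
        if gene not in best or key < best[gene][0]:
            best[gene] = (key, row)
    return [row for _, row in best.values()]


# Made-up example: gene G1 has two CpGs, gene G2 has one.
rows = [
    {"ensembl_gene_id": "G1", "padj_meth": 0.04, "CpG_Beta_diff": 0.15},
    {"ensembl_gene_id": "G1", "padj_meth": 0.01, "CpG_Beta_diff": -0.30},
    {"ensembl_gene_id": "G2", "padj_meth": 0.20, "CpG_Beta_diff": 0.05},
]
filtered = keep_one_cpg_per_gene(rows, "ensembl_gene_id", "padj_meth", "CpG_Beta_diff")
for row in filtered:
    print(row)
```

After filtering, each gene ID appears exactly once, which is the shape the quick version assumes for the methylation file.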

Manuscript

If you want to read more about how SiRCle works, please check out our paper: https://www.biorxiv.org/content/10.1101/2022.07.02.498058v1

A website is available at: https://arianemora-sircle-web-app-ndu996.streamlit.app. It is under active development and currently in beta, so let us know if you have any issues :) Note that the website only covers the first part of the regulatory clustering.

Now you can run the first version :)

Quick guide on the regulation grouping levels

We have four levels of regulation grouping, each encoding a different level of clustering. The Excel spreadsheet in the examples folder lists them all explicitly. We include four levels so that, depending on the experiment type and the specific data used (e.g. the type of proteomics), the user can choose the right granularity.

Regulation Grouping All: All 54 levels where the labels are just the level of DNA methylation, RNA and protein.
Regulation Grouping Change: 54 groups consolidated to 10 such that we prioritise "changes" in the system. For example, an "UP" on RNA followed by "Not significant" on protein (i.e. not meeting the threshold) is considered a "double regulation": an increase on the RNA level followed by a suppression on the translational level (because the protein didn't meet the threshold).
Regulation Grouping Detection: 54 groups consolidated to 10 such that we prioritise the central dogma and "stop" the labelling at the last significant change. For example, the case above ("UP" on RNA followed by "NS" on protein) would just be a transcription-driven increase (TPDE).
Regulation Grouping Protein: 54 groups consolidated to 6 clusters such that we only consider one layer of regulation (no double groups). If there are two opposing changes, for example "UP" on the transcriptional layer followed by "DOWN" on the protein layer, the gene is labelled by the "DOWN" on the protein layer (i.e. TMDS).
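
The groupings above are built from a per-layer label such as "UP", "DOWN" or "NS". As a rough sketch of how such a label could follow from the cutoffs in the quick version, consider the code below. The function `label_layer`, the symmetric cutoff and the exact comparison operators are assumptions, not the package's implementation, and the gene values are made up.

```python
def label_layer(change, padj, change_cutoff, padj_cutoff):
    """Label one layer of one gene as 'UP', 'DOWN' or 'NS' (not significant).

    Assumes a symmetric cutoff on the absolute change; this is illustrative only.
    """
    if padj > padj_cutoff or abs(change) < change_cutoff:
        return "NS"
    return "UP" if change > 0 else "DOWN"


# Cutoffs copied from the quick version; the gene values are made up.
gene = {
    "CpG_Beta_diff": -0.02, "padj_meth": 0.60,
    "logFC_rna": 1.4, "padj_rna": 0.01,
    "logFC_protein": 0.2, "padj_protein": 0.30,
}
labels = {
    "methylation": label_layer(gene["CpG_Beta_diff"], gene["padj_meth"], 0.1, 0.05),
    "rna": label_layer(gene["logFC_rna"], gene["padj_rna"], 1.0, 0.05),
    "protein": label_layer(gene["logFC_protein"], gene["padj_protein"], 0.5, 0.05),
}
print(labels)
```

This made-up gene is "UP" on RNA and "NS" on protein, which is the case used in the Change and Detection examples above: the two groupings differ only in how they interpret the non-significant protein layer.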

Quick guide on the background filter

To compensate for detection thresholds (e.g. the protein layer detecting far fewer genes than the RNA or methylation layers), we include a background filter that sets the label to None for genes that don't meet the threshold.

The background options included are:

'P&R': Protein and RNA exist for that gene. This is the recommended background, as it avoids confusion with detection limits.
'(P&M)|(P&R)|(M&R)': Protein and DNA methylation, or protein and RNA, or DNA methylation and RNA exist in the datasets for that gene. This is recommended if you are interested in genes that may not have protein data, for example non-coding genes or datasets with poor detection on the protein level.
'*': No filter.
'P&M&R': The strictest option: the gene must have protein, RNA and DNA methylation (only recommended if you really want the impact of DNA methylation to be the focus).
'P|M|R': Akin to '*' unless an added statistical filter is applied (see the note below).
'(P&M)|(P&R)': Protein and DNA methylation, or protein and RNA, exist.
'P|(M&R)': Protein exists, or both DNA methylation and RNA exist.
'P|R': Protein or RNA exists in the dataset.

Note: when running the function run_rcm(self, methylation_background=1.0, rna_background=1.0, protein_background=1.0), you can pass different thresholds to the background filter. By default, the filter only requires that the p.adj exists (i.e. is not None). However, you could pass 0.05 if you wanted the background to only consider genes with a significant change.
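
The background options can be read as boolean expressions over "is this layer present for the gene?". The sketch below shows that reading in plain Python. It is not the package's code: `BACKGROUNDS`, `layer_present` and `in_background` are hypothetical names, the `<=` comparison against the threshold is an assumption, and the gene is made up.

```python
# Each background option as a boolean rule over protein (p), methylation (m), RNA (r).
BACKGROUNDS = {
    "P&R": lambda p, m, r: p and r,
    "(P&M)|(P&R)|(M&R)": lambda p, m, r: (p and m) or (p and r) or (m and r),
    "*": lambda p, m, r: True,
    "P&M&R": lambda p, m, r: p and m and r,
    "P|M|R": lambda p, m, r: p or m or r,
    "(P&M)|(P&R)": lambda p, m, r: (p and m) or (p and r),
    "P|(M&R)": lambda p, m, r: p or (m and r),
    "P|R": lambda p, m, r: p or r,
}


def layer_present(padj, threshold=1.0):
    """A layer counts as present when its padj exists and is within the threshold."""
    return padj is not None and padj <= threshold


def in_background(gene, bg_type, meth_bg=1.0, rna_bg=1.0, prot_bg=1.0):
    """Return True when the gene passes the chosen background filter."""
    p = layer_present(gene.get("padj_protein"), prot_bg)
    m = layer_present(gene.get("padj_meth"), meth_bg)
    r = layer_present(gene.get("padj_rna"), rna_bg)
    return BACKGROUNDS[bg_type](p, m, r)


# Made-up gene: no protein measurement, methylation and RNA both measured.
gene = {"padj_protein": None, "padj_meth": 0.20, "padj_rna": 0.01}

print(in_background(gene, "P&R"))                # protein is missing
print(in_background(gene, "(P&M)|(P&R)|(M&R)"))  # methylation and RNA both exist
print(in_background(gene, "(P&M)|(P&R)|(M&R)", meth_bg=0.05, rna_bg=0.05, prot_bg=0.05))
```

With the default thresholds of 1.0, a layer only has to exist. Passing 0.05 turns the same expression into a filter on significant changes, so the made-up gene passes '(P&M)|(P&R)|(M&R)' in the first case but not in the second.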

Issues

Note on Mac M1

Unfortunately, the Mac M1 has some issues with tensorflow, so we have not yet been able to get our package to work on an M1 Mac. If you get it working, please let us know your solution, as others may benefit! We have a partial M1 version that covers the regulatory clustering but doesn't enable the group statistics. If you are only interested in part 1 of SiRCle, please go here: https://github.com/ArianeMora/SiRCle_M1

Note on Windows

We have tested our code on Windows (10) and Mac (Pro) computers. We are not sure how it would go on a Windows 7 machine, so if you run into problems, please post an issue.

Note on libraries/dependencies

If you find that things don't install, please let us know! We've done our best to make it reproducible, but different environments may mess things up, so we're happy to help you debug. Just post an issue on the GitHub.

Note that we expect Python 3.8, so if things don't work the first time, check that you're running Python 3.8 and then try again :)
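
One quick way to confirm which interpreter is active is to check it from Python itself. This is a small sketch using only the standard library; the helper name `is_expected_python` is hypothetical.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 8)):
    """Return True when the interpreter's major.minor version matches `expected`."""
    return tuple(version_info[:2]) == expected


if not is_expected_python():
    print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
          "the package expects Python 3.8.")
```

If a message is printed, activate the conda env from the Install section (conda activate scircm) and try again.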

Things to note

As of 27/11, we updated the names of the clusters and included an extra level of grouping that explicitly takes into account whether the protein was detected or not.

Regulatory clustering model

The general table of how we define regulatory clusters.

Please post questions and issues related to sci-rcm on the Issues section of the GitHub repository: https://github.com/ArianeMora/scircm/issues

Cite

If you use this please cite our manuscript: https://www.biorxiv.org/content/10.1101/2022.07.02.498058v1

Release history

1.1.5: Nov 28, 2024 (this version)
1.1.4: Aug 16, 2023
1.1.3: Dec 19, 2022
1.0.5: Oct 23, 2022
1.0.4: Sep 24, 2022
1.0.3: Aug 29, 2022
1.0.1: Jul 1, 2022
1.0.0: Jul 1, 2022

Download files

Source Distribution

scircm-1.1.5.tar.gz (47.3 kB)

Uploaded Nov 28, 2024 Source

Built Distribution

scircm-1.1.5-py3-none-any.whl (74.4 kB)

Uploaded Nov 28, 2024 Python 3

File details

Details for the file scircm-1.1.5.tar.gz.

File metadata

Download URL: scircm-1.1.5.tar.gz
Upload date: Nov 28, 2024
Size: 47.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.20

File hashes

Hashes for scircm-1.1.5.tar.gz
Algorithm	Hash digest
SHA256	`858f99bb6f2c2c5d72b09d96e82bc143ad45965493ebe5746fe32f23d657394b`
MD5	`b6bdbb63f07a713c51ef848d355f2f53`
BLAKE2b-256	`4d82a5a80c8e5a8c47b96c02589b6d1cc1b2327c56fdc3662962f74c5aee1aea`

File details

Details for the file scircm-1.1.5-py3-none-any.whl.

File metadata

Download URL: scircm-1.1.5-py3-none-any.whl
Upload date: Nov 28, 2024
Size: 74.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.20

File hashes

Hashes for scircm-1.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6e025205f2c545f66860ced3d70f5ec4da8550f344faafe5b81b3ea3bcff37dd`
MD5	`32f2fe06f6477e34a069fc7e44c3453b`
BLAKE2b-256	`bd78b5f66d5a25de0a90a817e5954b276902b525ec82188a4859bb60a5be4b15`
