A useful module for aligning cell lines to tumors
Celligner
Celligner is a computational project to align multiple cancer datasets across sequencing modalities, tissue conditions (media, perturbations, etc.) and formats (cell lines, tumors, organoids, spheroids).
See our latest paper on aligning CCLE cell lines with TCGA tumors: 2020 paper
Remark
Celligner was initially an R project, which you can find in the R/ folder.
A Python version was made that performs the same computations as the R version. However, one should not expect the exact same plot, for a couple of reasons:
UMAP
The plot some users are used to comes from a single run of UMAP on the Celligner realignment data, obtained by fixing the seed of the UMAP algorithm. You can still do that in the Python version, but it is disabled by default and not recommended. We recommend that users vary the UMAP parameters and make multiple plots. This helps prevent reading too much into UMAP's output: features that change from one run to the next are not necessarily true attributes of the data.
Learn more here: distill, Lior's twittorial.
Additionally, we advise users to check their assumptions with methods such as differential expression analysis across clusters, to find meaningful signal.
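One way to follow this advice is to sweep a small grid of UMAP settings and compare the resulting plots. The sketch below only builds the grid. The helper name umap_settings and the particular values are illustrative assumptions, and each dictionary is meant to be passed as umap_kwargs to plot(), as in the usage section.

```python
from itertools import product


def umap_settings(n_neighbors=(5, 15, 50), min_dist=(0.1, 0.5), metric="cosine"):
    """Build one umap_kwargs dictionary per combination of settings.

    The default values are arbitrary illustrations, not recommendations.
    """
    return [
        {"n_neighbors": n, "min_dist": d, "metric": metric}
        for n, d in product(n_neighbors, min_dist)
    ]


settings = umap_settings()
for kwargs in settings:
    # With a fitted model, each dictionary could be passed to plot(), e.g.
    # my_alligner.plot(color_column="tissue_type", umap_kwargs=kwargs)
    print(kwargs)
```

Structures that persist across most of these plots are better candidates for follow-up than ones that appear in a single plot.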
Algorithmic differences
Celligner is composed of 4 key steps:
- A Louvain clustering: the Python version uses the ScanPy implementation of this method while the R version uses Seurat's. There might be some slight implementation differences.
- A limma differential expression analysis to find key variance genes across clusters for each dataset: this step is identical to the R version of Celligner.
- A cPCA to remove tumor impurity signal: the method is the same, except that the Python version computes an exact PCA while the R version computes an approximate one.
- An MNN alignment: this step gives the same output as the R version of Celligner.
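The mutual nearest neighbors (MNN) idea behind the last step can be illustrated on toy data. This is a minimal pure-Python sketch of how MNN pairs are found, not the implementation used by Celligner. The function names and the example points are made up for illustration.

```python
from math import dist


def nearest(points, query, k):
    """Indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: dist(points[i], query))
    return set(order[:k])


def mutual_nearest_neighbors(a, b, k=1):
    """Return pairs (i, j) where a[i] and b[j] are in each other's k nearest neighbors."""
    a_to_b = [nearest(b, p, k) for p in a]  # neighbors in b of each point of a
    b_to_a = [nearest(a, q, k) for q in b]  # neighbors in a of each point of b
    return [
        (i, j)
        for i, js in enumerate(a_to_b)
        for j in sorted(js)
        if i in b_to_a[j]
    ]


# Toy 2-D "expression profiles", for illustration only.
cell_lines = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
tumors = [(0.5, 0.0), (5.0, 6.0)]
pairs = mutual_nearest_neighbors(cell_lines, tumors, k=1)
print(pairs)
```

The third cell line has a nearest tumor, but that tumor is closer to another cell line, so the pair is not mutual and is dropped.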
Are there any other differences?
Overall improvements, yes:
- A “pre-fitted” model is available to download here: gs://celligner/model.pkl (on request for now).
- Using your own dataset and adding new datasets is now simple with the fit(), transform() syntax.
- You don’t need to rerun the entire model when adding new samples (adding 600 new samples takes only 5 minutes to run).
- The model takes much less memory to run and can now run on any machine (you don’t need 64 GB of RAM anymore). It also takes less than an hour to fully run on a good machine.
- There is now an interactive plot using Bokeh to better visualise your samples of interest.
- You can now easily choose parameters and even choose between 2 different versions of MNN.
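The fit()/transform() split is what makes it possible to add samples without refitting: fit() stores what was learned from the reference data, and transform() only reuses it. A toy sketch of that pattern follows. The class below is hypothetical and far simpler than Celligner: it merely centers data on the reference mean.

```python
class MeanCenterer:
    """Toy model: learns per-column means in fit() and reuses them in transform()."""

    def fit(self, rows):
        n = len(rows)
        # Stored state: this is the only thing transform() needs later.
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        # No refitting here, so new samples are cheap to process.
        return [[x - m for x, m in zip(row, self.means_)] for row in rows]


reference = [[1.0, 10.0], [3.0, 30.0]]
model = MeanCenterer().fit(reference)
new_samples = [[2.0, 25.0]]
print(model.transform(new_samples))
```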
Just want a quick look?
Go here for the production version: https://depmap.org/portal/celligner/
Go here for some usage examples: https://raw.githack.com/broadinstitute/celligner/master/docs/example.html
Install
To see the installation instructions for the old R package, see the R/ folder.
Before running pip, make sure that you have R installed.
pip install celligner
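A quick way to check that R is visible before installing is to look for its executables on the PATH. This is a small standard-library sketch. The helper name r_is_installed is an assumption, and finding R on the PATH does not guarantee that every required R package is present.

```python
import shutil


def r_is_installed():
    """Return True if an R or Rscript executable can be found on the PATH."""
    return shutil.which("R") is not None or shutil.which("Rscript") is not None


if r_is_installed():
    print("R found on PATH.")
else:
    print("R not found on PATH; install R first or use the Docker image.")
```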
Even with R, some platforms might not have all the required packages already installed (thanks R for being so easy to work with!)
In that case, please refer to our Docker image:
A dockerized version is available at jkobject:pycelligner
To install the latest unreleased version of Celligner in development mode, run:
git clone https://github.com/broadinstitute/celligner.git
cd celligner
pip install -e .
For developers
See CONTRIBUTING.md
Use Celligner
See docs/Celligner_demo.[html|pdf]
for an example of usage.
(view here)
Celligner works like most scikit-learn tools.
A user fits a dataset (e.g. CCLE TPM expression):
from celligner import Celligner
my_alligner = Celligner(make_plots=True)
my_alligner.fit(CCLE_expression, CCLE_annotation)
and then transforms another one based on this fitted dataset:
my_alligner.method = "mnn_marioni"
my_alligner.mnn_kwargs = {'k1': 5, 'k2': 50, 'cosine_norm': True, "fk":5}
transformed_TCGA = my_alligner.transform(TCGA_expression, TCGA_annotation)
my_alligner.plot(color_column="tissue_type", colortable=TISSUE_COLOR, umap_kwargs={'n_neighbors': 15,'min_dist': 0.2, 'metric': 'cosine'})
Users can access other methods such as save(), load(), addToFit(), etc., as well as many stored values: pca_transform, transform_clusters, differential_genes_names, mnn_pairs, etc.
Computational complexity
Depending on the dataset, Celligner can be quite memory hungry. For TCGA, expect at least 50-60 GB of memory to be used. You might need a powerful computer and lots of swap, and you might need to increase R's default maximum allowed memory.
You can also use the low_memory=True option to reduce the memory used by Celligner in the memory-intensive PCA and cPCA methods.
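For a rough idea of why memory grows quickly, the size of a dense float64 matrix can be estimated from its shape. The sketch below is back-of-the-envelope arithmetic only. The sample and gene counts are hypothetical placeholders, and real usage depends on how many intermediate copies are held at once.

```python
def dense_matrix_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense matrix (8 bytes per value for float64)."""
    return n_rows * n_cols * bytes_per_value / 1024**3


# Hypothetical shapes, for illustration only.
n_samples, n_genes = 10_000, 20_000
expression = dense_matrix_gib(n_samples, n_genes)
# A gene-by-gene covariance matrix, of the kind PCA-style methods can build.
gene_covariance = dense_matrix_gib(n_genes, n_genes)
print(f"expression matrix: {expression:.2f} GiB")
print(f"gene-by-gene covariance: {gene_covariance:.2f} GiB")
```

A single matrix of this size is modest, but several copies and intermediate results held together add up quickly.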
Add your own data to a pretrained model
If you want to see your dataset in Celligner, you can use our prefitted model.
! curl https://storage.googleapis.com/celligner/model.pkl --output temp/model.pkl
from celligner import Celligner
my_alligner = Celligner()
my_alligner.load('temp/model.pkl')
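Outside a notebook, the same download can be done from Python. This is a sketch using only the standard library. The helper name fetch_model is an assumption, it creates the destination folder if it is missing, and it skips the download when the file is already there.

```python
from pathlib import Path
from urllib.request import urlretrieve

MODEL_URL = "https://storage.googleapis.com/celligner/model.pkl"


def fetch_model(url=MODEL_URL, dest="temp/model.pkl"):
    """Download url to dest unless the file already exists; return the path."""
    path = Path(dest)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)  # create temp/ if missing
        urlretrieve(url, path)
    return path


# Usage (needs network access and access to the model file):
# model_path = fetch_model()
# my_alligner.load(str(model_path))
```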
We fit the model with CCLE and then transformed TCGA, but you can choose differently.
For example, if you want to see how some of your newly sequenced tumors map to the CCLE (and TCGA) datasets, just load the model as displayed above and then run:
my_alligner.addTotransform(your_tpm, your_annotations)
my_alligner.plot()
This way you will not rerun the entire model.
See docs/Celligner_demo.[html|pdf]
for other examples of usage.
(view here)
Multidataset alignment
See docs/Celligner_demo.[html|pdf]
for an example of usage.
(view here)
You can use addToFit() or addToPredict(), depending on whether you want to align your dataset to another one or align another dataset to yours.
If you have a very small dataset and want to align it to CCLE or TCGA, use the parameter doAdd=True to reuse cached information instead of rerunning the entire pipeline.
R Celligner
For the original R version of Celligner, please check the R/README.md file here: https://github.com/broadinstitute/celligner/tree/master/R/README.md
Please use GitHub issues for any problems related to the tool.
Initial Project:
Allie Warren @awarren
Maintainer:
Jérémie Kalfon @jkobject