Panpipes - multimodal single cell pipelines
Created and maintained by Charlotte Rich-Griffin and Fabiola Curion
Additional contributors: Devika Agarwal and Tom Thomas
See our preprint:
Panpipes: a pipeline for multiomic single-cell data analysis
Charlotte Rich-Griffin, Fabiola Curion, Tom Thomas, Devika Agarwal, Fabian J. Theis, Calliope A. Dendrou.
bioRxiv 2023.03.11.532085;
doi: https://doi.org/10.1101/2023.03.11.532085
Introduction
These pipelines use the cgat-core pipeline software.
Available pipelines:
- "qc_mm" : for the ingestion of data and computation of QC metrics
- "preprocess" : for filtering and normalising each modality
- "integration" : for integration and batch correction using single-modality and multimodal methods
- "clustering" : for cell clustering on single modalities
- "refmap" : for transferring scvi-tools models from published data to your data
- "vis" : for visualising metrics from other pipelines in the context of experiment metadata
Installation and configuration
See installation instructions here
Oxford BMRC Rescomp users can find additional advice in docs/installation_rescomp
General principles for running pipelines
Run the pipeline from the login node on your server; it will use the built-in job submission system to submit jobs.
Navigate to the directory where you want to run your analysis (this should not be within the panpipes folder or your virtual environment folder).
mkdir data_dir/
cd data_dir/
panpipes qc_mm config
This will produce two files, pipeline.log and pipeline.yml.
Edit pipeline.yml as appropriate for your data, following the instructions within the yml file.
Then check which jobs will run with the command
panpipes qc_mm show full
The output lists the tasks that will be run as part of the pipeline.
To run the pipeline, use the command:
panpipes qc_mm make full
Occasionally you might want to run tasks individually (e.g. to assess outputs before deciding the parameters for the next step).
To do this, you can run any task in the show full list, such as:
panpipes qc_mm make plot_tenx_metrics
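The sequence above can be collected into a short script. The following is a minimal sketch, not part of panpipes itself: the variable names and the guard around the panpipes command are assumptions added for illustration, and only the config and show full commands from the steps above are called. It stops before make full so that pipeline.yml can be edited by hand first.

```shell
# Sketch: set up a working directory, write the config, and list the tasks.
pipeline="qc_mm"     # any of the available pipelines
workdir="data_dir"   # same directory name as in the steps above

mkdir -p "${workdir}"

(
    # Work in a subshell so the caller's working directory is unchanged.
    cd "${workdir}" || exit 1

    if command -v panpipes >/dev/null 2>&1; then
        # Skip config when pipeline.yml is already present, so any edits
        # are left alone however config treats existing files.
        [ -f pipeline.yml ] || panpipes "${pipeline}" config
        # List the tasks that would run; nothing is executed by show.
        panpipes "${pipeline}" show full
    else
        echo "panpipes not found on PATH; activate the environment where it is installed" >&2
    fi

    echo "Next: edit pipeline.yml, then run 'panpipes ${pipeline} make full' from $(pwd)"
)
```

Changing the pipeline variable (for example to preprocess) and the workdir variable gives the same sequence for the other pipelines.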
Running the complete pipeline
Run each of the pipelines (qc_mm, integration and clustering) in a separate folder.
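Each pipeline section below starts from a new folder, so one possible layout is a single project directory with one subfolder per pipeline. This is a sketch under assumptions: the parent directory name panpipes_project is hypothetical, and naming the subfolders after the pipelines is a convenience, not something panpipes requires.

```shell
# Hypothetical project directory; choose any location outside the panpipes
# folder and outside your virtual environment folder.
project_dir="panpipes_project"

# One working folder per pipeline, named after the pipeline for convenience.
for pipeline in qc_mm preprocess integration refmap clustering vis; do
    mkdir -p "${project_dir}/${pipeline}"
done

ls "${project_dir}"
```

Each config command is then run from inside the matching subfolder, so that every pipeline keeps its own pipeline.yml and pipeline.log.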
QC
- Generate sample submission file
- Generate qc genelists
- For the ADT assay, generate the protein metadata file (example). This file is integrated into the mdata['prot'].var slot.
- Generate the config file:
panpipes qc_mm config
- Edit the pipeline.yml file for your dataset
- this is explained step by step within the pipeline.yml file
- Run the complete qc pipeline with:
panpipes qc_mm make full
- Use outputs to decide filtering thresholds.
- Note that the actual filtering occurs in the first step of Preprocess pipeline
- TODO: create doc to explain the pipeline outputs
The h5mu file output by qc_mm contains the concatenated raw counts from all samples in the submission file, together with the computed QC metrics. These metrics are also visualised in a variety of plots to help the user assess data quality and choose filtering thresholds.
Preprocess
- In a new folder, generate the config file for preprocess:
panpipes preprocess config
- Edit the pipeline.yml file
- The filtering options are dynamic and depend on your qc_mm inputs; more details here
- There are lots of options for normalisation, explained in the pipeline.yml
- Run the complete preprocess pipeline with:
panpipes preprocess make full
The h5mu file output by preprocess is filtered and normalised, and highly variable genes are computed for the RNA modality.
Integration
- In a new folder, generate the config file for integration and edit the pipeline.yml file:
panpipes integration config
- Run the following and assess the post-filtering QC plots and PCA outputs:
panpipes integration make plot_pcas
- Run batch correction with:
panpipes integration make batch_correction
(or run steps 2 and 3 in one go with panpipes integration make full)
- Use pipeline outputs to decide on the best batch correction method
- Edit the integration pipeline yml with your preferred batch correction
- Run:
panpipes integration make merge_batch_correction
Refmap
- In a new folder, generate the config file for refmap and edit the pipeline.yml file:
panpipes refmap config
- Run the complete refmap pipeline with:
panpipes refmap make full
Clustering
- In a new folder, generate the config file for clustering and edit the pipeline.yml file:
panpipes clustering config
- Run the clustering pipeline with:
panpipes clustering make cluster_analysis
This will do the initial nearest neighbours and clustering for the parameters you specify.
- Decide on the best values for k nearest neighbours based on UMAPs and clustree results. Once decided, delete the folders for the parameters you don't need and delete those parameters from the pipeline.yml.
- Find markers for each of your cluster resolutions with:
panpipes clustering make marker_analysis
(Again, you could run the whole clustering pipeline at once with panpipes clustering make full, but by making decisions along the way you'll reduce the computation and file size burden of the pipeline.)
Vis
- In a new folder, generate the config file for vis and edit the pipeline.yml file:
panpipes vis config
- Prepare plotting gene list files
- Run the complete vis pipeline with:
panpipes vis make full
To repeat the pipeline after editing the pipeline.yml, delete the files in log and repeat step 3.
Running pipeline modules from different entry points.