Panpipes - multimodal single cell pipelines

Created and Maintained by Charlotte Rich-Griffin and Fabiola Curion
Additional contributors: Devika Agarwal and Tom Thomas

See our preprint:
Panpipes: a pipeline for multiomic single-cell data analysis
Charlotte Rich-Griffin, Fabiola Curion, Tom Thomas, Devika Agarwal, Fabian J. Theis, Calliope A. Dendrou.
bioRxiv 2023.03.11.532085;
doi: https://doi.org/10.1101/2023.03.11.532085

Introduction

These pipelines use cgat-core pipeline software

Available pipelines:

  1. "qc_mm" : for the ingestion of data and computation of QC metrics
  2. "preprocess" : for filtering and normalising of each modality
  3. "integration" : integration and batch correction using single and multimodal methods
  4. "clustering" : cell clustering on single modalities
  5. "refmap" : transfer scvi-tools models from published data to your data
  6. "vis" : visualise metrics from other pipelines in context of experiment metadata

Installation and configuration

See installation instructions here

Oxford BMRC Rescomp users can find additional advice in docs/installation_rescomp

General principles for running pipelines

Run the pipeline from the login node on your server; it will use the built-in job submission system to submit jobs.

Navigate to the directory where you want to run your analysis (this should not be within the panpipes folder or your virtual environment folder):

mkdir data_dir/
cd data_dir/
panpipes qc_mm config

This will produce two files, pipeline.log and pipeline.yml.

Edit pipeline.yml as appropriate for your data, following the instructions within the yml file.

Then check which jobs will run with the command

panpipes qc_mm show full

The output of this will show a list of tasks that will be run as part of the pipeline.

To run the pipeline, use the command

panpipes qc_mm make full

Occasionally you might want to run tasks individually (e.g. to assess outputs before deciding the parameters for the next step). To do this, run any task from the show full list, for example:

panpipes qc_mm make plot_tenx_metrics
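
The config, show and make steps above can be wrapped in a small helper. The following is a minimal sketch and not part of panpipes: the run_panpipes function and the DRY_RUN variable are hypothetical names introduced here for illustration. With DRY_RUN=1 (the default in this sketch) each command is only printed, so the sequence can be reviewed before anything is submitted.

```shell
# Hypothetical helper (not part of panpipes): print or run a panpipes command.
# DRY_RUN defaults to 1 so that nothing is submitted until you opt in.
DRY_RUN="${DRY_RUN:-1}"

run_panpipes() {
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: only show the command that would be executed.
        echo "panpipes $*"
    else
        panpipes "$@"
    fi
}

# Same sequence as described above, for the qc_mm pipeline.
run_panpipes qc_mm config      # produces pipeline.log and pipeline.yml
# ... edit pipeline.yml here before continuing ...
run_panpipes qc_mm show full   # list the tasks that would be run
run_panpipes qc_mm make full   # run the whole pipeline
```

In practice the pipeline.yml edit happens between the config and show steps, so with DRY_RUN=0 the commands would normally be run one at a time rather than as a single script.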

Running the complete pipeline

Run each pipeline (qc_mm, integration, clustering and so on) in a separate folder.
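
One way to keep the runs separate is to create one folder per pipeline under a single project directory. This is a minimal sketch: the project directory name my_project and the choice of naming each folder after its pipeline are illustrative assumptions, not panpipes requirements.

```shell
# Illustrative layout: one folder per pipeline under a project directory.
# PROJECT_DIR and the folder names are assumptions, not panpipes requirements.
PROJECT_DIR="${PROJECT_DIR:-my_project}"

for pipeline in qc_mm preprocess integration clustering refmap vis; do
    mkdir -p "$PROJECT_DIR/$pipeline"
done

# Show the folders that were created.
ls "$PROJECT_DIR"
```

Each pipeline's config command would then be run from inside its own folder, so that every run keeps its own pipeline.yml.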

QC

  1. Generate sample submission file
  2. Generate qc genelists
  3. For the adt assay, generate the protein metadata file (see example). This file is integrated into the mdata['prot'].var slot.
  4. Generate config file (panpipes qc_mm config)
  5. Edit the pipeline.yml file for your dataset
    • this is explained step by step within the pipeline.yml file
  6. Run complete qc pipeline with panpipes qc_mm make full
  7. Use outputs to decide filtering thresholds.
    • Note that the actual filtering occurs in the first step of the Preprocess pipeline
    • TODO: create doc to explain the pipeline outputs

The h5mu file output by qc_mm contains the concatenated raw counts from all samples in the submission file, together with the computed QC metrics. These metrics are also visualised in a variety of plots to help the user assess data quality and choose filtering thresholds.

Preprocess

  1. In a new folder, generate the config file for preprocess with panpipes preprocess config
  2. Edit the pipeline.yml file
    • The filtering options are dynamic, depending on your qc_mm inputs; more details here
    • There are lots of options for normalisation explained in the pipeline.yml
  3. Run complete preprocess pipeline with panpipes preprocess make full

The h5mu file output by preprocess is filtered and normalised; for the rna modality, highly variable genes are also computed.

Integration

  1. In a new folder, generate config file for integration, panpipes integration config and edit the pipeline.yml file.
  2. Run panpipes integration make plot_pcas and assess the post-filtering QC plots and PCA outputs
  3. Run batch correction with panpipes integration make batch_correction (or run steps 2 and 3 in one go with panpipes integration make full)
  4. Use pipeline outputs to decide on the best batch correction method
  5. Edit the integration pipeline yml with your preferred batch correction
  6. Run panpipes integration make merge_batch_correction
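
The staged integration run above can be sketched as a short script. This is an illustration rather than part of panpipes: the step helper and the DRY_RUN variable are hypothetical, and with the default DRY_RUN=1 the commands are only printed. The review points between stages are manual, so in a real run each command would be issued separately.

```shell
# Hypothetical helper: print the command in dry-run mode, otherwise run it.
DRY_RUN="${DRY_RUN:-1}"

step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1: PCA and post-filtering QC plots; inspect them before continuing.
step panpipes integration make plot_pcas
# Stage 2: run the batch correction methods configured in pipeline.yml.
step panpipes integration make batch_correction
# Stage 3: after choosing a method and editing pipeline.yml, merge the results.
step panpipes integration make merge_batch_correction
```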

Refmap

  1. In a new folder, generate the config file for refmap with panpipes refmap config and edit the pipeline.yml file.
  2. Run complete refmap pipeline with panpipes refmap make full

Clustering

  1. In a new folder, generate the config file for clustering with panpipes clustering config and edit the pipeline.yml file.
  2. Run the clustering pipeline with panpipes clustering make cluster_analysis. This will do the initial nearest neighbours and clustering for the parameters you specify.
  3. Decide on the best values for k nearest neighbours based on UMAPs and clustree results. Once decided, delete the folders for the parameters you don't need and remove those parameters from the pipeline.yml.
  4. Find markers for each of your cluster resolutions with panpipes clustering make marker_analysis. (Again, you could run the whole clustering pipeline at once with panpipes clustering make full, but by making decisions along the way you'll reduce the computation and file size burden of the pipeline.)

Vis

  1. In a new folder, generate the config file for vis with panpipes vis config and edit the pipeline.yml file.
  2. Prepare plotting gene list files
  3. Run complete vis pipeline with panpipes vis make full

To repeat the pipeline after editing the pipeline.yml, delete the files in log and repeat step 3.
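
Clearing the log files before a repeat run can be sketched as follows. This is a hedged example: it assumes the log files sit in a folder named logs inside the run directory, which is an assumption made here and should be checked against your own run. The clear_logs function and the LOG_DIR variable are hypothetical names.

```shell
# Assumption: the pipeline's log files live in a folder named "logs" inside
# the run directory. Set LOG_DIR if your run directory differs.
LOG_DIR="${LOG_DIR:-logs}"

clear_logs() {
    # Remove only regular files directly inside LOG_DIR; keep the folder itself.
    if [ -d "$LOG_DIR" ]; then
        find "$LOG_DIR" -maxdepth 1 -type f -exec rm -f {} +
    fi
}

clear_logs
# Then repeat the run step, for example:
# panpipes vis make full
```

The function does nothing if the folder does not exist, so it is safe to call in a fresh run directory.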

Running pipeline modules from different entry points

See details
