Reproducible boilerplate-free workflow management for Scanpy-based scRNA-seq analysis.

Python 3.11 | Python 3.12 | Python 3.13 | License: AGPL v3 | Code style: ruff

MORESCA (MOdular and REproducible Single-Cell Analysis)

This repository provides a template for standardized scRNA-seq analysis using Python and the Scanpy library. All parameters of the workflow are controlled with a single config file.

Usage

Installation

We strongly recommend installing MORESCA in a virtual environment. Here, we use Conda:

conda create -n <env_name> python=3.12
conda activate <env_name>

Then, simply install MORESCA with pip:

pip install moresca

Important: If you want to use Python 3.13 on macOS, make sure to use GCC>=16, which is required for compiling scikit-misc. See this discussion for advice.

Calling the template

| Flag | Type | Description | Default |
|---|---|---|---|
| -d, --data | Path | Path to the h5ad file | data/adata_raw.h5ad |
| -p, --parameters | Path | Path to the config file | config.gin |
| -v, --verbose | Boolean | If set, prints to output | False |
| -f, --figures | Boolean | If set, figures will be generated | False |

By default, template.py expects the input data in H5AD format in the data folder (data/adata_raw.h5ad). The two folders figures and results are created on the fly if they don't exist yet.
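The on-the-fly folder creation described above can be sketched with pathlib. This is only an illustration and not MORESCA's actual code; the helper name ensure_output_dirs and its base argument are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_output_dirs(base: Path = Path(".")) -> list[Path]:
    """Create the figures and results folders under base if they are missing.

    Hypothetical helper, not part of MORESCA.
    """
    created = []
    for name in ("figures", "results"):
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created


# Demonstrate in a temporary directory so the working directory stays untouched.
with tempfile.TemporaryDirectory() as tmp:
    dirs = ensure_output_dirs(Path(tmp))
    print([d.name for d in dirs])
```

Because exist_ok=True is set, calling the helper repeatedly is harmless.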

Currently, the script performs the most common operations, from doublet removal to differential gene expression (DEG) analysis of the identified clusters. If you want to apply ambient RNA correction beforehand, you need to run it separately.

The following example runs the template on the h5ad file example_data.h5ad with the parameter file config.gin and enables both verbose output and figure generation.

python template.py -d example_data.h5ad -p config.gin -v -f
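For readers who want to see how the flags and defaults listed above fit together, here is a minimal argparse sketch. It mirrors the documented flags only; the function name build_parser is hypothetical, and the real template.py may define its command-line interface differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative, not MORESCA's code)."""
    parser = argparse.ArgumentParser(description="Run the MORESCA template.")
    parser.add_argument("-d", "--data", type=Path,
                        default=Path("data/adata_raw.h5ad"),
                        help="Path to the h5ad file")
    parser.add_argument("-p", "--parameters", type=Path,
                        default=Path("config.gin"),
                        help="Path to the config file")
    # Boolean flags default to False and become True when present.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="If set, prints to output")
    parser.add_argument("-f", "--figures", action="store_true",
                        help="If set, figures will be generated")
    return parser


# Same arguments as the example command above, passed as a list.
args = build_parser().parse_args(
    ["-d", "example_data.h5ad", "-p", "config.gin", "-v", "-f"]
)
print(args.data, args.parameters, args.verbose, args.figures)
```

Calling parse_args with an empty list yields the documented defaults.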

Using the config.gin

The default parameter file looks like this:

# config.gin
quality_control:
    apply = True
    doublet_removal = True
    outlier_removal = True
    min_genes = 200
    min_counts = None
    max_counts = None
    min_cells = 10
    n_genes_by_counts = None
    mt_threshold = 15
    rb_threshold = 10
    hb_threshold = 1
    figures = "figures/"
    pre_qc_plots = True
    post_qc_plots = True

normalization:
    apply = True
    method = "log1pPF"
    remove_mt = False
    remove_rb = False
    remove_hb = False

feature_selection:
    apply = True
    method = "seurat_v3"
    number_features = 2000

scaling:
    apply = True
    max_value = None

pca:
    apply = True
    n_comps = 100
    use_highly_variable = True

batch_effect_correction:
    apply = False
    method = "harmony"
    batch_key = None

neighborhood_graph:
    apply = True
    n_neighbors = 30
    n_pcs = None
    metric = "cosine"

clustering:
    apply = True
    method = "leiden"
    resolution = 1.0

diff_gene_exp:
    apply = True
    method = "wilcoxon"
    groupby = "leiden_r1.0"
    use_raw = False
    layer = "unscaled"
    tables = None

umap:
    apply = True

plotting:
    apply = True
    umap = True
    path = "figures/"
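To make the layout of the file explicit (a section header followed by indented key = value lines), the sketch below reads that layout into a nested dictionary using only the standard library. It is a simplified reader written for illustration; it is not how MORESCA or the Gin library parses config.gin, and the function name read_block_config is hypothetical.

```python
import ast


def read_block_config(text: str) -> dict[str, dict[str, object]]:
    """Parse 'section:' headers followed by indented 'key = value' lines.

    Simplified illustration only: assumes values are Python literals and
    that '#' never appears inside a quoted value.
    """
    config: dict[str, dict[str, object]] = {}
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        if not line[0].isspace() and line.endswith(":"):
            section = line[:-1]
            config[section] = {}
        elif section is not None and "=" in line:
            key, value = line.split("=", 1)
            # True, None, 200, "harmony" are all valid Python literals.
            config[section][key.strip()] = ast.literal_eval(value.strip())
        else:
            raise ValueError(f"Cannot parse line: {raw!r}")
    return config


sample = """
# config.gin
clustering:
    apply = True
    method = "leiden"
    resolution = 1.0
"""
params = read_block_config(sample)
print(params["clustering"])
```

Each top-level name becomes a dictionary key, which makes it easy to inspect or compare parameter sets between runs.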

The following parameter values are currently supported:

| Parameter | Values |
|---|---|
| quality_control | |
| apply | bool |
| doublet_removal | bool |
| outlier_removal | bool |
| min_genes | int, null |
| min_cells | int, null |
| mt_threshold | float, null |
| rb_threshold | float, null |
| hb_threshold | float, null |
| figures | str |
| pre_qc_plots | bool |
| post_qc_plots | bool |
| normalization | |
| method | log1pCP10k, log1pPF, PFlog1pPF, pearson_residuals, null |
| remove_mt | bool, null |
| remove_rb | bool, null |
| remove_hb | bool, null |
| remove_custom_genes | Not implemented |
| feature_selection | |
| apply | bool |
| method | seurat, seurat_v3, pearson_residuals, anti_correlation, null |
| number_features | int, null |
| scaling | |
| apply | bool |
| max_value | int, float |
| pca | |
| apply | bool |
| n_comps | int, float |
| use_highly_variable | bool |
| batch_effect_correction | |
| apply | bool |
| method | harmony, null |
| batch_key | Not implemented / null |
| neighborhood_graph | |
| apply | bool |
| n_neighbors | int |
| n_pcs | int, null |
| metric | str |
| clustering | |
| apply | bool |
| method | str, null |
| resolution | float |
| diff_gene_exp | |
| apply | bool |
| method | wilcoxon, logreg, t-test, t-test_overestim_var |
| groupby | str |
| use_raw | bool |
| layer | str, null |
| tables | bool |
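A table like this lends itself to a quick sanity check before launching a long run. The sketch below validates only the method entries against the values listed above, using the spelling from the default config (log1pPF). It assumes that null corresponds to Python's None; the names ALLOWED_METHODS and check_methods are hypothetical and not part of MORESCA.

```python
# Allowed values copied from the table; None stands in for "null" (assumption).
ALLOWED_METHODS = {
    "normalization": {"log1pCP10k", "log1pPF", "PFlog1pPF", "pearson_residuals", None},
    "feature_selection": {"seurat", "seurat_v3", "pearson_residuals",
                          "anti_correlation", None},
    "batch_effect_correction": {"harmony", None},
    "diff_gene_exp": {"wilcoxon", "logreg", "t-test", "t-test_overestim_var"},
}


def check_methods(config: dict[str, dict[str, object]]) -> list[str]:
    """Return one message per section whose 'method' is not listed in the table."""
    problems = []
    for section, allowed in ALLOWED_METHODS.items():
        params = config.get(section)
        if params is None or "method" not in params:
            continue  # nothing to check for this section
        if params["method"] not in allowed:
            problems.append(
                f"{section}.method = {params['method']!r} is not supported"
            )
    return problems


config = {
    "normalization": {"method": "log1pPF"},
    "diff_gene_exp": {"method": "anova"},  # deliberately invalid
}
for message in check_methods(config):
    print(message)
```

The same pattern could be extended to type checks (bool, int, float) for the remaining parameters.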

Contributing

For contribution purposes, you should install MORESCA in dev mode:

pip install -e ".[dev]"

This additionally installs ruff, which we use for formatting and code style checks, and pytest, which we use for testing. Please run both before you commit new code. Note: These checks will be made mandatory through pre-commit hooks.
