
Cluster annotation helper for single-cell RNA-seq data in Python


🧬 clustermolepy: Fast Enrichment for Annotating Single-Cell Clusters

License · Python 3.10 · Binder

clustermolepy is a lightweight Python package inspired by the original clustermole R package. It helps you annotate cell clusters in single-cell RNA-seq data using gene set enrichment analysis.

🚀 Key Features

  • Enrichr Integration:
    • Direct queries to the Enrichr API for gene set enrichment analysis.
    • Multi-threaded get_cell_type_enrichment() for fast cell type enrichment against curated gene set libraries.
  • Scanpy Integration:
    • Designed to work with Scanpy AnnData objects.
    • The example workflow uses Scanpy for data loading, clustering, and marker gene identification.
  • Biomart Integration:
    • Convert gene symbols across species using the Ensembl BioMart API.
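The multi-threaded lookup in get_cell_type_enrichment() presumably sends one gene list to several Enrichr libraries at once and merges the results. The sketch below shows that general pattern with Python's concurrent.futures; `query_library` and `enrich_across_libraries` are hypothetical names, the stub returns canned rows (adjusted p-values copied from the example output table further down) instead of calling Enrichr, and the merge-and-sort step is an assumption rather than clustermolepy's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for one Enrichr request. A real version would call the
# Enrichr API over HTTP; this stub returns canned (term, adjusted p-value) pairs.
def query_library(library: str, genes: list[str]) -> list[dict]:
    canned = {
        "PanglaoDB_Augmented_2021": [("B Cells Naive", 3.82002e-22), ("B Cells", 1.93312e-21)],
        "Tabula_Muris": [("B Cell Liver CL:0000236", 8.80014e-21)],
    }
    return [
        {"term": term, "adj_p": adj_p, "gene_set": library}
        for term, adj_p in canned.get(library, [])
    ]


def enrich_across_libraries(genes: list[str], libraries: list[str], max_workers: int = 4) -> list[dict]:
    # Real Enrichr requests are I/O-bound, so a thread pool lets several
    # libraries be queried concurrently instead of one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_library = pool.map(lambda lib: query_library(lib, genes), libraries)
        rows = [row for result in per_library for row in result]
    # Merge every library into one list, most significant terms first.
    return sorted(rows, key=lambda row: row["adj_p"])


if __name__ == "__main__":
    hits = enrich_across_libraries(["CD79A", "CD79B", "MS4A1"], ["PanglaoDB_Augmented_2021", "Tabula_Muris"])
    for row in hits:
        print(row["gene_set"], row["term"], row["adj_p"])
```

Threads rather than processes fit here because the work is waiting on network responses, not CPU-bound computation.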

📦 Installation

You can install the stable release from PyPI using pip:

pip install clustermolepy

Once installed, you can import and use clustermolepy in your Python environment.
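To confirm the install from inside Python, you can read the package metadata with the standard library's importlib.metadata. This is a generic check, not a clustermolepy feature, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("clustermolepy")
    print(f"clustermolepy {v}" if v else "clustermolepy is not installed")
```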

🕹️ Quick Usage Example

Here's a simplified example of how to use clustermolepy to annotate cell clusters. For a more detailed walkthrough, check out the Jupyter Notebook in the examples directory!

Example: Interpreting a PBMC Cluster

import scanpy as sc
from clustermolepy.enrichr import Enrichr

# Load the preprocessed PBMC 3k dataset and cluster it with Leiden
adata = sc.datasets.pbmc3k_processed()
sc.tl.leiden(adata, resolution=0.5, n_iterations=2, flavor='igraph')

# Rank marker genes for every cluster, then take the top 25 for cluster '1'
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
markers = sc.get.rank_genes_groups_df(adata, group='1')
top_genes = markers['names'][:25].tolist()

# Query the curated cell type libraries in Enrichr with the top genes
enr = Enrichr(gene_list=top_genes)
df = enr.get_cell_type_enrichment()
print(df.head(3))

This returns a table of the best-matching cell types drawn from several reference libraries, such as CellMarker, PanglaoDB, and Azimuth.

(Example Output - first 3 rows of the table):

|    | term name               |     p-value |   odds ratio |   combined score | overlapping genes                                                                                            |   adjusted p-value |   old p-value |   old adjusted p-value | gene_set                 |
|---:|:------------------------|------------:|-------------:|-----------------:|:-------------------------------------------------------------------------------------------------------------|-------------------:|--------------:|-----------------------:|:-------------------------|
|  0 | B Cells Naive           | 9.79493e-24 |      571.108 |          30257.4 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'TCL1A', 'BANK1', 'LINC00926', 'LY86', 'CD37', 'LTB', 'MS4A1'] |        3.82002e-22 |             0 |                      0 | PanglaoDB_Augmented_2021 |
|  1 | B Cells                 | 9.91341e-23 |      466.235 |          23622.1 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'BANK1', 'HVCN1', 'LY86', 'CXCR4', 'CD37', 'LTB', 'MS4A1']     |        1.93312e-21 |             0 |                      0 | PanglaoDB_Augmented_2021 |
|  2 | B Cell Liver CL:0000236 | 7.54304e-22 |      622.531 |          30277.6 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'BANK1', 'HVCN1', 'CXCR4', 'CD37', 'LTB', 'MS4A1']                      |        8.80014e-21 |             0 |                      0 | Tabula_Muris             |
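These columns appear to mirror the row layout of Enrichr's enrich endpoint, where each result is a list of rank, term name, p-value, a fourth score (reported here as odds ratio), combined score, overlapping genes, adjusted p-value, old p-value, and old adjusted p-value. Assuming that layout, the sketch below turns raw rows into labeled records with a gene_set column. It is an illustration, not clustermolepy's own parser; `parse_enrichr_rows` is a hypothetical name, and the sample row is copied from the first line of the table above.

```python
COLUMNS = [
    "term name", "p-value", "odds ratio", "combined score", "overlapping genes",
    "adjusted p-value", "old p-value", "old adjusted p-value",
]


def parse_enrichr_rows(response: dict[str, list[list]]) -> list[dict]:
    """Flatten an Enrichr-style {library: [rows]} response into labeled records."""
    records = []
    for library, rows in response.items():
        for row in rows:
            # row[0] is Enrichr's rank; the remaining fields map onto COLUMNS in order.
            record = dict(zip(COLUMNS, row[1:]))
            record["gene_set"] = library
            records.append(record)
    # Most significant terms first, across every library.
    records.sort(key=lambda r: r["adjusted p-value"])
    return records


sample = {
    "PanglaoDB_Augmented_2021": [
        [1, "B Cells Naive", 9.79493e-24, 571.108, 30257.4,
         ["CD79B", "VPREB3", "CD74", "CD79A", "FCER2", "TCL1A", "BANK1",
          "LINC00926", "LY86", "CD37", "LTB", "MS4A1"],
         3.82002e-22, 0, 0],
    ]
}

for rec in parse_enrichr_rows(sample):
    print(rec["gene_set"], rec["term name"], rec["adjusted p-value"], len(rec["overlapping genes"]))
```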

Example: Cross-Species Gene Mapping

from clustermolepy.utils import Biomart

bm = Biomart()
result = bm.convert_gene_names(
    genes=["TP53", "CD4", "FOXP3"],
    from_organism="hsapiens",
    to_organism="mmusculus"
)
print(result)

(Example Output)

{
  'TP53': ['Trp53'],
  'CD4': ['Cd4'],
  'FOXP3': ['Foxp3']
}
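Because the result maps each input symbol to a list (a gene may have zero or several orthologs), a converted marker list usually needs flattening before reuse, for example before querying a mouse library such as Tabula_Muris. The helper below is a hypothetical sketch that works on a dict shaped like the output above; treating a missing key or an empty list as "no ortholog" is an assumption, not documented clustermolepy behavior.

```python
def flatten_orthologs(genes: list[str], mapping: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (converted genes in input order without duplicates, genes with no ortholog)."""
    converted: list[str] = []
    unmapped: list[str] = []
    seen: set[str] = set()
    for gene in genes:
        targets = mapping.get(gene) or []  # missing key or empty list -> no ortholog
        if not targets:
            unmapped.append(gene)
        for target in targets:
            if target not in seen:
                seen.add(target)
                converted.append(target)
    return converted, unmapped


result = {"TP53": ["Trp53"], "CD4": ["Cd4"], "FOXP3": ["Foxp3"]}
# MS4A1 is not in this example mapping, so it is reported as unmapped.
mouse_genes, missing = flatten_orthologs(["TP53", "CD4", "FOXP3", "MS4A1"], result)
print(mouse_genes, missing)  # ['Trp53', 'Cd4', 'Foxp3'] ['MS4A1']
```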

📚 Documentation

Check out the Example Notebook for more information!



Download files


Source Distribution

clustermolepy-0.3.0.tar.gz (13.1 kB)

Uploaded Source

Built Distribution


clustermolepy-0.3.0-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file clustermolepy-0.3.0.tar.gz.

File metadata

  • Download URL: clustermolepy-0.3.0.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for clustermolepy-0.3.0.tar.gz
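| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | d3b6e3a775c56a36a82824b2e91439ddab512998d78e91d250ef1fc1d16282a7 |
| MD5         | ee45104e9d2b435be5b23dde6ee7dd8a                                 |
| BLAKE2b-256 | d01602cdf23213843ad21e37406848aa98260746ad1427d530fd083c39214f15 |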


File details

Details for the file clustermolepy-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: clustermolepy-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for clustermolepy-0.3.0-py3-none-any.whl
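| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 5206d32ea96825aba63c1b64f0d6a3693ed5a3c722b80b5b58939a549d2e69ba |
| MD5         | e132bb422a0d969f363ca26fdf6012c8                                 |
| BLAKE2b-256 | 3e163e81f8caf45b6f8bf907134396f0c7458ae285f824af7958b12eb2203485 |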

