
Cluster annotation helper for single-cell RNA-seq data in Python

Project description

🧬 clustermolepy: Fast Enrichment for Annotating Single-Cell Clusters

License Python 3.10 Binder

clustermolepy is a lightweight Python package inspired by the original clustermole R package. It helps you annotate cell clusters from single-cell RNA-seq data using gene set enrichment analysis.

🚀 Key Features

  • Enrichr Integration:
    • Direct query of the Enrichr API for gene set enrichment analysis.
    • Multi-threaded get_cell_type_enrichment() for fast cell type enrichment using curated gene set libraries.
  • Scanpy Integration:
    • Works directly with Scanpy AnnData objects.
    • Example workflow uses Scanpy for data loading, clustering, and marker gene identification.
  • BioMart Integration:
    • Easily convert gene symbols across species using the Ensembl BioMart API.
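The multi-threaded enrichment listed above comes down to sending one independent request per gene set library and merging the results. The sketch below shows that fan-out pattern with Python's standard `concurrent.futures`; it is not clustermolepy's implementation. `enrich_across_libraries` and `query_library` are hypothetical names: `query_library` stands in for a single Enrichr request, and each returned row is assumed to be a dict with an `adjusted p-value` key, mirroring the example output further down.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one enrichment result row, keyed by column name


def enrich_across_libraries(
    genes: List[str],
    libraries: List[str],
    query_library: Callable[[List[str], str], List[Row]],
    max_workers: int = 8,
) -> List[Row]:
    """Query every library in parallel and merge the rows into one list.

    `query_library` is a hypothetical stand-in for one Enrichr request;
    it is not part of clustermolepy's API.
    """
    merged: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `libraries`.
        results = pool.map(lambda lib: query_library(genes, lib), libraries)
        for lib, rows in zip(libraries, results):
            for row in rows:
                # Record the source library, like the gene_set column.
                merged.append({**row, "gene_set": lib})
    # Most significant terms first, regardless of library.
    merged.sort(key=lambda row: row["adjusted p-value"])
    return merged
```

Threads fit this workload because each request spends most of its time waiting on the network. Passing the query function in as a parameter also lets you test the fan-out with a stub that never touches the network.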

📦 Installation

You can install the stable release from PyPI using pip:

pip install clustermolepy

Once installed, you can import and use clustermolepy in your Python environment.

🕹️ Quick Usage Example

Here's a simplified example of how to use clustermolepy to annotate cell clusters. For a more detailed walkthrough, check out the Jupyter Notebook in the examples directory!

Example: Interpreting a PBMC Cluster

import scanpy as sc
from clustermolepy.enrichr import Enrichr

# Load the preprocessed PBMC 3k dataset (it already contains the neighbor graph Leiden needs) and cluster
adata = sc.datasets.pbmc3k_processed()
sc.tl.leiden(adata, resolution=0.5, n_iterations=2, flavor='igraph')

# Rank marker genes for every cluster, then keep the top 25 for cluster '1'
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
markers = sc.get.rank_genes_groups_df(adata, group='1')
top_genes = markers['names'][:25].tolist()

# Run enrichment
enr = Enrichr(gene_list=top_genes)
df = enr.get_cell_type_enrichment()
print(df.head(3))

This will return a table with the top matching cell types across multiple reference libraries, such as CellMarker, PanglaoDB, and Azimuth.

(Example Output - first 3 rows of the table):

|    | term name               |     p-value |   odds ratio |   combined score | overlapping genes                                                                                            |   adjusted p-value |   old p-value |   old adjusted p-value | gene_set                 |
|---:|:------------------------|------------:|-------------:|-----------------:|:-------------------------------------------------------------------------------------------------------------|-------------------:|--------------:|-----------------------:|:-------------------------|
|  0 | B Cells Naive           | 9.79493e-24 |      571.108 |          30257.4 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'TCL1A', 'BANK1', 'LINC00926', 'LY86', 'CD37', 'LTB', 'MS4A1'] |        3.82002e-22 |             0 |                      0 | PanglaoDB_Augmented_2021 |
|  1 | B Cells                 | 9.91341e-23 |      466.235 |          23622.1 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'BANK1', 'HVCN1', 'LY86', 'CXCR4', 'CD37', 'LTB', 'MS4A1']     |        1.93312e-21 |             0 |                      0 | PanglaoDB_Augmented_2021 |
|  2 | B Cell Liver CL:0000236 | 7.54304e-22 |      622.531 |          30277.6 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'BANK1', 'HVCN1', 'CXCR4', 'CD37', 'LTB', 'MS4A1']                      |        8.80014e-21 |             0 |                      0 | Tabula_Muris             |
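The combined score column is derived from the other columns. For the three rows shown, it equals odds ratio × (−ln p-value) to within rounding, and the short check below reproduces that. Treat this as a reading of the example numbers rather than a definition; Enrichr's own documentation is the authority on how the score is computed. `combined_score` is a hypothetical helper, not part of clustermolepy.

```python
import math


def combined_score(p_value: float, odds_ratio: float) -> float:
    """Odds ratio times -ln(p), which matches the example table's values."""
    return odds_ratio * -math.log(p_value)


# (p-value, odds ratio, reported combined score) copied from the example table.
example_rows = [
    (9.79493e-24, 571.108, 30257.4),
    (9.91341e-23, 466.235, 23622.1),
    (7.54304e-22, 622.531, 30277.6),
]

for p, odds, reported in example_rows:
    print(f"computed {combined_score(p, odds):.1f}, reported {reported}")
```

Because the score multiplies effect size (odds ratio) by significance (−ln p), a term with a large overlap ratio can outrank a term with a slightly smaller p-value, as row 2 does against row 0 here.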

Example: Cross-Species Gene Mapping

from clustermolepy.utils import Biomart

bm = Biomart()
result = bm.convert_gene_names(
    genes=["TP53", "CD4", "FOXP3"],
    from_organism="hsapiens",
    to_organism="mmusculus"
)
print(result)

(Example Output)

{
  'TP53': ['Trp53'],
  'CD4': ['Cd4'],
  'FOXP3': ['Foxp3']
}
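The conversion result maps each input symbol to a list of target symbols. When that output feeds a downstream step such as the Enrichr query, you usually want a flat gene list plus a record of genes that did not map. Below is a minimal sketch, assuming the dict-of-lists shape shown in the example output. How clustermolepy reports a gene with no ortholog (empty list versus missing key) is an assumption, so the helper handles both. `pick_orthologs` is a hypothetical helper, not part of the package.

```python
from typing import Dict, List, Tuple


def pick_orthologs(
    genes: List[str], mapping: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Turn a {source_symbol: [target_symbols]} mapping into one gene list.

    Keeps the first target symbol per source gene (a simplifying choice for
    one-to-many orthologs), drops duplicate targets, and returns the source
    genes that had no ortholog so they can be reported.
    """
    converted: List[str] = []
    unmapped: List[str] = []
    seen = set()
    for gene in genes:
        # Treat a missing key and an empty list the same way.
        targets = mapping.get(gene) or []
        if not targets:
            unmapped.append(gene)
        elif targets[0] not in seen:
            seen.add(targets[0])
            converted.append(targets[0])
    return converted, unmapped
```

Applied to the example output, `pick_orthologs(["TP53", "CD4", "FOXP3"], result)` would return `(["Trp53", "Cd4", "Foxp3"], [])`. Keeping only the first ortholog is a deliberate simplification; for gene families with many one-to-many mappings you may prefer to keep every target.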

📚 Documentation

Check out the Example Notebook for more information!


Download files


Source Distribution

clustermolepy-0.3.1.tar.gz (13.0 kB)

Uploaded Source

Built Distribution


clustermolepy-0.3.1-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file clustermolepy-0.3.1.tar.gz.

File metadata

  • Download URL: clustermolepy-0.3.1.tar.gz
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for clustermolepy-0.3.1.tar.gz
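| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 9274b0d853f2e9545f39973d7b31d31d581e1aa617696f45a97c8ec8b8791a6a |
| MD5         | fb6ed041ea6c6189c094117ac7a6e8e1                                 |
| BLAKE2b-256 | 33917625405748e66df06840be4af0baf2baca0e270ea76f418d7ac36202d706 |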


File details

Details for the file clustermolepy-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: clustermolepy-0.3.1-py3-none-any.whl
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for clustermolepy-0.3.1-py3-none-any.whl
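| Algorithm   | Hash digest                                                      |
|:------------|:-----------------------------------------------------------------|
| SHA256      | 84de3ded77d05c4e46589a78e8a9294d988fad0e029ef3cddf64e73a74640630 |
| MD5         | dfd9f8acf9f1f0cb57981257df76aed8                                 |
| BLAKE2b-256 | d148eaf7c1e2862dee1742981bc7e70db2aa6c84cd8d3736aebc9efc17e2ee1d |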

