Cluster annotation helper for single-cell RNA-seq data in Python
🧬 clustermolepy: Fast Enrichment for Annotating Single-Cell Clusters
clustermolepy is a lightweight Python package inspired by the original clustermole R package. It is designed to help you annotate cell clusters from single-cell RNA-seq data using gene set enrichment analysis.
🚀 Key Features
- Enrichr Integration:
  - Direct query of the Enrichr API for gene set enrichment analysis.
  - Multi-threaded get_cell_type_enrichment() for fast cell type enrichment using curated gene set libraries.
- Scanpy Integration:
  - Designed to work seamlessly with Scanpy AnnData objects.
  - Example workflow uses Scanpy for data loading, clustering, and marker gene identification.
- Biomart Integration:
  - Easily convert gene symbols across species using the Ensembl Biomart API.
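The multi-threaded enrichment listed above can be pictured as one request per gene set library, run in a thread pool. The block below is a minimal sketch of that general pattern using only the standard library. It is not clustermolepy's actual implementation: `fan_out`, `query_one`, and `fake_query` are hypothetical names introduced for illustration, and `fake_query` stands in for a real network call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Library names taken from the example output in this README.
LIBRARIES = ["PanglaoDB_Augmented_2021", "Tabula_Muris"]


def fan_out(
    genes: List[str],
    libraries: List[str],
    query_one: Callable[[List[str], str], List[dict]],
    max_workers: int = 4,
) -> Dict[str, List[dict]]:
    """Call query_one(genes, library) for every library using worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all queries first so they can run concurrently.
        futures = {lib: pool.submit(query_one, genes, lib) for lib in libraries}
        # Collect results in the order the libraries were given.
        return {lib: fut.result() for lib, fut in futures.items()}


def fake_query(genes: List[str], library: str) -> List[dict]:
    """Hypothetical stand-in for a network request; returns one dummy row."""
    return [{"gene_set": library, "n_genes": len(genes)}]


results = fan_out(["CD79A", "MS4A1"], LIBRARIES, fake_query)
for library, rows in results.items():
    print(library, rows)
```

Threads suit this kind of workload because each query spends most of its time waiting on I/O rather than computing.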
📦 Installation
You can install the stable release from PyPI using pip:
pip install clustermolepy
Once installed, you can import and use clustermolepy in your Python environment.
🕹️ Quick Usage Example
Here's a simplified example of how to use clustermolepy to annotate cell clusters. For a more detailed walkthrough, check out the Jupyter Notebook in the examples directory!
Example: Interpreting a PBMC Cluster
import scanpy as sc
from clustermolepy.enrichr import Enrichr

# Load PBMC data and cluster
adata = sc.datasets.pbmc3k_processed()
sc.tl.leiden(adata, resolution=0.5, n_iterations=2, flavor='igraph')

# Get top marker genes for a cluster
sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon')
markers = sc.get.rank_genes_groups_df(adata, group='1')
top_genes = markers['names'][:25].tolist()

# Run enrichment
enr = Enrichr(gene_list=top_genes)
df = enr.get_cell_type_enrichment()
print(df.head(3))
This returns a table of the top matching cell types across multiple reference libraries, such as CellMarker, PanglaoDB, and Azimuth.
(Example Output - first 3 rows of the table):
| | term name | p-value | odds ratio | combined score | overlapping genes | adjusted p-value | old p-value | old adjusted p-value | gene_set |
|---:|:------------------------|------------:|-------------:|-----------------:|:-------------------------------------------------------------------------------------------------------------|-------------------:|--------------:|-----------------------:|:-------------------------|
| 0 | B Cells Naive | 9.79493e-24 | 571.108 | 30257.4 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'TCL1A', 'BANK1', 'LINC00926', 'LY86', 'CD37', 'LTB', 'MS4A1'] | 3.82002e-22 | 0 | 0 | PanglaoDB_Augmented_2021 |
| 1 | B Cells | 9.91341e-23 | 466.235 | 23622.1 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'FCER2', 'BANK1', 'HVCN1', 'LY86', 'CXCR4', 'CD37', 'LTB', 'MS4A1'] | 1.93312e-21 | 0 | 0 | PanglaoDB_Augmented_2021 |
| 2 | B Cell Liver CL:0000236 | 7.54304e-22 | 622.531 | 30277.6 | ['CD79B', 'VPREB3', 'CD74', 'CD79A', 'BANK1', 'HVCN1', 'CXCR4', 'CD37', 'LTB', 'MS4A1'] | 8.80014e-21 | 0 | 0 | Tabula_Muris |
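Because the table mixes hits from several libraries, it can help to look at the best term from each one. The block below is a rough sketch of that step. It assumes the column names shown in the example output ("term name", "adjusted p-value", "gene_set"); `best_term_per_library` is a hypothetical helper, not part of clustermolepy.

```python
from typing import Dict, List


def best_term_per_library(rows: List[dict]) -> Dict[str, dict]:
    """Return the row with the smallest adjusted p-value for each gene_set."""
    best: Dict[str, dict] = {}
    for row in rows:
        lib = row["gene_set"]
        if lib not in best or row["adjusted p-value"] < best[lib]["adjusted p-value"]:
            best[lib] = row
    return best


# Rows copied from the example output above (only the columns used here).
rows = [
    {"term name": "B Cells Naive", "adjusted p-value": 3.82002e-22,
     "gene_set": "PanglaoDB_Augmented_2021"},
    {"term name": "B Cells", "adjusted p-value": 1.93312e-21,
     "gene_set": "PanglaoDB_Augmented_2021"},
    {"term name": "B Cell Liver CL:0000236", "adjusted p-value": 8.80014e-21,
     "gene_set": "Tabula_Muris"},
]

for lib, row in best_term_per_library(rows).items():
    print(f"{lib}: {row['term name']} (adj. p = {row['adjusted p-value']:.2e})")
```

If the enrichment result is a pandas DataFrame, as the printed table suggests, `df.to_dict("records")` should produce rows in this shape.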
Example: Cross-Species Gene Mapping
from clustermolepy.utils import Biomart
bm = Biomart()
result = bm.convert_gene_names(
    genes=["TP53", "CD4", "FOXP3"],
    from_organism="hsapiens",
    to_organism="mmusculus"
)
print(result)
(Example Output)
{
    'TP53': ['Trp53'],
    'CD4': ['Cd4'],
    'FOXP3': ['Foxp3']
}
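A mapping of this shape (source symbol to a list of target symbols) can then be applied to a marker gene list. The block below is a small sketch under that assumption. `apply_mapping` is a hypothetical helper, and "NOT_A_GENE" is a made-up symbol used to show the unmapped case. How convert_gene_names reports genes without an ortholog is not stated here, so the sketch treats both a missing key and an empty list as unmapped.

```python
from typing import Dict, List, Tuple


def apply_mapping(
    genes: List[str], mapping: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Translate genes using a symbol -> [orthologs] mapping.

    Returns (converted, unmapped). If a gene has several orthologs,
    all of them are kept.
    """
    converted: List[str] = []
    unmapped: List[str] = []
    for gene in genes:
        orthologs = mapping.get(gene, [])
        if orthologs:
            converted.extend(orthologs)
        else:
            unmapped.append(gene)
    return converted, unmapped


# Mapping copied from the example output above.
mapping = {"TP53": ["Trp53"], "CD4": ["Cd4"], "FOXP3": ["Foxp3"]}

# "NOT_A_GENE" is a hypothetical symbol with no entry in the mapping.
mouse_genes, missing = apply_mapping(["TP53", "CD4", "FOXP3", "NOT_A_GENE"], mapping)
print(mouse_genes)
print(missing)
```

Keeping the unmapped symbols separate makes it easy to see how much of a marker list was lost in conversion before running enrichment on the converted genes.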
📚 Documentation
Check out the Example Notebook for more information!