Accurate and fast cell marker gene identification with COSG

Overview

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

  • COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq, and spatially resolved transcriptome data.

  • Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.

  • COSG is ultrafast for large-scale datasets and is capable of identifying marker genes for one million cells in less than two minutes.
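To give a feel for the cosine-similarity idea behind the method, here is a minimal, standard-library-only sketch. It compares a gene's expression vector with a binary indicator vector for a cell group. This is only an illustration of the general principle under simplifying assumptions, not COSG's full scoring function (see Dai et al. (2022) for that); the helper names `cosine_similarity` and `group_indicator` and the toy data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_indicator(labels, group):
    """Binary vector: 1.0 for cells in `group`, 0.0 for all other cells."""
    return [1.0 if label == group else 0.0 for label in labels]


# Toy data (hypothetical): four cells, two cell types.
labels = ["T", "T", "B", "B"]
gene_a = [5.0, 4.0, 0.0, 0.0]  # expressed only in T cells
gene_b = [3.0, 3.0, 3.0, 3.0]  # expressed uniformly in all cells

sim_a_T = cosine_similarity(gene_a, group_indicator(labels, "T"))
sim_b_T = cosine_similarity(gene_b, group_indicator(labels, "T"))
sim_a_B = cosine_similarity(gene_a, group_indicator(labels, "B"))
```

In this toy case the T-cell-specific gene is far closer to the T-cell indicator than the uniformly expressed gene is, and it has zero similarity to the B-cell indicator, which is the kind of specificity a marker gene should show.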

The method and benchmarking results are described in Dai et al. (2022).

Additionally, the R version of COSG is available here.

Note: we have recently released our Python toolkit, PIASO, in which some methods build on COSG. Please try it out!

Documentation

COSG documentation.

Release notes

Release v1.0.2 (March 5, 2025)

  • Added plotMarkerDotplot and plotMarkerDendrogram for enhanced marker gene visualization.

  • Introduced support for batch_key to compute cosine similarities separately across different batches.

  • Enabled calculation of normalized COSG scores for comparing gene expression specificity across cell types or datasets.

  • Resolved a SciPy version deprecation issue related to .A attribute usage.

  • Fixed a DataFrame manipulation warning.

  • Added verbosity control, allowing users to adjust log output levels.

Release v1.0.1 (June 15, 2021)

  • First release on PyPI.

Installation

Stable version:

pip install cosg

Development version:

pip install git+https://github.com/genecell/COSG.git

Example

Run COSG:

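import cosg

# `adata` is assumed to be an AnnData object with cell-type labels in adata.obs['CellTypes']
groupby='CellTypes'
cosg.cosg(
   adata,
   key_added='cosg',  # results are stored in adata.uns['cosg']
   # use_raw=False, layer='log1p', ## e.g., if you want to use the log1p layer in adata
   mu=100,  # penalty on expression in non-target groups
   expressed_pct=0.1,  # minimum fraction of target-group cells expressing a gene
   remove_lowly_expressed=True,  # apply the expressed_pct filter
   n_genes_user=100,  # number of marker genes reported per group
   groupby=groupby
)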

Draw the dot plot:

import pandas as pd
import scanpy as sc

sc.tl.dendrogram(adata, groupby=groupby, use_rep='X_pca') ## Change use_rep to the cell embeddings key you'd like to use
# Take the top 3 marker genes per group: one row per group after transposing
df_tmp=pd.DataFrame(adata.uns['cosg']['names'][:3]).T
# Order the groups as in the dendrogram
df_tmp=df_tmp.reindex(adata.uns['dendrogram_'+groupby]['categories_ordered'])
marker_genes_list={idx: list(row.values) for idx, row in df_tmp.iterrows()}
# Drop groups with missing (NaN) marker entries
marker_genes_list = {k: v for k, v in marker_genes_list.items() if not any(isinstance(x, float) for x in v)}

sc.pl.dotplot(
   adata,
   marker_genes_list,
   groupby=groupby,
   dendrogram=True,
   swap_axes=False,
   standard_scale='var',
   cmap='Spectral_r'
)

Output the marker list as a pandas DataFrame:

marker_gene=pd.DataFrame(adata.uns['cosg']['names'])
marker_gene.head()

You can also check the COSG scores:

marker_gene_scores=pd.DataFrame(adata.uns['cosg']['scores'])
marker_gene_scores.head()
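The two tables above share the same layout (one column per group, one row per rank), so they can be paired up to shortlist markers. The sketch below is a hypothetical helper, `top_markers`, that works on plain dictionaries such as those returned by `marker_gene.to_dict('list')` and `marker_gene_scores.to_dict('list')`. It assumes each column is already sorted by descending score, and the gene names and scores in the example are made up for illustration.

```python
def top_markers(names_by_group, scores_by_group, n_top=3, min_score=0.0):
    """Pair gene names with scores per group and keep the first `n_top`
    entries whose score exceeds `min_score`.

    Assumes each list is sorted by descending score. Non-string names
    (e.g. NaN placeholders) are skipped.
    """
    result = {}
    for group, genes in names_by_group.items():
        scores = scores_by_group[group]
        kept = [
            (gene, score)
            for gene, score in zip(genes, scores)
            if isinstance(gene, str) and score > min_score
        ]
        result[group] = kept[:n_top]
    return result


# Hypothetical example data in the same column-per-group layout.
names = {
    "T cell": ["CD3E", "CD3D", "IL7R"],
    "B cell": ["MS4A1", "CD79A", float("nan")],
}
scores = {
    "T cell": [0.9, 0.8, 0.4],
    "B cell": [0.95, 0.7, 0.0],
}

shortlist = top_markers(names, scores, n_top=2, min_score=0.5)
```

Filtering on a score threshold as well as on rank can help avoid reporting weak markers for groups that have few specific genes; the threshold itself is a judgment call for each dataset.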

Questions

For questions about the code and tutorial, please contact Min Dai, dai@broadinstitute.org.

Citation

If COSG is useful for your research, please consider citing Dai et al. (2022).
