Accurate and fast cell marker gene identification with COSG

Overview

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

  • COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq, and spatially resolved transcriptome data.

  • Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.

  • COSG is ultrafast for large-scale datasets and is capable of identifying marker genes for one million cells in less than two minutes.
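
To illustrate what "cosine similarity-based" means here, the following is a minimal sketch and not COSG's actual implementation. It scores each gene by the cosine similarity between its expression vector across cells and a binary indicator vector for each cell group (an idealized marker that is on only in that group). The toy matrix, the labels, and the helper name `cosine_to_group_indicators` are assumptions made for illustration; options that the package exposes, such as `mu` and `expressed_pct`, are omitted.

```python
import numpy as np


def cosine_to_group_indicators(X, labels):
    """Cosine similarity between each gene (column of X) and each group's
    binary indicator vector. Hypothetical helper, for illustration only."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)

    # One indicator column per group: 1 for cells in the group, 0 otherwise.
    indicators = (labels[:, None] == groups[None, :]).astype(float)  # cells x groups

    dots = X.T @ indicators  # genes x groups
    denom = np.outer(np.linalg.norm(X, axis=0), np.linalg.norm(indicators, axis=0))

    # Genes that are zero in every cell get similarity 0 instead of NaN.
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    return groups, sims


# Toy data: 4 cells x 3 genes (values are made up).
X = np.array([
    [5.0, 0.0, 1.0],
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 1.0],
    [0.0, 2.0, 1.0],
])
labels = ["A", "A", "B", "B"]

groups, sims = cosine_to_group_indicators(X, labels)
print(groups)
print(np.round(sims, 3))
```

In this toy example the first gene is expressed only in group A and the second only in group B, so each scores close to 1 for its own group and 0 for the other. The third gene is expressed equally everywhere and receives the same intermediate score for both groups, which is the sense in which it is not cell-type specific.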

The method and benchmarking results are described in Dai et al. (2022).

Additionally, the R version of COSG is available here.

Note I: we released our Python toolkit, PIASO, in which some methods were built upon COSG.

Note II: we have also recently released PIASOmarkerDB for beta testing.

Note III: COSG is also available for online analysis via the Galaxy platform.

Documentation

COSG documentation.

Installation

Stable version (PyPI):

pip install cosg

Stable version (bioconda):

conda install -c conda-forge -c bioconda cosg

Development version:

pip install git+https://github.com/genecell/COSG.git
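
After installing, one quick way to confirm which version is present is to query the package metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `cosg`, as in the `pip install cosg` command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cosg")
print("cosg is not installed" if version is None else f"cosg {version}")
```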

Release notes

Release v1.0.3 (March 11, 2025)

  • Fixed the incompatibility with multiple index columns of adata.uns['cosg']['COSG'] in the adata.write function

  • Enhanced plotMarkerDendrogram function with several new capabilities:

    • Implemented support for customized cell type-gene pairs

    • Added color control for nodes and edges

    • Added cell type filtering functionality

    • Integrated support for curved edges in visualization

Release v1.0.2 (March 5, 2025)

  • Added plotMarkerDotplot and plotMarkerDendrogram for enhanced marker gene visualization.

  • Introduced support for batch_key to compute cosine similarities separately across different batches.

  • Enabled calculation of normalized COSG scores for comparing gene expression specificity across cell types or datasets.

  • Resolved a SciPy version deprecation issue related to .A attribute usage.

  • Fixed a DataFrame manipulation warning.

  • Added verbosity control, allowing users to adjust log output levels.

Release v1.0.1 (June 15, 2021)

  • First release on PyPI.

Example

Run COSG:

import cosg

# adata: an AnnData object with cell-type labels stored in adata.obs[groupby]
n_genes = 30
groupby = 'CellTypes'
cosg.cosg(
    adata,
    key_added='cosg',
    # use_raw=False, layer='log1p',  # e.g., to use the log1p layer in adata
    mu=100,
    expressed_pct=0.1,
    remove_lowly_expressed=True,
    n_genes_user=n_genes,
    groupby=groupby,
)

Draw the dot plot:

cosg.plotMarkerDotplot(
    adata,
    groupby=groupby,
    top_n_genes=3,
    key_cosg='cosg',
    use_rep='X_pca', ## Change use_rep to the cell embeddings key you'd like to use
    swap_axes=False,
    standard_scale='var',
    cmap='Spectral_r',
    # save='test.pdf'
)

Output the marker list as a pandas DataFrame:

import pandas as pd

marker_gene = pd.DataFrame(adata.uns['cosg']['names'])
marker_gene.head()

You can also inspect the COSG scores:

marker_gene_scores=pd.DataFrame(adata.uns['cosg']['scores'])
marker_gene_scores.head()
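
The two tables can then be reshaped for downstream use. The sketch below assumes that `names` and `scores` are stored as NumPy record arrays with one field per cell type (which is why `pd.DataFrame(...)` yields one column per cell type) and that rows are ordered by rank. The toy arrays and gene names are made-up stand-ins for `adata.uns['cosg']`, and `top_markers` is a hypothetical helper, not part of COSG.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.uns['cosg'] (assumed layout: one field per cell type).
names = np.rec.fromarrays(
    [np.array(["GeneA1", "GeneA2", "GeneA3"]),
     np.array(["GeneB1", "GeneB2", "GeneB3"])],
    names=["TypeA", "TypeB"],
)
scores = np.rec.fromarrays(
    [np.array([0.9, 0.7, 0.5]),
     np.array([0.8, 0.6, 0.4])],
    names=["TypeA", "TypeB"],
)

marker_gene = pd.DataFrame(names)
marker_gene_scores = pd.DataFrame(scores)


def top_markers(names_df, n):
    """Map each cell type (column) to its first n genes (rows assumed ranked)."""
    return {
        cell_type: names_df[cell_type].head(n).tolist()
        for cell_type in names_df.columns
    }


# Long format: one row per (cell type, gene) pair with its score.
long_table = marker_gene.melt(var_name="cell_type", value_name="gene")
long_table["score"] = marker_gene_scores.melt(value_name="score")["score"]

print(top_markers(marker_gene, 2))
print(long_table)
```

The dictionary form is convenient for passing per-cell-type gene lists to plotting functions, while the long table is easier to filter or export.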

Questions

For questions about the code and tutorial, please contact Min Dai, dai@broadinstitute.org.

Citation

If COSG is useful for your research, please consider citing Dai et al. (2022).
