
A pipeline for mapping scRNA-seq query cells to reference cell states using Hotspot, meta-modules, and consensus profiles.


ITHmapper


A Python pipeline for mapping scRNA-seq query cells to reference cell states using Hotspot module scoring, meta-module aggregation, consensus state assignment, and cell filtering by clustering quality.


Pipeline Overview

  • Scores query cells for reference Hotspot modules using a multi-seed approach
  • Merges module scores and aggregates them into robust meta-modules
  • Scales and summarizes meta-module scores per cell
  • Builds a neighbor graph and clusters cells, filtering by silhouette quality
  • Maps each cell to a consensus reference state by minimum distance in meta-module space
  • Returns a fully annotated, filtered AnnData object ready for further analysis or visualization
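The consensus-assignment step (minimum distance in meta-module space) can be illustrated with a short NumPy sketch. This is not ITHmapper's implementation. The function name assign_consensus_states, the per-module z-scoring across query cells, the choice of Euclidean distance, and the assumption that reference profiles are already on the same scaled axis are all assumptions made for illustration.

```python
import numpy as np


def assign_consensus_states(scores, centroids, state_names):
    """Map each query cell to the nearest reference state profile (illustrative sketch).

    scores:      (n_cells, n_meta) meta-module scores of the query cells
    centroids:   (n_states, n_meta) hypothetical reference consensus profiles,
                 assumed to be on the same scaled axis as the z-scored query scores
    state_names: n_states labels, in the same order as the rows of centroids
    """
    scores = np.asarray(scores, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Assumed scaling: z-score each meta-module across query cells; avoid dividing by zero
    std = scores.std(axis=0)
    std[std == 0] = 1.0
    scaled = (scores - scores.mean(axis=0)) / std
    # Euclidean distance from each cell to each state: shape (n_cells, n_states)
    dists = np.linalg.norm(scaled[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [state_names[i] for i in nearest], dists.min(axis=1)


labels, dist = assign_consensus_states(
    [[2, 0], [0, 2], [2, 0], [0, 2]],  # toy scores: 4 cells, 2 meta-modules
    [[1, -1], [-1, 1]],                # toy reference profiles
    ["state_A", "state_B"],
)
print(labels)  # ['state_A', 'state_B', 'state_A', 'state_B']
```

Returning the minimum distance alongside the label makes it easy to inspect how close each cell is to its assigned state.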

Installation

Install with pip. We strongly recommend using a new virtual environment:

pip install --upgrade pip setuptools wheel
pip install ITHmapper
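To confirm that the recommended virtual environment is active and see which ITHmapper version was installed, a small standard-library check can help. This is a sketch: the helper names are hypothetical, and sys.prefix != sys.base_prefix only detects venv/virtualenv environments, not conda environments.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def in_virtualenv() -> bool:
    """True inside a venv/virtualenv (the prefix differs from the base interpreter).

    Note: conda environments are not detected by this check.
    """
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str = "ITHmapper"):
    """Return the installed distribution's version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("ITHmapper version:", installed_version())
```

Recording the installed version alongside results is useful because the dependency versions are fixed per release.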

Dependencies

All required dependencies are specified in pyproject.toml. To ensure consistent results, versions are pinned for all packages.


Quick Start

Input format: ITHmapper expects an AnnData object that meets the following requirements:

  • Raw, unnormalized counts in adata.layers["counts"]. For large datasets, we recommend removing other layers, as they increase memory use.
  • adata.var.index must contain Ensembl gene IDs without a version suffix (e.g. ENSG00000186827, not ENSG00000186827.1).
  • All genes must be present, not only highly variable genes (HVGs).
  • An scVI or PCA embedding of the cells, used for Hotspot scoring. For single-dataset/single-batch samples we recommend PCA; for samples pooled from many datasets we recommend scVI.
  • A column in adata.obs with the number of transcripts (UMIs) per cell; pass its name as umi_counts_obs_key.

You can ignore the warning "adata.X seems to be already log-transformed." if your input was already transformed: ITHmapper still uses the raw counts and re-transforms them (see the related scanpy issue).
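The input requirements above can be checked before running the pipeline. The helper below is a hypothetical pre-flight check, not part of ITHmapper's API. It assumes AnnData-style attributes (layers, var_names, obsm, obs), only inspects the first rows of the counts layer, and treats any ID matching ENS…G plus 11 digits as a valid unversioned Ensembl gene ID.

```python
import re

import numpy as np

# Unversioned Ensembl gene IDs, e.g. ENSG00000186827 (or ENSMUSG... for mouse)
_ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}")


def strip_ensembl_versions(gene_ids):
    """Drop the '.N' version suffix: 'ENSG00000186827.1' -> 'ENSG00000186827'."""
    return [str(g).split(".")[0] for g in gene_ids]


def check_ithmapper_input(adata, embedding_key, umi_counts_obs_key, n_check=100):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if "counts" not in adata.layers:
        problems.append('missing raw counts in adata.layers["counts"]')
    else:
        sample = adata.layers["counts"][:n_check]
        if hasattr(sample, "toarray"):  # scipy sparse -> dense
            sample = sample.toarray()
        sample = np.asarray(sample)
        # Raw counts should be non-negative integers
        if (sample < 0).any() or not np.allclose(sample, np.round(sample)):
            problems.append('adata.layers["counts"] does not look like raw integer counts')
    bad = [g for g in adata.var_names if not _ENSEMBL_GENE.fullmatch(str(g))]
    if bad:
        problems.append(f"{len(bad)} var names are not unversioned Ensembl IDs, e.g. {bad[0]}")
    if embedding_key not in adata.obsm:
        problems.append(f"embedding {embedding_key!r} not found in adata.obsm")
    if umi_counts_obs_key not in adata.obs:
        problems.append(f"column {umi_counts_obs_key!r} not found in adata.obs")
    return problems
```

If only the version suffixes are wrong, something like adata.var.index = strip_ensembl_versions(adata.var.index) may fix them, provided no duplicate IDs result.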

Cancer types

Only run ITHmapper on one cancer type at a time. ITHmapper currently supports the following values of the cancer_type parameter; other cancer types are not supported:

'Bladder', 'Brain', 'Breast', 'Colorectal', 'Gastric',
'Kidney_RCC', 'Liver_HCC', 'Lung_LUAD',
'Neuroblastoma', 'Ovarian_HGSOC',
'Pancreas', 'Prostate'
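Because cancer_type must be one of these exact strings, it can help to validate it before launching a long run. This is a hypothetical helper, not part of ITHmapper. It copies the list above and uses difflib from the standard library to suggest close matches for typos.

```python
import difflib

# Copied from the supported cancer_type values listed above
SUPPORTED_CANCER_TYPES = (
    "Bladder", "Brain", "Breast", "Colorectal", "Gastric",
    "Kidney_RCC", "Liver_HCC", "Lung_LUAD",
    "Neuroblastoma", "Ovarian_HGSOC",
    "Pancreas", "Prostate",
)


def suggest_cancer_types(name):
    """Return up to three supported labels that resemble `name`."""
    return difflib.get_close_matches(name, SUPPORTED_CANCER_TYPES, n=3)


def check_cancer_type(name):
    """Return `name` unchanged if supported; otherwise raise with suggestions."""
    if name in SUPPORTED_CANCER_TYPES:
        return name
    hints = suggest_cancer_types(name)
    hint = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise ValueError(f"Unsupported cancer_type {name!r}.{hint}")
```

For example, check_cancer_type("Lung") raises a ValueError suggesting "Lung_LUAD".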

Minimal usage:

import scanpy as sc
from ITHmapper import map_query_to_reference_cell_states

# Load your pre-filtered AnnData (with scVI or PCA embeddings computed).
# The AnnData must have a "counts" layer with raw, unnormalized counts,
# and adata.var.index must contain Ensembl IDs, e.g. ENSG00000186827.
adata = sc.read_h5ad("your_filtered_query_cells.h5ad")

# Map to reference cell states
filtered_labelled_adata = map_query_to_reference_cell_states(
    adata,
    cancer_type="Lung_LUAD",
    embedding_key="X_scVI",
    umi_counts_obs_key="nCount_RNA",
    verbose=True,
)

# The mapped consensus state is stored in .obs["cancer_state"]:
print(filtered_labelled_adata.obs["cancer_state"].value_counts())

Pipeline Parameters

  • adata: the AnnData object to be processed.
  • cancer_type: one of the supported cancer types listed above.
  • embedding_key: the embedding Hotspot uses, typically 'X_pca' or 'X_scVI'.
  • umi_counts_obs_key (default "nCount_RNA"): the column in adata.obs holding the number of transcripts/UMIs per cell.
  • flag_cells (bool, default False): whether to mark cells with low silhouette scores as unclear; enabling this increases compute time.
  • filter_silhouette (float, default 0.2): the minimum silhouette score for a confident prediction.


Citing

If you use this pipeline, please cite the relevant preprint or publication.


License

MIT License (see LICENSE file)


Contact

For questions or contributions, please contact Ido Nofech-Mozes.



Download files


Source Distribution

ithmapper-0.1.6.tar.gz (9.8 MB)

Uploaded Source

Built Distribution


ithmapper-0.1.6-py3-none-any.whl (10.0 MB)

Uploaded Python 3

File details

Details for the file ithmapper-0.1.6.tar.gz.

File metadata

  • Download URL: ithmapper-0.1.6.tar.gz
  • Size: 9.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for ithmapper-0.1.6.tar.gz
Algorithm Hash digest
SHA256 4f7e6fa60bd2ccaa35a6cc87eec85ebc4e4aeae20038534b5603ff54535951f7
MD5 11cbb191202aa52ade49d558b498d591
BLAKE2b-256 98e610e02fdd742a1d45f61823f8cab9c3110f206cd20976c599321647c5be5d


File details

Details for the file ithmapper-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: ithmapper-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 10.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for ithmapper-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 1dbb72dea92e6a5609d4d281824d4d6671fed86c1c6de79608e5d4e505ca4e61
MD5 539c69c21ac071d80cd9d1ae72bc5f0d
BLAKE2b-256 d8bfa96e2649063c9b7ff0b6cfac089979e02c9f35d483aaaad342e3de465676
