No-filtering, attribute-back analysis of high-dimensional omics: keep every feature, read it with a configurable backbone over multiple arrangements, and attribute the result to named features ('gene chords').

DeepMapper

No-filtering, attribute-back analysis of high-dimensional omics. DeepMapper keeps every feature (no highly-variable-gene selection, no dimension reduction), reads the full matrix with a configurable backbone over several feature arrangements, and attributes each result back to named features. This surfaces distributed gene chords: sets of genes that separate a cell state only in combination, and that standard pipelines discard. An optional de-novo step recovers transcripts absent from the reference annotation.
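
To make the idea of a gene chord concrete, here is a toy illustration that does not use DeepMapper at all. It assumes two made-up binary "genes" and a cell state defined as their XOR; the names gene_a, gene_b, and best_rule_accuracy are hypothetical. Each gene alone predicts the state at roughly chance level, while the pair predicts it perfectly, which is the kind of signal a per-gene filter would miss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 2000

# Two hypothetical on/off genes, independent of each other.
gene_a = rng.integers(0, 2, size=n_cells)
gene_b = rng.integers(0, 2, size=n_cells)

# Toy cell state: 1 when exactly one of the two genes is on (XOR).
state = gene_a ^ gene_b


def best_rule_accuracy(keys, labels):
    """Accuracy of the best lookup rule: predict the majority label for each key value."""
    correct = 0
    for k in np.unique(keys):
        group = labels[keys == k]
        correct += max(int(np.sum(group == 0)), int(np.sum(group == 1)))
    return correct / len(labels)


acc_a = best_rule_accuracy(gene_a, state)                  # one gene alone: near chance
acc_b = best_rule_accuracy(gene_b, state)                  # the other gene alone: near chance
acc_pair = best_rule_accuracy(2 * gene_a + gene_b, state)  # both genes together

print(f"gene A alone: {acc_a:.2f}, gene B alone: {acc_b:.2f}, pair: {acc_pair:.2f}")
```

Real chords are unlikely to be this clean, but the pattern is the same: the information sits in the combination, not in any single feature.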

Install

pip install pydeepmapper                 # core engine: run(X, y) + linear/mlp/cnn + attribution
pip install "pydeepmapper[io]"           # + 10x / h5ad loaders (scanpy, anndata)
pip install "pydeepmapper[backbones]"    # + ResNet / ViT / timm backbones
pip install "pydeepmapper[denovo]"       # + de-novo ingestion (biopython; external tools below)
pip install "pydeepmapper[all]"          # everything
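
To see which optional extras are usable in the current environment, a small check along these lines may help. It is a sketch, not part of the package: the mapping from extra to import name is an assumption read off the install comments above (biopython is imported as Bio), and the backbones extra may pull in more than timm.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the modules it is described as adding.
# This is read off the install comments, not taken from the package metadata.
EXTRA_MODULES = {
    "io": ["scanpy", "anndata"],
    "backbones": ["timm"],
    "denovo": ["Bio"],  # biopython installs the top-level module `Bio`
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}, True when every module listed for that extra is importable."""
    return {
        extra: all(find_spec(name) is not None for name in names)
        for extra, names in extra_modules.items()
    }


print(available_extras())
```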

From a clone:

git clone https://github.com/tansel/deepmapper.git
cd deepmapper
pip install -e ".[dev]"
python -m pytest

Quickstart

import numpy as np
from pydeepmapper.config import DeepMapperConfig, BackboneSpec
from pydeepmapper.runner import run

X = ...            # (n_cells, n_genes) expression matrix, UNFILTERED (no HVG, no PCA)
y = ...            # integer class labels, shape (n_cells,)
genes = [...]      # feature names, len == X.shape[1]

cfg = DeepMapperConfig(n_passes=3, backbone=BackboneSpec(kind="cnn_small"))
findings = run(X, y, cfg, feature_names=genes)

for name, freq, importance in findings.ranking(20):
    print(name, round(freq, 3), round(importance, 4))   # the gene chord, ranked
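
The quickstart expects integer class labels and one feature name per column. If your labels are strings, a helper like the following can prepare the three inputs. It is a sketch under assumptions: prepare_inputs is a hypothetical name, not a pydeepmapper function, and it assumes X is a dense array (a sparse matrix would need converting first).

```python
import numpy as np


def prepare_inputs(X, labels, feature_names):
    """Check shapes and encode arbitrary labels as integer classes (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_cells, n_genes), got shape {X.shape}")
    # np.unique sorts the distinct labels; `y` holds each cell's index into `classes`.
    classes, y = np.unique(np.asarray(labels), return_inverse=True)
    y = y.reshape(-1).astype(np.int64)
    if y.shape[0] != X.shape[0]:
        raise ValueError("need exactly one label per cell (row of X)")
    if len(feature_names) != X.shape[1]:
        raise ValueError("need exactly one feature name per column of X")
    return X, y, list(feature_names), classes


X, y, genes, classes = prepare_inputs(
    [[0.0, 3.0, 1.0], [2.0, 0.0, 0.0], [0.0, 4.0, 1.0]],
    ["T cell", "B cell", "T cell"],
    ["GENE_A", "GENE_B", "GENE_C"],
)
print(y.tolist(), classes.tolist())
```

Keeping classes around lets you translate the integer labels back to their original names when reading the results.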

Swap the backbone with BackboneSpec(kind=...): linear, mlp, cnn_small (default), cnn, resnet18, vit_cct, timm:<name>, or conv_vae. For a fast, exact linear check:

from pydeepmapper import linear_baseline
clf, scaler, ranking = linear_baseline.fit(X, y, feature_names=genes)
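
For intuition about what a linear check of this kind measures, here is a NumPy-only stand-in. It is not the implementation of linear_baseline.fit: it assumes binary labels, standardises the features, solves a ridge-regularised least-squares fit in closed form, and ranks features by absolute weight. The function name linear_ranking and the ridge parameter are hypothetical.

```python
import numpy as np


def linear_ranking(X, y, feature_names, ridge=1.0):
    """Rank features by |weight| of a ridge-regularised linear fit to a binary label."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    # Standardise each feature; constant columns get scale 1 to avoid division by zero.
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Encode the two classes as +1 / -1 and centre the target.
    target = np.where(y == y.max(), 1.0, -1.0)
    target = target - target.mean()
    # Closed-form ridge solution: (Z^T Z + ridge * I) w = Z^T target.
    gram = Z.T @ Z + ridge * np.eye(Z.shape[1])
    w = np.linalg.solve(gram, Z.T @ target)
    order = np.argsort(-np.abs(w))
    return [(feature_names[i], float(w[i])) for i in order]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 2] > 0).astype(int)   # only feature g2 carries the label
ranking_demo = linear_ranking(X_demo, y_demo, [f"g{i}" for i in range(5)])
print(ranking_demo[0])
```

The closed form builds an n_genes by n_genes matrix, so it is only practical for small illustrations; with tens of thousands of unfiltered features an iterative solver is the more realistic choice.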

How it works

  1. Keep every feature. No HVG, no PCA.
  2. Lay the feature vector out as a small pseudo-image, one pixel per feature, under a seeded permutation.
  3. Train the backbone, attribute per pixel, project back to per-feature importance.
  4. Repeat over N arrangements and accumulate into a ranking plus stability statistics.
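
Steps 2 to 4 can be sketched in plain NumPy. This is an illustration under assumptions, not DeepMapper's code: the function names are hypothetical, the image is zero-padded to the smallest square that fits, labels are assumed binary, and the "train the backbone, attribute per pixel" step is replaced by a simple stand-in (absolute difference of class-mean images). The frequency here is taken to be the share of arrangements in which a feature lands in the top k, which is one plausible reading of the stability statistics.

```python
import math

import numpy as np


def layout(x, perm, side):
    """Step 2: place feature perm[k] at pixel k of a side x side image (zero-padded)."""
    image = np.zeros(side * side, dtype=x.dtype)
    image[: len(perm)] = x[perm]
    return image.reshape(side, side)


def project_back(pixel_scores, perm, n_features):
    """Step 3: map a per-pixel score image back to per-feature importance."""
    per_feature = np.zeros(n_features)
    per_feature[perm] = pixel_scores.reshape(-1)[: len(perm)]
    return per_feature


def accumulate(X, y, n_passes=3, top_k=2, seed=0):
    """Step 4: repeat over seeded arrangements; return mean importance and top-k frequency."""
    n_cells, n_features = X.shape
    side = math.isqrt(n_features - 1) + 1   # smallest square that fits every feature
    importance = np.zeros(n_features)
    hits = np.zeros(n_features)
    for p in range(n_passes):
        perm = np.random.default_rng(seed + p).permutation(n_features)
        images = np.stack([layout(x, perm, side) for x in X])
        # Stand-in for training and attribution: |difference of class-mean images|.
        pixel_scores = np.abs(images[y == 1].mean(axis=0) - images[y == 0].mean(axis=0))
        scores = project_back(pixel_scores, perm, n_features)
        importance += scores
        hits[np.argsort(-scores)[:top_k]] += 1
    return importance / n_passes, hits / n_passes


rng = np.random.default_rng(1)
X_toy = rng.normal(size=(300, 10))
y_toy = (rng.random(300) < 0.5).astype(int)
X_toy[y_toy == 1, 4] += 3.0                 # feature 4 separates the two classes
importance, freq = accumulate(X_toy, y_toy)
print(int(np.argmax(importance)), float(freq[4]))
```

Because the stand-in score ignores pixel position, every arrangement agrees here. With a trained convolutional backbone the scores would depend on which features end up as neighbours, and that variation across arrangements is what the frequency is meant to capture.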

Documentation

The full docs site is at https://tansel.github.io/deepmapper/ (built with MkDocs Material; the API reference is generated from the docstrings). In the repo:

  • User manual: the full guide, covering configuration, data loading, evaluation, attribution, early stopping, de-novo recovery, and the package layout.
  • Reproducibility map: links every figure to its script and dataset.
  • Data sources: lists the dataset accessions.
  • bench/ holds the analysis scripts.

Build the docs locally: run pip install -e ".[docs]", then mkdocs serve.

Using DeepMapper with Claude

The repo ships a Claude Code skill at .claude/skills/deepmapper/SKILL.md. Open the repo in Claude Code and ask, for example, "analyse this h5ad with DeepMapper and tell me which genes separate the states". Claude loads your data, runs the linear baseline and the pipeline, and reads back the ranked gene chord. See docs/claude-skill.md.

Reproducing the analyses

pip install -e ".[all]"
bash scripts/download_data.sh
python bench/ribosomal_validation.py     # Fig 1, and so on (see docs/REPRODUCIBILITY.md)

Citation

If you use DeepMapper, please cite the peer-reviewed article (see CITATION.cff):

Ersavas T., Smith M.A., Mattick J.S. (2024). Novel applications of Convolutional Neural Networks in the age of Transformers. Scientific Reports 14. https://doi.org/10.1038/s41598-024-60709-z

To cite this software release specifically, use the archived version on Zenodo (the concept DOI always resolves to the latest release): https://doi.org/10.5281/zenodo.20967454

License

Licensed under the Apache License, Version 2.0; see LICENSE and NOTICE. Apache-2.0 is permissive (commercial use, modification, and redistribution are allowed) and includes an explicit patent grant from contributors that covers their contributions to this code.
