A metric for the evaluation of single-cell query-to-reference mappings

Owner: Theis Lab

Maintainer: LisaSikkema

Project description

mapQC


Getting started

Please refer to the documentation, in particular the API documentation. Below are a few notes on how and when to use mapQC.

What does mapQC do?

MapQC evaluates the quality of an existing query-to-reference mapping by quantifying the distance between query and reference samples. Rather than using a standard metric to calculate this distance, it compares the expected inter-sample distance (based on controls in the reference) to the observed distance between query samples and reference samples, and outputs a normalized distance, called the mapQC score, for every query cell. MapQC calculates inter-sample distances in a local, per-neighborhood manner. (For more methodological details, check out the preprint.)
The reference is expected to cover most of the diversity existing in the control population (e.g. young and old, low and high BMI, smokers and non-smokers, different ethnicities, etc. for human data), such that the controls in the query are expected to look similar to some of the samples in the reference. Therefore, mapQC works best if the reference is a large-scale reference including data from many individuals. Moreover, the query needs to include control samples, so that we know for a subset of query samples how well they should integrate with the reference.
MapQC scores can be regarded as a Z-scored distance to the reference (Z-scored based on the inter-sample distances in the reference itself), such that a mapQC score of 2 for a given query cell represents a distance to the reference of two standard deviations above the expected distance (based on the reference). Therefore, mapQC scores > 2 are considered high, and indicate either remaining batch effects (if seen in control samples) or disease-specific cell states (if seen in case but not in control samples).
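
To make the Z-score interpretation concrete, here is a minimal sketch of the idea, not mapQC's actual implementation. The function name `z_scored_distance`, the toy distances, and the use of the sample standard deviation are all assumptions made for illustration; mapQC computes its distances per neighborhood in the mapping embedding, as described in the preprint.

```python
from statistics import mean, stdev


def z_scored_distance(observed: float, reference_distances: list[float]) -> float:
    """Express an observed distance in standard deviations above the reference expectation.

    Illustrative helper only; this is not part of the mapqc API.
    """
    expected = mean(reference_distances)  # expected inter-sample distance
    spread = stdev(reference_distances)   # sample standard deviation (an assumption)
    return (observed - expected) / spread


# Hypothetical inter-sample distances among reference control samples in one neighborhood
reference_distances = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical distance between a query sample and the reference in the same neighborhood
score = z_scored_distance(6.5, reference_distances)

# Scores above 2 are considered high
is_high = score > 2
```

With these toy numbers the query distance lies a little more than two standard deviations above the expected distance, so it would count as high under the threshold described above.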

What are the data requirements for using mapQC?

In short, you need one AnnData object, including:

- A large-scale reference, including only its healthy/control cells.
- A mapped query dataset, with healthy/control cells (must-have) and case/perturbed cells (if you have them).
- Metadata (see below).
- A mapping-derived embedding of both the reference and the query.

Below, the exact requirements are outlined in more detail.

Reference: MapQC is meant to evaluate the mapping of a given dataset to an existing, large-scale reference. It assumes the reference more or less covers the diversity of the control population (e.g. diversity among healthy individuals for the case of human data, or of unperturbed organoids generated with a wide array of protocols for an organoid dataset). Therefore, a mapping of a single dataset to another single dataset is unlikely to fulfill these assumptions, and mapQC is not guaranteed to work well in that setting. MapQC runs on an AnnData object that includes only the control cells from the reference (i.e. no perturbed or diseased cells!). Make sure to exclude any non-control reference cells before running mapQC.
Query: The query (the dataset mapped to the reference) is expected to have both control and case samples. MapQC can also be run without case samples in the query, but it should always include controls. The query cells should be in the same AnnData object as the reference.
Required Metadata: Several metadata columns need to be present in your adata.obs:

The following need a value for every cell from both the reference and the query. Column names can be chosen freely:
- A "study" key, specifying from which study/dataset each cell in the reference and query came. The query is assumed to come from a single study. If the query includes multiple studies, map these separately and run mapQC on each of them separately.
- A "sample" key, specifying from which biological sample a given cell came.
- A reference versus query key, specifying for each cell whether it is from the reference or the query.
And optionally:
- A grouping of all your cells, e.g. a clustering run on your mapping embedding. If this is provided, mapQC will choose its neighborhood sample cells in proportion to those groups instead of sampling them at random. Providing a grouping may help to cover the full embedding space better (especially helpful for rare cell types) when running mapQC.
And for the query:
- A "condition" key, specifying for the query what condition (case/control etc.) each cell belongs to, e.g. the disease of the patient from which the sample came or if it was a control.
Embedding Data: Your adata object needs to include the mapped embedding, including coordinates for both the reference and the query. These can be stored either in adata.X or in adata.obsm.
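
The metadata requirements above can be checked before running mapQC. The following is a minimal sketch using pandas, not part of the mapqc API: the function `check_obs`, its parameter names, and the column names and labels in the toy table are all hypothetical. With a real AnnData object you would pass `adata.obs`. The sketch does not check the embedding or the optional grouping.

```python
import pandas as pd


def check_obs(
    obs: pd.DataFrame,
    study_key: str,
    sample_key: str,
    ref_q_key: str,
    condition_key: str,
    query_label: str,
    control_label: str,
) -> list[str]:
    """Return a list of problems found in the metadata (empty if none).

    Illustrative helper only; names and checks are assumptions, not mapqc code.
    """
    required = [study_key, sample_key, ref_q_key, condition_key]
    missing = [col for col in required if col not in obs.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []

    # Study, sample and reference/query labels are needed for every cell
    for col in (study_key, sample_key, ref_q_key):
        if obs[col].isna().any():
            problems.append(f"column '{col}' has missing values")

    query = obs[obs[ref_q_key] == query_label]
    if query.empty:
        problems.append("no query cells found")
        return problems

    # The query is assumed to come from a single study
    if query[study_key].nunique() != 1:
        problems.append("query spans more than one study")

    # The condition is needed for query cells, and controls are mandatory
    if query[condition_key].isna().any():
        problems.append("some query cells lack a condition")
    if not (query[condition_key] == control_label).any():
        problems.append("query contains no control cells")
    return problems


# Toy metadata table; with AnnData this would be adata.obs
obs = pd.DataFrame(
    {
        "study": ["ref_a", "ref_b", "query_1", "query_1"],
        "sample": ["s1", "s2", "s3", "s4"],
        "ref_or_query": ["ref", "ref", "query", "query"],
        "condition": [None, None, "control", "disease"],
    }
)
problems = check_obs(
    obs,
    "study",
    "sample",
    "ref_or_query",
    "condition",
    query_label="query",
    control_label="control",
)
```

An empty `problems` list means the table satisfies the checks in this sketch; it does not guarantee that the reference is diverse enough for mapQC's assumptions to hold.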

Installation

You need to have Python 3.10 or newer installed on your system.

There are several options to install mapqc:

Install the latest release from PyPI:

pip install mapqc

Install the latest development version:

pip install git+https://github.com/theislab/mapqc.git@main

Release notes

See the changelog.

Contact

For questions and help requests, submit an issue on the mapQC GitHub page.

Citation

t.b.a

Release history

- 0.1.1: Jun 17, 2025
- 0.1.0: May 26, 2025
- 0.0.1: May 19, 2025 (this version)

Download files

Source Distribution

mapqc-0.0.1.tar.gz (14.8 MB), uploaded May 19, 2025

Built Distribution

mapqc-0.0.1-py3-none-any.whl (40.4 kB), uploaded May 19, 2025, Python 3

File details

Details for the file mapqc-0.0.1.tar.gz.

File metadata

Download URL: mapqc-0.0.1.tar.gz
Upload date: May 19, 2025
Size: 14.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mapqc-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`294d4664808c822121cd6ff2b7f6a64b7abde588af5aa8768767909c54a2a99f`
MD5	`596cdc97733e0e6f4fccb37603f8ba27`
BLAKE2b-256	`36728f0987d4552b8dc5cf44e2acd38a66d3b9033e088d5e316c2b39907cb7c6`


Provenance

The following attestation bundles were made for mapqc-0.0.1.tar.gz:

Publisher: release.yaml on theislab/mapqc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mapqc-0.0.1.tar.gz
- Subject digest: 294d4664808c822121cd6ff2b7f6a64b7abde588af5aa8768767909c54a2a99f
- Sigstore transparency entry: 215042449
- Sigstore integration time: May 19, 2025
Source repository:
- Permalink: theislab/mapqc@ec01bf30296098490381891526979ff20cfb84e8
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/theislab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@ec01bf30296098490381891526979ff20cfb84e8
- Trigger Event: release

File details

Details for the file mapqc-0.0.1-py3-none-any.whl.

File metadata

Download URL: mapqc-0.0.1-py3-none-any.whl
Upload date: May 19, 2025
Size: 40.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mapqc-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d3372259e478f2d96fe0d078db371435c008ced549b0c08b04ff466e5893027a`
MD5	`a16bcc7f78d4ea63ca04c20a80d9f179`
BLAKE2b-256	`ed1758a4e82ec4b3e60ce0218b991c8c3d91ab4e885b4299fe9fa460c8e10d03`


Provenance

The following attestation bundles were made for mapqc-0.0.1-py3-none-any.whl:

Publisher: release.yaml on theislab/mapqc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mapqc-0.0.1-py3-none-any.whl
- Subject digest: d3372259e478f2d96fe0d078db371435c008ced549b0c08b04ff466e5893027a
- Sigstore transparency entry: 215042451
- Sigstore integration time: May 19, 2025
Source repository:
- Permalink: theislab/mapqc@ec01bf30296098490381891526979ff20cfb84e8
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/theislab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@ec01bf30296098490381891526979ff20cfb84e8
- Trigger Event: release
