Python bindings to the SingleR algorithm for annotating cell types from known references.

Project description

Tinder for single-cell data

Overview

This package provides Python bindings to the C++ implementation of the SingleR algorithm, originally developed by Aran et al. (2019). It is designed to annotate cell types by matching cells to known references based on their expression profiles. So kind of like Tinder, but for cells.

Quick start

First, let's load in the famous PBMC 4k dataset from 10x Genomics:

import singlecellexperiment as sce
data = sce.read_tenx_h5("pbmc4k-tenx.h5")
mat = data.assay("counts")
features = [str(x) for x in data.row_data["name"]]

Now we use the Blueprint/ENCODE reference to annotate each cell in mat:

import singler
results = singler.annotate_single(
    mat,
    features,
    ref_data = "BlueprintEncode",
    ref_features = "symbol",
    ref_labels = "main",
    cache_dir = "_cache"
)

The results data frame contains all of the assignments and the scores for each label:

results.column("best")
## ['Monocytes',
##  'Monocytes',
##  'Monocytes',
##  'CD8+ T-cells',
##  'CD4+ T-cells',
##  'CD8+ T-cells',
##  'Monocytes',
##  'Monocytes',
##  'B-cells',
##  ...
## ]

results.column("scores").column("Macrophages")
## array([0.35935275, 0.40833545, 0.37430726, ..., 0.32135929, 0.29728435,
##        0.40208581])
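Because `results.column("best")` comes back as a plain list of label strings, summarizing the assignments needs nothing beyond the standard library. A minimal sketch using `collections.Counter`; `summarize_labels` is a hypothetical helper, and the labels are made-up stand-ins rather than real output.

```python
from collections import Counter

def summarize_labels(labels):
    """Count how many cells received each label, most common first."""
    # Counter keeps first-seen order for labels with equal counts.
    return Counter(labels).most_common()

# Hypothetical labels standing in for results.column("best").
labels = ["Monocytes", "Monocytes", "CD8+ T-cells", "B-cells", "Monocytes"]
for label, n in summarize_labels(labels):
    print(f"{label}: {n}")
```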

Calling low-level functions

The annotate_single() function is a convenient wrapper around a number of lower-level functions in singler. Advanced users may prefer to build the reference and run the classification separately. This allows us to re-use the same reference for multiple datasets without repeating the build step.

We start by fetching the reference of interest from GitHub. Note the use of cache_dir to avoid repeated downloads from GitHub.

ref = singler.fetch_github_reference("BlueprintEncode", cache_dir="_cache")

We'll be using the gene symbols here with the markers for the main labels. We need to set restrict_to to the features in our test data, so as to avoid picking marker genes in the reference that won't be present in the test dataset.

ref_features = ref.row_data.column("symbol")

markers = singler.realize_github_markers(
    ref.metadata["main"],
    ref_features,
    restrict_to=set(features),
)
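The idea behind `restrict_to` can be illustrated with plain dictionaries: only marker genes that also appear in the test features are kept. This sketch assumes the markers form a nested dict of the shape `{label: {other_label: [genes]}}`; `restrict_markers` and the gene `GENE_X` are hypothetical, and the sketch only filters fixed lists without reproducing how `realize_github_markers` ranks or selects markers.

```python
def restrict_markers(markers, allowed):
    """Drop marker genes that are absent from `allowed` (e.g. the test features).

    Assumes `markers` maps each label to {other_label: [genes]}, listing genes
    upregulated in `label` relative to `other_label`. Gene order is preserved.
    """
    allowed = set(allowed)
    return {
        label: {
            other: [g for g in genes if g in allowed]
            for other, genes in comparisons.items()
        }
        for label, comparisons in markers.items()
    }

# Hypothetical markers; GENE_X is not among the test features.
markers = {
    "B-cells": {"Monocytes": ["CD79A", "MS4A1", "GENE_X"]},
    "Monocytes": {"B-cells": ["LYZ", "CD14"]},
}
test_features = ["CD79A", "MS4A1", "LYZ", "CD14", "GAPDH"]
restricted = restrict_markers(markers, test_features)
print(restricted["B-cells"]["Monocytes"])  # ['CD79A', 'MS4A1']
```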

Now we build the reference from the ranked expression values and the associated labels in the reference:

built = singler.build_single_reference(
    ref_data=ref.assay("ranks"),
    ref_labels=ref.col_data.column("main"),
    ref_features=ref_features,
    markers=markers,
)
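Ranks are a natural input here, since SingleR compares each test cell with each reference sample using the Spearman correlation over the marker genes, and Spearman's coefficient is simply the Pearson correlation of the ranks. A pure-Python sketch for illustration only (singler does this in C++); `rank`, `spearman` and the expression values are hypothetical.

```python
import math

def rank(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when either input is constant
    return cov / (sx * sy)

# Hypothetical expression of four marker genes in a test cell and a reference sample.
test_cell = [0.0, 5.0, 2.0, 9.0]
ref_sample = [1.0, 7.0, 3.0, 8.0]
print(rank(test_cell))                  # [1.0, 3.0, 2.0, 4.0]
print(spearman(test_cell, ref_sample))  # close to 1.0: same ordering of genes
```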

And finally, we apply the pre-built reference to the test dataset to obtain our label assignments. This can be repeated with different datasets that have the same features or a superset of features.

output = singler.classify_single_reference(
    mat,
    test_features=features,
    ref_prebuilt=built,
)

output
## BiocFrame with 4340 rows and 3 columns
##             best                                   scores                delta
##         <list>                              <BiocFrame>   <ndarray[float64]>
## [0] Monocytes 0.33265560369962943:0.407117403330602...  0.40706830113982534
## [1] Monocytes 0.4078771641637374:0.4783396310685646...  0.07000418564184802
## [2] Monocytes 0.3517036021728629:0.4076971245524348...  0.30997293412307647
##             ...                                      ...                  ...
## [4337]  NK cells 0.3472631136865701:0.3937898240670208...  0.09640242155786138
## [4338]   B-cells 0.26974632191999887:0.334862058137758... 0.061215905058676856
## [4339] Monocytes 0.39390119034537324:0.468867490667427...  0.06678168346812047
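Each row of this output holds a best label, per-label scores and a delta. As a rough sketch of where such scores come from: in the original SingleR method, a label's score for a cell is a quantile (0.8 by default in the R package) of the cell's correlations with that label's reference samples. The sketch below computes these initial scores, the top label and the gap to the runner-up. It deliberately omits SingleR's fine-tuning step, which re-scores the top labels using markers among them, so its `best` and `delta` are not guaranteed to match singler's output; `score_cell`, `quantile` and the input values are hypothetical.

```python
import math
from collections import defaultdict

def quantile(values, q):
    """Quantile of a non-empty list by linear interpolation (NumPy's default rule)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)

def score_cell(correlations, ref_labels, q=0.8):
    """Return (best, scores, delta) for one test cell, without fine-tuning.

    correlations[i] is the cell's correlation with reference sample i, and
    ref_labels[i] is that sample's label.
    """
    # Group the correlations by the label of each reference sample.
    by_label = defaultdict(list)
    for corr, label in zip(correlations, ref_labels):
        by_label[label].append(corr)
    # Summarize each label by the q-quantile of its correlations.
    scores = {label: quantile(c, q) for label, c in by_label.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    # Gap between the best and second-best score (NaN if only one label).
    delta = scores[best] - scores[ranked[1]] if len(ranked) > 1 else math.nan
    return best, scores, delta

# Hypothetical correlations of one cell with six reference samples.
correlations = [0.30, 0.35, 0.40, 0.20, 0.25, 0.22]
ref_labels = ["Monocytes"] * 3 + ["B-cells"] * 3
best, scores, delta = score_cell(correlations, ref_labels)
print(best, round(scores["Monocytes"], 3), round(delta, 3))  # Monocytes 0.38 0.142
```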

Integrating labels across references

We can use annotations from multiple references through the annotate_integrated() function:

import singler
single_results, integrated = singler.annotate_integrated(
    mat,
    features,
    ref_data_list = ("BlueprintEncode", "DatabaseImmuneCellExpression"),
    ref_features_list = "symbol",
    ref_labels_list = "main",
    build_integrated_args = { "ref_names": ("Blueprint", "DICE") },
    cache_dir = "_cache",
    num_threads = 6
)

This annotates the test dataset against each reference individually to obtain the best per-reference label, and then it compares across references to find the best label from all references. Both the single and integrated annotations are reported for diagnostics.

integrated.column("best_label")
## ['Monocytes', 
##  'Monocytes',
##  'Monocytes',
##  'CD8+ T-cells',
##  'CD4+ T-cells',
##  'CD8+ T-cells',
##  'Monocytes',
##  'Monocytes',
##  ...
## ]

integrated.column("best_reference")
## ['Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  ...
## ]
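The final cross-reference step can be sketched as picking, for each cell, the reference whose best label scored highest. In SingleR's integrated mode the per-reference scores are first recomputed on a shared set of markers so that they are comparable; this sketch assumes such comparable scores are already available and shows only the selection. `choose_reference` and the scores are hypothetical; the two output lists mirror the `best_label` and `best_reference` columns of `integrated`.

```python
def choose_reference(per_reference):
    """Return (best_label, best_reference) for one cell.

    per_reference maps a reference name to (best_label, score), where the
    scores are assumed to be comparable across references.
    """
    best_ref = max(per_reference, key=lambda name: per_reference[name][1])
    return per_reference[best_ref][0], best_ref

# Hypothetical per-reference results for two cells.
cells = [
    {"Blueprint": ("Monocytes", 0.41), "DICE": ("Monocytes", 0.37)},
    {"Blueprint": ("CD4+ T-cells", 0.29), "DICE": ("T cells, CD4+", 0.33)},
]
best_label, best_reference = [], []
for cell in cells:
    label, ref = choose_reference(cell)
    best_label.append(label)
    best_reference.append(ref)
print(best_label)      # ['Monocytes', 'T cells, CD4+']
print(best_reference)  # ['Blueprint', 'DICE']
```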

Developer notes

Build the shared object file:

python setup.py build_ext --inplace

For quick testing:

pytest

For more complex testing:

python setup.py build_ext --inplace && tox

To rebuild the ctypes bindings with cpptypes:

cpptypes src/singler/lib --py src/singler/_cpphelpers.py --cpp src/singler/lib/bindings.cpp --dll _core

Project details

Release history

0.1.2 (this version): Jan 4, 2024
0.1.1: Sep 20, 2023
0.1.0: Sep 19, 2023
0.0.2: Sep 11, 2023
0.0.1: Aug 30, 2023

Download files

Source Distribution

singler-0.1.2.tar.gz (47.2 kB), uploaded Jan 4, 2024

Built Distributions (all uploaded Jan 4, 2024)

singler-0.1.2-cp311-cp311-musllinux_1_1_x86_64.whl (3.3 MB): CPython 3.11, musllinux: musl 1.1+ x86-64
singler-0.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
singler-0.1.2-cp311-cp311-macosx_11_0_arm64.whl (126.3 kB): CPython 3.11, macOS 11.0+ ARM64
singler-0.1.2-cp311-cp311-macosx_10_9_x86_64.whl (143.2 kB): CPython 3.11, macOS 10.9+ x86-64
singler-0.1.2-cp310-cp310-musllinux_1_1_x86_64.whl (3.3 MB): CPython 3.10, musllinux: musl 1.1+ x86-64
singler-0.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
singler-0.1.2-cp310-cp310-macosx_11_0_arm64.whl (126.3 kB): CPython 3.10, macOS 11.0+ ARM64
singler-0.1.2-cp310-cp310-macosx_10_9_x86_64.whl (143.2 kB): CPython 3.10, macOS 10.9+ x86-64
singler-0.1.2-cp39-cp39-musllinux_1_1_x86_64.whl (3.3 MB): CPython 3.9, musllinux: musl 1.1+ x86-64
singler-0.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
singler-0.1.2-cp39-cp39-macosx_11_0_arm64.whl (126.3 kB): CPython 3.9, macOS 11.0+ ARM64
singler-0.1.2-cp39-cp39-macosx_10_9_x86_64.whl (143.2 kB): CPython 3.9, macOS 10.9+ x86-64

Hashes for singler-0.1.2.tar.gz

Algorithm	Hash digest
SHA256	`42258321d7b8be06da4a674b9997a650a5f2750bdc4921e570575600901ff443`
MD5	`b906c51007561d8cd7946c23375610c0`
BLAKE2b-256	`640003ac08f7105046695698837d1ee2d9f811de2c849fccbd177d6768ef848a`

Hashes for singler-0.1.2-cp311-cp311-musllinux_1_1_x86_64.whl

Algorithm	Hash digest
SHA256	`5f38475e9f6f8dc8b4609333db4e79465c422c15425dec8e368ff5fc39ecb545`
MD5	`8b0cfb8ef8eeb5a79d913cd94f41ae52`
BLAKE2b-256	`f56f114e633479ff75a28317141a201a1720771caaafa3103f719d5638ec3cc5`

Hashes for singler-0.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`0890cc8efb16c1f08626ea186bd61292d25c8d2c35718b50b6a7c6b6d8984dc0`
MD5	`23014e5846fd8a3f71e2187515ed4962`
BLAKE2b-256	`9e4cd269be2e75b0688764746aa72ea9c8211154238e2973c6f9b463cbf60f6b`

Hashes for singler-0.1.2-cp311-cp311-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`98ece8583261423331b5125cd6166bd538357697fb325d287999abd08dae83ff`
MD5	`072022709bcdd821ae7af882d4bbac5c`
BLAKE2b-256	`e39372ed8afcc988e9798c17de20e7cd4ccd85276d01969ebdc17e5a53aceee3`

Hashes for singler-0.1.2-cp311-cp311-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`113bdc423b9de9d6056c27c008a78085b80f4c78d82e3c0f943ad77226270d4f`
MD5	`a01ed29e27733ab579ffdbe6bc7748f0`
BLAKE2b-256	`bb7b807298195d53310d8d50c1293ff81ae1f74b57039ac68e40139074d84638`

Hashes for singler-0.1.2-cp310-cp310-musllinux_1_1_x86_64.whl

Algorithm	Hash digest
SHA256	`62f8c5e8dd2044beb1dd1ac8a14b9d7100a29b470bcf1795b34e90e09039f2e2`
MD5	`374fd76476b13a06b56807682c637d3f`
BLAKE2b-256	`0d018fe1f70701ce9e541527cbc482a96c3200a2dc0fc1436614d988225b2636`

Hashes for singler-0.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`e3ce48d63c3ecd1c940c93577f7253b705076c98440ec1c8b44bf1f3cb814169`
MD5	`d4e54dd3b8764c3617a8bbca5a25229b`
BLAKE2b-256	`09d2a5eec2f213ff40996f514b510e7a28444e41a0ed2f1e6dab15b549bb3c96`

Hashes for singler-0.1.2-cp310-cp310-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`87b7aa313d7e5b3fe690e67ec096c4554417b7679cdfa1d1e1535fa402b3ea7d`
MD5	`b6a5cc884328822c98507dfc32d0324f`
BLAKE2b-256	`ed01ca84045cad8ea8e5f3c0591381ad47bc5fa891fb9cdd05ee3254a4de675e`

Hashes for singler-0.1.2-cp310-cp310-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`1c34794065b1819fe8ce354c0610c46d4d8721cdc9df4afb128414d019f78d61`
MD5	`1e355ec337a8e5992cf1729db1d914a8`
BLAKE2b-256	`029bfffeedba9b178356221710fbcbfbe5430656f57edd0f7282e8490e751d8e`

Hashes for singler-0.1.2-cp39-cp39-musllinux_1_1_x86_64.whl

Algorithm	Hash digest
SHA256	`c366414aab0ca32d3bea5c0a1a6019fc615035909e79a8087e60106f6cfc8e53`
MD5	`7fa66152974be60c763be0197bda31c0`
BLAKE2b-256	`4ae63c37c77908021fb2a00798fad4544455e6fa86b7e76199aacdbebef5748a`

Hashes for singler-0.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`3699e5aa3b866caaa6089bf52a14cc4e7b5dd5cf1fe964e5e73cc0381f3ee9d0`
MD5	`85c8477166468d6fbfaff42b45be8bcf`
BLAKE2b-256	`0c3d0ddb79d2a49ceb5064c372deaa69ad6619fd52a369fc23ec078e4b32f884`

Hashes for singler-0.1.2-cp39-cp39-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`00ffef380385a19c4d8e320d930d1ea49ac8c831562ebed0eb37b0c91775f564`
MD5	`b7211d8c2cf6d1e14074d2b6182f2f35`
BLAKE2b-256	`7206b5e47144f6e545986740e36e07ecce51c8a68766a721a2199a00c29424d1`

Hashes for singler-0.1.2-cp39-cp39-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`37c1f4e65d4b39bd1da8e5d4bcedfc12dcf42a26ba9628dba66233b952b08a75`
MD5	`56bbd1749649557c1990203898a43f49`
BLAKE2b-256	`9c1b95c1c81baf178e4c36db2969ed2116bb43cad1b3d292b29dd4f6d0ed31f7`