Python bindings to the SingleR algorithm for annotating cell types from known references.

Tinder for single-cell data

Overview

This package provides Python bindings to the C++ implementation of the SingleR algorithm, originally developed by Aran et al. (2019). It is designed to annotate cell types by matching cells to known references based on their expression profiles. So kind of like Tinder, but for cells.
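
To give a feel for what "matching cells to known references based on their expression profiles" can mean, here is a toy sketch that scores one cell against two reference profiles with a Spearman rank correlation and keeps the best-scoring label. The profiles, gene values and helper names are made up for illustration. The real algorithm also involves steps such as marker selection that are omitted here, so this is not the library's implementation.

```python
def ranks(values):
    """Return 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical expression profiles over the same five genes.
reference_profiles = {
    "B-cells": [9.0, 1.0, 0.5, 7.0, 0.2],
    "Monocytes": [0.3, 8.0, 6.0, 0.1, 5.0],
}
test_cell = [0.0, 5.0, 3.0, 1.0, 2.0]

# Score the cell against every reference label and keep the highest score.
scores = {
    label: spearman(test_cell, profile)
    for label, profile in reference_profiles.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

Because only the ranks matter, the comparison is insensitive to differences in scale between the test and reference data.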

Quick start

Firstly, let's load in the famous PBMC 4k dataset from 10X Genomics:

import singlecellexperiment as sce
data = sce.read_tenx_h5("pbmc4k-tenx.h5", realize_assays=True)
mat = data.assay("counts")
features = [str(x) for x in data.row_data["name"]]

Alternatively, if you are coming from the scverse ecosystem (i.e., AnnData), simply read the object as a SingleCellExperiment and extract the matrix and the features. See the SingleCellExperiment documentation for more details.

import singlecellexperiment as sce

# `adata` is assumed to be an AnnData object that is already loaded in memory
sce_adata = sce.SingleCellExperiment.from_anndata(adata)

# or read directly from an h5ad file
sce_h5ad = sce.read_h5ad("tests/data/adata.h5ad")

Now, we fetch the Blueprint/ENCODE reference:

import celldex

ref_data = celldex.fetch_reference("blueprint_encode", "2024-02-26", realize_assays=True)

We can annotate each cell in mat with the reference:

import singler
results = singler.annotate_single(
    test_data = mat,
    test_features = features,
    ref_data = ref_data,
    ref_labels = "label.main",
)

The results data frame contains all of the assignments and the scores for each label:

results.column("best")
## ['Monocytes',
##  'Monocytes',
##  'Monocytes',
##  'CD8+ T-cells',
##  'CD4+ T-cells',
##  'CD8+ T-cells',
##  'Monocytes',
##  'Monocytes',
##  'B-cells',
##  ...
## ]

results.column("scores").column("Macrophages")
## array([0.35935275, 0.40833545, 0.37430726, ..., 0.32135929, 0.29728435,
##        0.40208581])
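
A quick sanity check on the assignments is to count how many cells received each label. The sketch below uses only the standard library and a short list copied from the first few labels printed above; in practice the list would be the full output of results.column("best").

```python
from collections import Counter

# Illustrative labels copied from the first few printed entries;
# a real run would use the full list of per-cell labels instead.
best_labels = [
    "Monocytes", "Monocytes", "Monocytes", "CD8+ T-cells", "CD4+ T-cells",
    "CD8+ T-cells", "Monocytes", "Monocytes", "B-cells",
]

# Tally the labels and print them from most to least common.
label_counts = Counter(best_labels)
for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```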

Calling low-level functions

The annotate_single() function is a convenient wrapper around a number of lower-level functions in singler. Advanced users may prefer to build the reference and run the classification separately. This allows us to re-use the same reference for multiple datasets without repeating the build step.

built = singler.build_single_reference(
    ref_data=ref_data.assay("logcounts"),
    ref_labels=ref_data.col_data.column("label.main"),
    ref_features=ref_data.get_row_names(),
    restrict_to=features,
)

And finally, we apply the pre-built reference to the test dataset to obtain our label assignments. This can be repeated with different datasets that have the same features or a superset of features.

output = singler.classify_single_reference(
    mat,
    test_features=features,
    ref_prebuilt=built,
)
## output
## BiocFrame with 4340 rows and 3 columns
##             best                                   scores                delta
##           <list>                              <BiocFrame>   <ndarray[float64]>
##    [0] Monocytes 0.33265560369962943:0.407117403330602...  0.40706830113982534
##    [1] Monocytes 0.4078771641637374:0.4783396310685646...  0.07000418564184802
##    [2] Monocytes 0.3517036021728629:0.4076971245524348...  0.30997293412307647
##              ...                                      ...                  ...
## [4337]  NK cells 0.3472631136865701:0.3937898240670208...  0.09640242155786138
## [4338]   B-cells 0.26974632191999887:0.334862058137758... 0.061215905058676856
## [4339] Monocytes 0.39390119034537324:0.468867490667427...  0.06678168346812047
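
The delta column can be used to flag ambiguous calls. The sketch below assumes that a small delta means the best label was only narrowly ahead of the alternatives, and replaces such labels with None. The cutoff and the prune() helper are hypothetical, chosen only for this example, and are not part of singler.

```python
# Illustrative (label, delta) pairs copied from the rows printed above.
assignments = [
    ("Monocytes", 0.40706830113982534),
    ("Monocytes", 0.07000418564184802),
    ("Monocytes", 0.30997293412307647),
    ("NK cells", 0.09640242155786138),
    ("B-cells", 0.061215905058676856),
    ("Monocytes", 0.06678168346812047),
]

MIN_DELTA = 0.065  # hypothetical cutoff, not a recommended default


def prune(pairs, min_delta):
    """Replace labels whose delta falls below min_delta with None."""
    return [label if delta >= min_delta else None for label, delta in pairs]


pruned = prune(assignments, MIN_DELTA)
print(pruned)
```

A sensible cutoff depends on the dataset and the reference, so it is worth inspecting the distribution of delta values before discarding any labels.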

Integrating labels across references

We can use annotations from multiple references through the annotate_integrated() function:

import singler
import celldex

blueprint_ref = celldex.fetch_reference("blueprint_encode", "2024-02-26", realize_assays=True)

immune_cell_ref = celldex.fetch_reference("dice", "2024-02-26", realize_assays=True)

single_results, integrated = singler.annotate_integrated(
    mat,
    features,
    ref_data_list = (blueprint_ref, immune_cell_ref),
    ref_labels_list = "label.main",
    num_threads = 6
)

This annotates the test dataset against each reference individually to obtain the best per-reference label, and then it compares across references to find the best label from all references. Both the single and integrated annotations are reported for diagnostics.
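
As a rough mental model of the second step, the toy sketch below takes the best label and score from each reference and keeps, for every cell, the label from the reference with the highest score. All names and numbers are hypothetical. Scores from different references are not necessarily comparable as-is, and the library's own comparison may work differently, so treat this as an illustration of the idea only.

```python
# Hypothetical per-reference results for three cells: (best label, score).
per_reference = {
    "Blueprint": [("Monocytes", 0.47), ("B-cells", 0.31), ("NK cells", 0.40)],
    "DICE": [("Monocytes", 0.41), ("B cells", 0.36), ("NK cells", 0.40)],
}


def integrate(results):
    """For each cell, keep the label from the reference with the highest score.

    Ties go to the reference that is listed first.
    """
    names = list(results)
    num_cells = len(results[names[0]])
    combined = []
    for i in range(num_cells):
        # max() returns the first maximal element, which settles ties.
        best_ref = max(names, key=lambda name: results[name][i][1])
        combined.append((results[best_ref][i][0], best_ref))
    return combined


integrated_toy = integrate(per_reference)
print(integrated_toy)
```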

integrated.column("best_label")
## ['Monocytes', 
##  'Monocytes',
##  'Monocytes',
##  'CD8+ T-cells',
##  'CD4+ T-cells',
##  'CD8+ T-cells',
##  'Monocytes',
##  'Monocytes',
##  ...
## ]

integrated.column("best_reference")
## ['Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  'Blueprint',
##  ...
## ]

Developer notes

Build the shared object file:

python setup.py build_ext --inplace

For quick testing:

pytest

For more complex testing:

python setup.py build_ext --inplace && tox

To rebuild the ctypes bindings with cpptypes:

cpptypes src/singler/lib --py src/singler/_cpphelpers.py --cpp src/singler/lib/bindings.cpp --dll _core
