Distribution-based sketching of single-cell samples
SketchKH
Distribution-Informed Sketching with Kernel Herding
Overview
We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. A sketch is a small, representative subset of cells selected from each profiled sample so that all major immune cell types and their relative frequencies are well represented.
- Please see our paper for more information (ACM-BCB 2022): https://arxiv.org/abs/2207.00584
- Updated: December 27, 2024
Installation
Dependencies
- Python >= 3.6, < 3.10; anndata >= 0.7.6; numpy >= 1.22.4; scipy >= 1.7.1; numba; joblib; tqdm_joblib
You can install the package with pip:
pip install sketchKH
Alternatively, you can clone the git repository:
git clone https://github.com/CompCy-lab/SketchKH.git
Example usage
To perform sketching, first read a preprocessed AnnData object from an .h5ad file. This dataset contains multiple profiled single-cell samples, so sketching selects a limited set of cells from each profiled sample. We refer to each profiled sample as a sample-set.
import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
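A sample-set is simply the group of cell indices that share one sample label. The following is a minimal NumPy sketch of that grouping, under the assumption that a list of per-sample index arrays like this is what the `sample_set_inds` input described below expects. The helper `group_indices_by_sample` and the toy labels are hypothetical and are not part of sketchKH; in practice the labels would come from a column of `adata.obs`.

```python
import numpy as np

def group_indices_by_sample(labels):
    """Return (sample_names, index_sets): one array of row indices per unique label.

    Hypothetical helper for illustration; not part of the sketchKH API.
    """
    labels = np.asarray(labels)
    # `inverse` maps every cell to the position of its label in `names`
    names, inverse = np.unique(labels, return_inverse=True)
    index_sets = [np.flatnonzero(inverse == i) for i in range(len(names))]
    return names, index_sets

# Toy labels standing in for a column such as adata.obs['FCS_File']
toy_labels = np.array(['a.fcs', 'b.fcs', 'a.fcs', 'c.fcs', 'b.fcs', 'a.fcs'])
sample_names, sample_set_inds = group_indices_by_sample(toy_labels)
```

Each entry of `sample_set_inds` indexes the rows belonging to one sample-set, in the sorted order of `sample_names`.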
Then sketch the data, selecting 500 cells per sample-set:
# Inputs
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: string referring to the key within adata.obs that contains the sample-sets to subsample
# sample_set_inds: (alternative to specifying sample_set_key) list of arrays containing the indices of the sample-sets to subsample
# gamma: scale parameter for the normal distribution standard deviation in random Fourier frequency feature computation
# frequency_seed: random seed used when drawing the random Fourier frequencies
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel jobs (with joblib, -1 conventionally means all available cores)
# ----------------------------
# Returns:
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples*sample-sets x features)
# ----------------------------
from sketchKH import sketch
kh_indices, adata_subsample = sketch(adata, sample_set_key = 'FCS_File', gamma = 1, num_subsamples = 500, frequency_seed = 0, n_jobs = -1)
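To give some intuition for the `gamma`, `frequency_seed`, and `num_subsamples` parameters, here is a minimal NumPy sketch of kernel herding on random Fourier features for a single sample-set. It is an illustration under stated assumptions, not the package's implementation: the function names, the cosine feature map, the default of 256 features, and the toy two-cluster data are all hypothetical choices made for this example. The idea is to greedily pick the point whose features best align with a running weight vector that starts at the mean feature embedding of the sample, then update that vector to account for what has already been selected.

```python
import numpy as np

def random_fourier_features(X, num_features=256, gamma=1.0, seed=0):
    """Approximate a Gaussian-kernel feature map with random cosine features.

    Assumption: frequencies are drawn from a normal distribution whose
    standard deviation is `gamma`.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def kernel_herding(phi, num_subsamples):
    """Greedily select rows of `phi` so their mean tracks the full-sample mean."""
    mean_embedding = phi.mean(axis=0)
    w = mean_embedding.copy()
    available = np.ones(phi.shape[0], dtype=bool)
    selected = []
    for _ in range(num_subsamples):
        scores = phi @ w
        scores[~available] = -np.inf       # select without replacement
        idx = int(np.argmax(scores))
        selected.append(idx)
        available[idx] = False
        w += mean_embedding - phi[idx]     # standard herding weight update
    return np.array(selected)

# Toy sample-set: two clusters mixed at 80% / 20%
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(800, 2)),
               rng.normal(3.0, 0.3, size=(200, 2))])
phi = random_fourier_features(X, gamma=1.0, seed=0)
toy_indices = kernel_herding(phi, num_subsamples=50)
```

One way to inspect the result is to compare the fraction of `toy_indices` falling in each cluster (indices below 800 versus 800 and above) with the original 80/20 mix.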