Distribution-based sketching of single-cell samples

SketchKH

Distribution-Informed Sketching with Kernel Herding

Overview

We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Our sketches select a small, representative set of cells from each profiled sample so that all major immune cell-types and their relative frequencies are well-represented.
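
One way to check whether a sketch keeps relative frequencies intact is to compare cell-type proportions before and after subsampling. The snippet below is a minimal sketch that uses only the Python standard library. The helper names and the cell-type labels ("NK", "T", "B") are hypothetical and are not part of the package.

```python
from collections import Counter


def relative_frequencies(labels):
    """Return the fraction of cells carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def max_frequency_gap(full_labels, sketch_labels):
    """Largest absolute difference in relative frequency over the cell types of the full sample."""
    full = relative_frequencies(full_labels)
    sketch = relative_frequencies(sketch_labels)
    return max(abs(freq - sketch.get(label, 0.0)) for label, freq in full.items())


# Hypothetical cell-type labels for one full sample and for a sketch of it.
full_labels = ["NK"] * 60 + ["T"] * 30 + ["B"] * 10
sketch_labels = ["NK"] * 6 + ["T"] * 3 + ["B"] * 1
print(max_frequency_gap(full_labels, sketch_labels))
```

A gap near zero suggests the sketch preserves the composition of the sample. A large gap means at least one cell type is over- or under-represented.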

Figure: Overview of sketching via Kernel Herding (KH).

Installation

Dependencies

  • Python >= 3.6, < 3.10
  • anndata >= 0.7.6
  • numpy >= 1.22.4
  • scipy >= 1.7.1
  • numba
  • joblib
  • tqdm_joblib
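
Before installing, it can help to confirm that the active interpreter falls inside the Python range listed above. The following is a small illustrative check. The function and constant names are hypothetical, and the bounds are copied from the dependency list.

```python
import sys

# Bounds taken from the dependency list: Python >= 3.6 and < 3.10.
MIN_PYTHON = (3, 6)
MAX_PYTHON_EXCLUSIVE = (3, 10)


def python_is_supported(version=None):
    """Return True if (major, minor) falls inside the documented range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


if __name__ == "__main__":
    status = "inside" if python_is_supported() else "outside"
    print("Python %d.%d is %s the documented range" % (sys.version_info[0], sys.version_info[1], status))
```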

You can install the package with pip:

pip install sketchKH

Alternatively, clone the git repository:

git clone https://github.com/CompCy-lab/SketchKH.git

Example usage

To perform sketching, first read in a preprocessed AnnData object stored as an .h5ad file. The dataset contains multiple profiled single-cell samples, and sketching selects a limited set of cells from each one. We refer to each profiled sample as a sample-set.

import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
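
A sample-set is simply the group of cells that share one sample label. As an illustration, the snippet below groups row indices by label with NumPy, which is one plausible way to build per-sample-set index arrays by hand. The helper name and the toy labels are hypothetical. In practice the labels would come from a column of adata.obs, and the exact index format the package expects is an assumption here.

```python
import numpy as np


def group_indices_by_sample_set(labels):
    """Map each unique sample-set label to the row indices of its cells."""
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)  # sorted unique labels
    index_groups = [np.flatnonzero(labels == label) for label in unique_labels]
    return unique_labels, index_groups


# Toy labels standing in for a per-cell sample column such as adata.obs['FCS_File'].
toy_labels = ["a.fcs", "b.fcs", "a.fcs", "c.fcs", "b.fcs", "a.fcs"]
names, groups = group_indices_by_sample_set(toy_labels)
for name, idx in zip(names, groups):
    print(name, idx.tolist())
```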

Then sketch the data, selecting 500 cells per sample-set:

# Inputs:
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: key in adata.obs that holds the sample-set label of each cell
# sample_set_inds: (alternative to sample_set_key) list of arrays, one per sample-set, containing the indices of its cells
# gamma: scale parameter for the standard deviation of the normal distribution used to draw the random Fourier frequencies
# frequency_seed: random state used when drawing the random Fourier frequencies
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel jobs
# ----------------------------
# Returns:
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples * number of sample-sets x features)
# ----------------------------
from sketchKH import sketch

kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
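
To give some intuition for what gamma, frequency_seed, and num_subsamples control, here is a minimal NumPy sketch of generic kernel herding on random Fourier features. It is a textbook-style illustration and not the package's implementation. The function names, the num_features parameter, the choice to select cells without replacement, and the toy data are all assumptions made for this example.

```python
import numpy as np


def random_fourier_features(X, num_features=500, gamma=1.0, seed=0):
    """Map X (cells x features) to random Fourier features of a Gaussian kernel."""
    rng = np.random.default_rng(seed)
    # Frequencies come from a normal distribution with standard deviation gamma.
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def kernel_herding(phi, num_subsamples):
    """Greedily pick rows of phi whose running mean tracks the mean of all rows."""
    n = phi.shape[0]
    num_subsamples = min(num_subsamples, n)
    mean_embedding = phi.mean(axis=0)
    scores_to_mean = phi @ mean_embedding  # similarity of each cell to the full sample
    selected_sum = np.zeros(phi.shape[1])
    available = np.ones(n, dtype=bool)
    chosen = []
    for t in range(num_subsamples):
        # Reward similarity to the full sample, penalize redundancy with chosen cells.
        objective = scores_to_mean - (phi @ selected_sum) / (t + 1)
        objective[~available] = -np.inf  # assumption: sample without replacement
        best = int(np.argmax(objective))
        chosen.append(best)
        available[best] = False
        selected_sum += phi[best]
    return np.array(chosen)


# Toy sample-set: 800 cells from one population and 200 from another.
toy_rng = np.random.default_rng(0)
toy_cells = np.vstack([
    toy_rng.normal(0.0, 0.5, size=(800, 2)),
    toy_rng.normal(5.0, 0.5, size=(200, 2)),
])
toy_phi = random_fourier_features(toy_cells, num_features=500, gamma=1.0, seed=0)
toy_sketch = kernel_herding(toy_phi, num_subsamples=50)
print("fraction of sketch from the larger population:", np.mean(toy_sketch < 800))
```

In this toy setup the larger population makes up 80% of the cells, so a distribution-aware sketch should draw roughly that share of its cells from it. Larger gamma values correspond to a narrower kernel, which makes the selection more sensitive to fine-grained structure.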

Download files

Source Distribution

sketchKH-0.1.2.tar.gz (5.5 kB)

Uploaded Source

Built Distribution

sketchKH-0.1.2-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file sketchKH-0.1.2.tar.gz.

File metadata

  • Download URL: sketchKH-0.1.2.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.8.20

File hashes

Hashes for sketchKH-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8af5fbec374931a919f52943920de742187914babf8bd371d88d12f08ead2ab3
MD5 0cccd6a0f9299c2e600375d2fa5e6233
BLAKE2b-256 9a7d70ef7ee53d6e81d019c62e46ea29df261a17fb2f2a78b8ba8fb1baa886be

File details

Details for the file sketchKH-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: sketchKH-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.8.20

File hashes

Hashes for sketchKH-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bb6a345cd5779ab2acfcb15031b8507d00360de57ffbf65228054f84403118ba
MD5 1d6faf65c14ee967b42cee60c6417cd4
BLAKE2b-256 b59963a1a235069ebaa484e88e0f72430f9d527f8d076b13274299b55f1ec4e7
