Distribution-based sketching of single-cell samples

SketchKH

Distribution-Informed Sketching with Kernel Herding

Overview

We provide a set of functions for distribution-aware sketching of multiple profiled single-cell samples via Kernel Herding. Our sketches select a small, representative set of cells from each profiled sample so that all major immune cell-types and their relative frequencies are well-represented.
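To make the idea concrete, here is a minimal sketch of kernel herding on random Fourier features, written with NumPy only. It is an illustration under stated assumptions, not the package's implementation: the function names `random_fourier_features` and `kernel_herding`, the `num_features` parameter, and the choice to select without replacement are all hypothetical. Frequencies are drawn from a normal distribution whose standard deviation is `gamma`, mirroring the description of the `gamma` input in the usage example.

```python
import numpy as np

def random_fourier_features(X, num_features=256, gamma=1.0, seed=0):
    """Map cells (rows of X) to random Fourier features.

    Assumption: frequencies ~ Normal(0, gamma^2), phases ~ Uniform(0, 2*pi).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=gamma, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def kernel_herding(X, num_subsamples, num_features=256, gamma=1.0, seed=0):
    """Greedily pick rows of X whose feature mean tracks the full-sample mean."""
    phi = random_fourier_features(X, num_features, gamma, seed)
    mean_embedding = phi.mean(axis=0)      # embedding of the whole sample-set
    weights = mean_embedding.copy()        # herding weight vector
    available = np.ones(len(X), dtype=bool)
    selected = []
    for _ in range(min(num_subsamples, len(X))):
        scores = phi @ weights
        scores[~available] = -np.inf       # select without replacement (assumption)
        idx = int(np.argmax(scores))
        selected.append(idx)
        available[idx] = False
        # Move toward the target mean and away from what was just picked.
        weights += mean_embedding - phi[idx]
    return np.array(selected)
```

Each step picks the cell that best aligns with the part of the distribution not yet covered by the sketch, which is why abundant and rare populations both tend to appear in the selected subset.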

[Figure: overview of sketching via Kernel Herding]

Installation

Dependencies

  • Python >= 3.6, anndata >= 0.7.6, numpy >= 1.22.4, scipy >= 1.7.1, numba, joblib, tqdm_joblib

Install the package with pip:

pip install sketchKH

Alternatively, clone the Git repository:

git clone https://github.com/CompCy-lab/SketchKH.git

Example usage

To perform sketching, first read in a preprocessed AnnData object stored as an .h5ad file. This dataset contains multiple profiled single-cell samples, so sketching selects a limited set of cells from each one. We refer to each profiled sample as a sample-set.

import anndata
import os
adata = anndata.read_h5ad(os.path.join('data', 'nk_cell_preprocessed.h5ad'))
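Since every cell belongs to exactly one sample-set, it can help to see how per-cell sample labels translate into groups of cell indices. The following is a small NumPy sketch; the helper name `group_indices_by_sample` and the toy labels are hypothetical, and the labels stand in for a per-cell column of `adata.obs` (the usage example in this README uses the key 'FCS_File').

```python
import numpy as np

def group_indices_by_sample(sample_labels):
    """Return the unique sample names and one array of cell indices per sample-set."""
    sample_labels = np.asarray(sample_labels)
    names, inverse = np.unique(sample_labels, return_inverse=True)
    index_arrays = [np.flatnonzero(inverse == k) for k in range(len(names))]
    return names, index_arrays

# Hypothetical labels; with real data this could be adata.obs['FCS_File'].to_numpy()
labels = np.array(['s1', 's2', 's1', 's3', 's2', 's1'])
names, sample_set_inds = group_indices_by_sample(labels)

for name, inds in zip(names, sample_set_inds):
    print(name, len(inds), inds)
```

A list of index arrays like this matches the description of the `sample_set_inds` input, although whether `sketch` expects exactly this layout is an assumption worth checking against the package source.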

Then sketch your data, selecting 500 cells per sample-set:

# Inputs
# adata: annotated data object (dimensions = cells x features)
# sample_set_key: key in adata.obs that identifies the sample-sets to subsample
# sample_set_inds: (alternative to sample_set_key) list of arrays, each containing the cell indices of one sample-set
# gamma: scale parameter for the standard deviation of the normal distribution used to compute random Fourier frequency features
# frequency_seed: random state
# num_subsamples: number of cells to subsample per sample-set
# n_jobs: number of parallel jobs
# ----------------------------

# Returns
# kh_indices: list of indices referencing the subsampled cells per sample-set
# adata_subsample: downsampled annotated data object (dimensions = num_subsamples * number of sample-sets x features)
# ----------------------------
from sketchKH import sketch
kh_indices, adata_subsample = sketch(adata, sample_set_key='FCS_File', gamma=1, num_subsamples=500, frequency_seed=0, n_jobs=-1)
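One way to check that a sketch preserves relative cell-type frequencies is to compare label proportions before and after subsampling. The snippet below is a minimal NumPy sketch of such a check; the helper names are hypothetical, and it assumes cell-type annotations are available as per-cell labels (for example under a hypothetical key such as `adata.obs['cell_type']`), which the original example does not state.

```python
import numpy as np

def relative_frequencies(labels, categories):
    """Fraction of cells carrying each label in `categories`."""
    labels = np.asarray(labels)
    return np.array([np.mean(labels == c) for c in categories])

def max_frequency_shift(full_labels, sketch_labels):
    """Largest absolute change in any label's relative frequency."""
    categories = np.unique(np.asarray(full_labels))
    full = relative_frequencies(full_labels, categories)
    sketch = relative_frequencies(sketch_labels, categories)
    return float(np.max(np.abs(full - sketch)))

# Toy labels standing in for one sample-set before and after sketching
full_labels = ['NK'] * 80 + ['T'] * 20
sketch_labels = ['NK'] * 7 + ['T'] * 3
shift = max_frequency_shift(full_labels, sketch_labels)
print(f"largest frequency shift: {shift:.2f}")
```

With real data, the same function could be applied per sample-set to the labels of the full and subsampled objects; a small shift suggests the sketch kept the population proportions close to the original.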
