A package that efficiently computes p-values for a given set of genes based on input matrices representing cell coordinates and gene expression data

scBSP - A Fast Tool for Single-Cell Spatially Variable Genes Identifications on Large-Scale Spatially Resolved Transcriptomics Data

This package implements single-cell big-small patch (scBSP), a granularity-based, dimension-agnostic tool for identifying spatially variable genes in large-scale data. It relies on sparse matrix operations and on KD-tree/ball tree methods for distance calculation.

Installation

Dependencies

scBSP requires the following dependencies:

  • Python (>= 3.8)
  • NumPy (>= 1.24.4)
  • Pandas (>= 1.3.5)
  • SciPy (>= 1.10.1)
  • scikit-learn (>= 1.3.2)
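
As a quick check before installing, the following sketch reports which of the listed dependencies are already present in the current environment. It uses only the standard-library importlib.metadata module; the helper name installed_versions is illustrative and is not part of scBSP.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
DEPENDENCIES = ["numpy", "pandas", "scipy", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(DEPENDENCIES).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

The sketch only reports versions; comparing them against the minimum versions above is left to the reader or to pip.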

Installation Commands

For Standard Installation (Using Ball Tree):

pip install "scbsp"

For Installation with GPU:

pip install "scbsp[gpu]"

Usage

Basic Usage

To use scBSP, you need to provide two primary inputs:

  1. Cell Coordinates Matrix (input_sp_mat):

    • Format: NumPy array.
    • Dimensions: N x D, where N is the number of cells and D is the dimension of coordinates.
  2. Gene Expression Matrix (input_exp_mat_raw):

    • Format: NumPy array, Pandas DataFrame, or SciPy CSR matrix.
    • Dimensions: N x P, where N is the number of cells and P is the number of genes.
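
To make the expected shapes concrete, here is a minimal sketch that builds synthetic inputs in the two layouts described above. The data are random placeholders, and check_shapes is a hypothetical helper written for this illustration, not an scBSP function.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

rng = np.random.default_rng(0)
n_cells, n_dims, n_genes = 200, 2, 50  # N, D, P (arbitrary illustrative sizes)

# Cell coordinates: N x D NumPy array.
input_sp_mat = rng.uniform(0.0, 100.0, size=(n_cells, n_dims))

# Gene expression: N x P counts, stored here as a CSR matrix.
dense_counts = rng.poisson(0.3, size=(n_cells, n_genes))
input_exp_mat_raw = csr_matrix(dense_counts)


def check_shapes(sp_mat, exp_mat):
    """Hypothetical helper: confirm both inputs are 2-D and share N rows."""
    if len(sp_mat.shape) != 2 or len(exp_mat.shape) != 2:
        raise ValueError("both inputs must be two-dimensional")
    if sp_mat.shape[0] != exp_mat.shape[0]:
        raise ValueError("coordinates and expression must have the same number of cells")
    return sp_mat.shape[0], sp_mat.shape[1], exp_mat.shape[1]


print(check_shapes(input_sp_mat, input_exp_mat_raw))
```

Rows of both matrices must refer to the same cells in the same order, which is why the check compares the first dimension.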

Additional parameters to specify include:

  • d1: A floating-point number. Default value is 1.0.
  • d2: A floating-point number. Default value is 3.0.
  • leaf_size: Optional integer defining the maximum point threshold for the Ball Tree algorithm to revert to brute-force search (default = 80).
  • use_gpu: Optional boolean defining whether to use the GPU (default = False).
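
To illustrate what leaf_size controls, the sketch below runs a radius query with scikit-learn's BallTree. This assumes the ball tree in question behaves like sklearn.neighbors.BallTree (scikit-learn is a listed dependency, but the exact class scBSP uses internally is not stated here), and the radius value is chosen only for the example.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Nine cells on a 3 x 3 unit grid (illustrative coordinates).
coords = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)

# leaf_size is the number of points at which the tree switches to brute force.
tree = BallTree(coords, leaf_size=80)

# Count the cells within the given radius of each cell (each cell counts itself).
radius = 1.5  # hypothetical value for this example only
neighbor_counts = tree.query_radius(coords, r=radius, count_only=True)

print(neighbor_counts)
```

leaf_size does not change query results; it only trades tree-construction cost and memory against query speed.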

Example

Below is a straightforward example showcasing how to compute p-values with scBSP:

import scbsp

# Load your data into these variables
input_sp_mat = ...  # Cell Coordinates Matrix
input_exp_mat_raw = ...  # Gene Expression Matrix

# Set the optional parameters
d1 = 1.0
d2 = 3.0

# Compute p-values
p_values = scbsp.granp(input_sp_mat, input_exp_mat_raw, d1, d2)

Combining P-values Across Multiple Samples

When you have multiple samples or datasets and want to combine their p-values to identify consistently significant genes, you can use the combine_p_values function:

import scbsp
import pandas as pd

# Assume you have p-values from three different samples
sample1_pvalues = scbsp.granp(sp_mat1, exp_mat1)
sample2_pvalues = scbsp.granp(sp_mat2, exp_mat2)
sample3_pvalues = scbsp.granp(sp_mat3, exp_mat3)

# Combine p-values using Fisher's method (default)
combined_results = scbsp.combine_p_values(
    [sample1_pvalues, sample2_pvalues, sample3_pvalues],
    method="fisher"
)

# Or use Stouffer's method
combined_results_stouffer = scbsp.combine_p_values(
    [sample1_pvalues, sample2_pvalues, sample3_pvalues],
    method="stouffer"
)

The combine_p_values function supports two methods:

  • Fisher's method: Combines p-values using Fisher's combined probability test (default)
  • Stouffer's method: Combines p-values using Stouffer's Z-score method
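
For intuition, the two methods can be sketched for a single gene using only the Python standard library. This is a textbook formulation with equal weights, not scBSP's implementation; the function names are illustrative, and every input p-value is assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combine(p_values):
    """Stouffer's method (unweighted): Z = sum(z_i) / sqrt(k)."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / math.sqrt(len(p_values))
    return 1.0 - std_normal.cdf(combined_z)


# One gene observed in three samples (made-up p-values).
gene_p_values = [0.04, 0.10, 0.02]
print(fisher_combine(gene_p_values), stouffer_combine(gene_p_values))
```

Fisher's method is more sensitive to a single very small p-value, whereas Stouffer's method responds to consistent moderate evidence across samples.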

Output

granp Function Output

The granp function returns a Pandas DataFrame with two columns:

  • gene_names: The identifier for each gene
  • p_values: The p-value quantifying the statistical significance of spatial variability for each gene

combine_p_values Function Output

The combine_p_values function returns a Pandas DataFrame with three columns:

  • gene_names: The identifier for each gene
  • number_samples: The number of samples/datasets where each gene was present
  • calibrated_p_values: The combined p-value across samples using the specified method

Each row in these DataFrames represents a unique gene from the input gene expression matrix, so the results can be sorted or filtered by p-value to identify and investigate spatially variable genes.
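
As a small illustration of working with this output, the sketch below filters a DataFrame in the documented granp layout. The gene names and p-values are made up, and the 0.05 threshold is an assumption for the example, not a recommendation from scBSP.

```python
import pandas as pd

# Made-up values arranged in the documented two-column layout.
results = pd.DataFrame(
    {
        "gene_names": ["GeneA", "GeneB", "GeneC", "GeneD"],
        "p_values": [0.20, 0.001, 0.04, 0.75],
    }
)

alpha = 0.05  # assumed significance threshold

# Keep genes below the threshold, most significant first.
significant = (
    results[results["p_values"] < alpha]
    .sort_values("p_values")
    .reset_index(drop=True)
)

print(significant)
```

When testing many genes, a multiple-testing correction is usually applied before thresholding; that step is omitted here.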

Reference

  • Li, Jinpu, Yiqing Wang, Mauminah Azam Raina, Chunhui Xu, Li Su, Qi Guo, Qin Ma, Juexin Wang, and Dong Xu. "scBSP: A fast and accurate tool for identifying spatially variable genes from spatial transcriptomic data." bioRxiv (2024).

  • Wang, Juexin, Jinpu Li, Skyler T. Kramer, Li Su, Yuzhou Chang, Chunhui Xu, Michael T. Eadon, Krzysztof Kiryluk, Qin Ma, and Dong Xu. "Dimension-agnostic and granularity-based spatially variable gene identification using BSP." Nature Communications 14, no. 1 (2023): 7367.
