SecActPy

Secreted Protein Activity Inference using Ridge Regression

Python 3.9+ License: MIT

SecActPy is a Python package for inferring secreted protein (e.g. cytokine/chemokine) activity from gene expression data using ridge regression with permutation-based significance testing.
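
As a rough illustration of what "ridge regression with permutation-based significance testing" means, here is a minimal NumPy sketch. It is not SecActPy's implementation: the function name `ridge_permutation_sketch`, the choice to shuffle the gene order of Y, and the `(count + 1) / (n_rand + 1)` p-value are assumptions made for the example, and the toy data uses a small `lambda_` instead of the package default.

```python
import numpy as np

def ridge_permutation_sketch(X, Y, lambda_=5e5, n_rand=1000, seed=0):
    """Illustrative ridge regression with a permutation null (hypothetical helper).

    X: (genes x proteins) signature matrix; Y: (genes x samples) expression.
    Returns beta, zscore, pvalue, each of shape (proteins x samples).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_prot = X.shape
    # Projection matrix T = (X^T X + lambda I)^-1 X^T, computed once.
    T = np.linalg.solve(X.T @ X + lambda_ * np.eye(n_prot), X.T)
    beta = T @ Y

    null_sum = np.zeros_like(beta)
    null_sq = np.zeros_like(beta)
    n_extreme = np.zeros_like(beta)
    for _ in range(n_rand):
        perm = rng.permutation(n_genes)
        beta_rand = T @ Y[perm]  # break the gene-to-signature pairing
        null_sum += beta_rand
        null_sq += beta_rand ** 2
        n_extreme += np.abs(beta_rand) >= np.abs(beta)

    null_mean = null_sum / n_rand
    null_std = np.sqrt(np.maximum(null_sq / n_rand - null_mean ** 2, 0.0))
    zscore = (beta - null_mean) / null_std
    pvalue = (n_extreme + 1) / (n_rand + 1)
    return beta, zscore, pvalue

# Toy data: one sample driven by the first of three signatures.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Y = 2.0 * X[:, [0]] + 0.1 * rng.standard_normal((200, 1))
beta, zscore, pvalue = ridge_permutation_sketch(X, Y, lambda_=1.0, n_rand=200)
print(zscore.ravel(), pvalue.ravel())
```

The z-score compares each observed coefficient with the mean and spread of the coefficients obtained from shuffled data, so a signature that genuinely explains the sample stands out from the null.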

Key Features:

  • 🎯 SecAct Compatible: Produces identical results to the R SecAct/RidgeR package
  • 🚀 GPU Acceleration: Optional CuPy backend for large-scale analysis
  • 📊 Million-Sample Scale: Batch processing with streaming output for massive datasets
  • 🔬 Built-in Signatures: Includes SecAct and CytoSig signature matrices
  • 🧬 Multi-Platform Support: Bulk RNA-seq, scRNA-seq, and Spatial Transcriptomics (Visium, CosMx)
  • 💾 Smart Caching: Optional permutation table caching for faster repeated analyses
  • 🧮 Sparse-Aware: Automatic memory-efficient processing for sparse single-cell data

Installation

CPU Only

pip install git+https://github.com/psychemistz/SecActPy.git

With GPU Support (CUDA 11.x)

pip install "secactpy[gpu] @ git+https://github.com/psychemistz/SecActPy.git"

Note: For CUDA 12.x, install CuPy separately: pip install cupy-cuda12x

Development Installation

git clone https://github.com/psychemistz/SecActPy.git
cd SecActPy
pip install -e ".[dev]"

Quick Start

Basic Usage (Bulk RNA-seq)

import pandas as pd
from secactpy import secact_activity_inference

# Load your differential expression data (genes × samples)
diff_expr = pd.read_csv("diff_expression.csv", index_col=0)

# Run inference
result = secact_activity_inference(
    diff_expr,
    is_differential=True,
    sig_matrix="secact",  # or "cytosig"
    verbose=True
)

# Access results
activity = result['zscore']    # Activity z-scores
pvalues = result['pvalue']     # P-values
coefficients = result['beta']  # Regression coefficients
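
A common next step is ranking proteins for one sample. The sketch below assumes that `result['zscore']` and `result['pvalue']` are pandas DataFrames indexed by protein with one column per sample; the helper `top_proteins` and the toy values are invented for illustration and are not part of SecActPy.

```python
import pandas as pd

def top_proteins(result, sample, alpha=0.05, n=10):
    """Rank proteins for one sample by z-score, keeping those with p < alpha."""
    table = pd.DataFrame({
        "zscore": result["zscore"][sample],
        "pvalue": result["pvalue"][sample],
    })
    significant = table[table["pvalue"] < alpha]
    return significant.sort_values("zscore", ascending=False).head(n)

# Toy stand-in for the dictionary returned by secact_activity_inference()
proteins = ["IL6", "TGFB1", "IFNG"]
toy_result = {
    "zscore": pd.DataFrame({"sample1": [4.2, -3.1, 0.3]}, index=proteins),
    "pvalue": pd.DataFrame({"sample1": [0.001, 0.004, 0.70]}, index=proteins),
}
ranked = top_proteins(toy_result, "sample1")
print(ranked)
```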

Spatial Transcriptomics (10X Visium)

from secactpy import secact_activity_inference_st

# Spot-level analysis
result = secact_activity_inference_st(
    "path/to/visium_folder/",
    min_genes=1000,
    verbose=True
)

activity = result['zscore']  # (proteins × spots)

Spatial Transcriptomics with Cell Type Resolution

import anndata as ad
from secactpy import secact_activity_inference_st

# Load annotated spatial data
adata = ad.read_h5ad("spatial_annotated.h5ad")

# Cell-type resolution (pseudo-bulk by cell type)
result = secact_activity_inference_st(
    adata,
    cell_type_col="cell_type",  # Column in adata.obs
    is_spot_level=False,        # Aggregate by cell type
    verbose=True
)

activity = result['zscore']  # (proteins × cell_types)

scRNA-seq Analysis

import anndata as ad
from secactpy import secact_activity_inference_scrnaseq

adata = ad.read_h5ad("scrnaseq_data.h5ad")

# Pseudo-bulk by cell type
result = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=False,
    verbose=True
)

# Single-cell level
result_sc = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=True,
    verbose=True
)
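
To make "pseudo-bulk by cell type" concrete, here is a small pandas sketch. It assumes pseudo-bulk means summing raw counts over all cells that share a label; whether SecActPy sums or averages internally is not stated here, and `pseudo_bulk` is a hypothetical helper.

```python
import pandas as pd

def pseudo_bulk(counts, cell_types):
    """Sum raw counts (genes x cells) over the cells of each type.

    Returns a genes x cell_types DataFrame.
    """
    labels = pd.Series(list(cell_types), index=counts.columns)
    return counts.T.groupby(labels).sum().T

counts = pd.DataFrame(
    [[1, 0, 3, 2],
     [0, 5, 1, 1]],
    index=["GENE_A", "GENE_B"],
    columns=["cell1", "cell2", "cell3", "cell4"],
)
bulk = pseudo_bulk(counts, ["T cell", "B cell", "T cell", "B cell"])
print(bulk)
```

With `is_single_cell_level=True` no such aggregation would take place, and each cell keeps its own column.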

Large-Scale Batch Processing

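from secactpy import ridge_batch

# X and Y are assumed to be loaded already: X is the signature matrix
# (genes × proteins) and Y is the expression matrix (genes × samples).

# Dense data (pre-scaled): z-score each sample (column) across genes
Y_scaled = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)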
result = ridge_batch(
    X, Y_scaled,
    batch_size=5000,
    n_rand=1000,
    backend='cupy',  # Use GPU
    verbose=True
)

# Sparse data (auto-scaled internally)
import scipy.sparse as sp
Y_sparse = sp.csr_matrix(counts)  # Raw counts
result = ridge_batch(
    X, Y_sparse,
    batch_size=10000,
    n_rand=1000,
    backend='auto',
    verbose=True
)

# Stream results to disk for very large datasets
ridge_batch(
    X, Y,
    batch_size=10000,
    output_path="results.h5ad",
    output_compression="gzip",
    verbose=True
)
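
Batching over samples does not change the ridge coefficients, because the projection matrix depends only on the signature matrix. The NumPy sketch below checks this on toy data. It is not the `ridge_batch()` implementation: `ridge_in_batches` is a hypothetical helper, permutation testing is omitted, and a small `lambda_` is used for the toy sizes.

```python
import numpy as np

def ridge_in_batches(X, Y, lambda_=5e5, batch_size=5000):
    """Ridge coefficients computed batch by batch over the columns of Y."""
    n_prot = X.shape[1]
    # The projection matrix depends only on X, so it is built once and reused.
    T = np.linalg.solve(X.T @ X + lambda_ * np.eye(n_prot), X.T)
    pieces = []
    for start in range(0, Y.shape[1], batch_size):
        pieces.append(T @ Y[:, start:start + batch_size])
    return np.hstack(pieces)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))    # genes x proteins (toy sizes)
Y = rng.standard_normal((50, 23))   # genes x samples
Y_scaled = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

full = ridge_in_batches(X, Y_scaled, lambda_=1.0, batch_size=23)
batched = ridge_in_batches(X, Y_scaled, lambda_=1.0, batch_size=5)
print(np.allclose(full, batched))
```

Only one batch of Y has to be in memory at a time, which is what makes streaming results to disk practical for very large datasets.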

API Reference

High-Level Functions

| Function | Description |
|---|---|
| secact_activity_inference() | Bulk RNA-seq inference |
| secact_activity_inference_st() | Spatial transcriptomics inference |
| secact_activity_inference_scrnaseq() | scRNA-seq inference |
| load_signature(name='secact') | Load built-in signature matrix |

Core Functions

| Function | Description |
|---|---|
| ridge() | Single-call ridge regression with permutation testing |
| ridge_batch() | Batch processing for large datasets (dense or sparse) |
| estimate_batch_size() | Estimate optimal batch size for available memory |
| estimate_memory() | Estimate memory requirements |

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| sig_matrix | "secact" | Signature: "secact", "cytosig", or DataFrame |
| lambda_ | 5e5 | Ridge regularization parameter |
| n_rand | 1000 | Number of permutations |
| seed | 0 | Random seed for reproducibility |
| backend | 'auto' | 'auto', 'numpy', or 'cupy' |
| use_cache | False | Cache permutation tables to disk |

ST-Specific Parameters

| Parameter | Default | Description |
|---|---|---|
| cell_type_col | None | Column in AnnData.obs for cell type |
| is_spot_level | True | If False, aggregate by cell type |
| scale_factor | 1e5 | Normalization scale factor |
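
One plausible reading of `scale_factor` is a library-size normalization: each spot's counts are rescaled to sum to `scale_factor` and then log-transformed. The sketch below illustrates that reading only. The `log2(x + 1)` step and the helper name `normalize_counts` are assumptions, not confirmed SecActPy behavior.

```python
import numpy as np

def normalize_counts(counts, scale_factor=1e5):
    """Scale each column (spot or cell) to `scale_factor` total counts, then log2(x + 1)."""
    totals = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / totals * scale_factor + 1.0)

counts = np.array([[10.0, 0.0],
                   [30.0, 5.0],
                   [60.0, 15.0]])   # genes x spots
normalized = normalize_counts(counts)
print(normalized)
```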

Batch Processing Parameters

| Parameter | Default | Description |
|---|---|---|
| batch_size | 5000 | Samples per batch |
| output_path | None | Stream results to H5AD file |
| output_compression | "gzip" | Compression: "gzip", "lzf", or None |

GPU Acceleration

from secactpy import secact_activity_inference, CUPY_AVAILABLE

print(f"GPU available: {CUPY_AVAILABLE}")

# Auto-detect GPU
result = secact_activity_inference(expression, backend='auto')

# Force GPU
result = secact_activity_inference(expression, backend='cupy')
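
For intuition, an `'auto'` backend can be resolved by checking whether CuPy is importable. The helper below is a hypothetical sketch using only the standard library; it is not how SecActPy sets `CUPY_AVAILABLE`, and it checks only that the package is installed, not that a usable CUDA device is present.

```python
import importlib.util

def resolve_backend(requested="auto"):
    """Map 'auto' | 'numpy' | 'cupy' to a concrete backend name (hypothetical helper)."""
    has_cupy = importlib.util.find_spec("cupy") is not None
    if requested == "auto":
        return "cupy" if has_cupy else "numpy"
    if requested == "cupy" and not has_cupy:
        raise RuntimeError("backend='cupy' requested but CuPy is not installed")
    if requested not in ("numpy", "cupy"):
        raise ValueError(f"unknown backend: {requested!r}")
    return requested

print(resolve_backend("auto"))
```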

Performance

| Dataset | R (Mac M1) | R (Linux) | Py (CPU) | Py (GPU) | Speedup |
|---|---|---|---|---|---|
| Bulk (1,170 sp × 1,000 samples) | 74.4s | 141.6s | 128.8s | 6.7s | 11–19x |
| scRNA-seq (1,170 sp × 788 cells) | 54.9s | 117.4s | 104.8s | 6.8s | 8–15x |
| Visium (1,170 sp × 3,404 spots) | 141.7s | 379.8s | 381.4s | 11.2s | 13–34x |
| CosMx (151 sp × 443,515 cells) | 936.9s | 976.1s | 1226.7s | 99.9s | 9–12x |

Benchmark Environment

  • Mac CPU: M1 Pro with VECLIB (8 cores)
  • Linux CPU: AMD EPYC 7543P (4 cores)
  • Linux GPU: NVIDIA A100-SXM4-80GB
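
The Speedup column is not defined explicitly. The ranges appear to correspond to the R (Mac M1) and Py (CPU) timings divided by the Py (GPU) timing, which the short calculation below reproduces from the table values. The variable and function names are illustrative.

```python
# Timings in seconds, copied from the Performance table.
timings = {
    "Bulk":      {"R (Mac M1)": 74.4,  "R (Linux)": 141.6, "Py (CPU)": 128.8,  "Py (GPU)": 6.7},
    "scRNA-seq": {"R (Mac M1)": 54.9,  "R (Linux)": 117.4, "Py (CPU)": 104.8,  "Py (GPU)": 6.8},
    "Visium":    {"R (Mac M1)": 141.7, "R (Linux)": 379.8, "Py (CPU)": 381.4,  "Py (GPU)": 11.2},
    "CosMx":     {"R (Mac M1)": 936.9, "R (Linux)": 976.1, "Py (CPU)": 1226.7, "Py (GPU)": 99.9},
}

def gpu_speedups(row):
    """Ratio of each CPU timing to the GPU timing, rounded to one decimal."""
    gpu = row["Py (GPU)"]
    return {name: round(t / gpu, 1) for name, t in row.items() if name != "Py (GPU)"}

speedups = {dataset: gpu_speedups(row) for dataset, row in timings.items()}
for dataset, ratios in speedups.items():
    print(dataset, ratios)
```

For the bulk dataset this gives roughly 11x against R on the Mac and 19x against Python on the Linux CPU, matching the 11–19x entry; the ratio against R on Linux is about 21x.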

Reproducibility

With the settings below, which use the R-compatible GSL random number generator (the default), SecActPy produces results identical to R SecAct/RidgeR:

result = secact_activity_inference(
    expression,
    is_differential=True,
    sig_matrix="secact",
    lambda_=5e5,
    n_rand=1000,
    seed=0,
    use_gsl_rng=True  # Default: R-compatible RNG
)

For faster inference when R compatibility is not needed:

result = secact_activity_inference(
    expression,
    use_gsl_rng=False,  # ~70x faster permutation generation
)
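
The choice of random number generator matters because a seed alone does not determine the permutations: the same seed fed to two different generator algorithms yields different shuffles, and therefore different null distributions. The sketch below shows this with two NumPy generators as stand-ins. GSL is not used here, and `permutation_table` is a hypothetical helper.

```python
import numpy as np

def permutation_table(n_genes, n_rand, rng):
    """Stack n_rand random permutations of range(n_genes) into an (n_rand x n_genes) array."""
    return np.stack([rng.permutation(n_genes) for _ in range(n_rand)])

a = permutation_table(100, 5, np.random.default_rng(0))   # PCG64, seed 0
b = permutation_table(100, 5, np.random.default_rng(0))   # same algorithm, same seed
c = permutation_table(100, 5, np.random.RandomState(0))   # MT19937, same seed

print(np.array_equal(a, b))  # same algorithm and seed: identical tables
print(np.array_equal(a, c))  # different algorithm: tables differ
```

Matching another implementation's results therefore requires reproducing its generator, not just its seed, which is the trade-off behind `use_gsl_rng`.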

Requirements

  • Python ≥ 3.9
  • NumPy ≥ 1.20
  • Pandas ≥ 1.3
  • SciPy ≥ 1.7
  • h5py ≥ 3.0
  • anndata ≥ 0.8
  • scanpy ≥ 1.9

Optional: CuPy ≥ 10.0 (GPU acceleration)

Citation

If you use SecActPy in your research, please cite:

Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. [Link]

License

MIT License - see LICENSE for details.

Changelog

v0.1.2 (Initial Release)

  • Ridge regression with permutation-based significance testing
  • GPU acceleration via CuPy backend (8–34x speedup)
  • Batch processing with streaming H5AD output for million-sample datasets
  • Automatic sparse matrix handling in ridge_batch()
  • Built-in SecAct and CytoSig signature matrices
  • GSL-compatible RNG for R/RidgeR reproducibility
  • Support for Bulk RNA-seq, scRNA-seq, and Spatial Transcriptomics
  • Cell type resolution for ST data (cell_type_col, is_spot_level)
  • Optional permutation table caching (use_cache)
