Secreted Protein Activity Inference using Ridge Regression

SecActPy

Python 3.9+ · License: MIT

SecActPy is a Python package for inferring secreted protein (e.g. cytokine/chemokine) activity from gene expression data using ridge regression with permutation-based significance testing.

Key Features:

  • 🎯 SecAct Compatible: Produces identical results to the R SecAct/RidgeR package
  • 🚀 GPU Acceleration: Optional CuPy backend for large-scale analysis
  • 📊 Million-Sample Scale: Batch processing with streaming output for massive datasets
  • 🔬 Built-in Signatures: Includes SecAct and CytoSig signature matrices
  • 🧬 Multi-Platform Support: Bulk RNA-seq, scRNA-seq, and Spatial Transcriptomics (Visium, CosMx)
  • 💾 Smart Caching: Optional permutation table caching for faster repeated analyses
  • 🧮 Sparse-Preserving: Memory-efficient processing for sparse single-cell data
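
As a rough illustration of what "ridge regression with permutation-based significance testing" means, the sketch below fits ridge coefficients and builds a null distribution by shuffling the gene rows of the expression matrix. It is a minimal NumPy sketch of the general technique, not SecActPy's implementation: the function name ridge_permutation_test, the exact null construction, and the use of NumPy's default generator (rather than the GSL-compatible RNG) are assumptions made for illustration.

```python
import numpy as np

def ridge_permutation_test(X, Y, lambda_=5e5, n_rand=1000, seed=0):
    """Ridge regression with a row-permutation null (illustrative sketch).

    X: (genes, proteins) signature matrix; Y: (genes, samples) expression.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_features = X.shape

    # Projection T = (X'X + lambda*I)^-1 X', computed once and reused.
    T = np.linalg.solve(X.T @ X + lambda_ * np.eye(n_features), X.T)
    beta = T @ Y

    # Accumulate null statistics without storing every permutation.
    null_sum = np.zeros_like(beta)
    null_sq_sum = np.zeros_like(beta)
    n_extreme = np.zeros_like(beta)
    for _ in range(n_rand):
        beta_rand = T @ Y[rng.permutation(n_genes)]  # shuffle gene labels
        null_sum += beta_rand
        null_sq_sum += beta_rand ** 2
        n_extreme += np.abs(beta_rand) >= np.abs(beta)

    null_mean = null_sum / n_rand
    null_var = np.maximum(null_sq_sum / n_rand - null_mean ** 2, 0.0)
    return {
        "beta": beta,
        "zscore": (beta - null_mean) / np.sqrt(null_var),
        "pvalue": (n_extreme + 1) / (n_rand + 1),
    }

# Toy data: protein 0 strongly drives the single sample, protein 1 does not.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Y = X @ np.array([[2.0], [0.0], [-1.0]]) + 0.1 * rng.normal(size=(200, 1))
toy = ridge_permutation_test(X, Y, lambda_=1.0, n_rand=200)
```

The toy call uses a small lambda_ because the synthetic matrix is tiny; the package default of 5e5 is tuned for real signature matrices.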

Installation

CPU Only

pip install git+https://github.com/psychemistz/SecActPy.git

With GPU Support (CUDA 11.x)

pip install "secactpy[gpu] @ git+https://github.com/psychemistz/SecActPy.git"

Note: For CUDA 12.x, install CuPy separately: pip install cupy-cuda12x

Development Installation

git clone https://github.com/psychemistz/SecActPy.git
cd SecActPy
pip install -e ".[dev]"

Quick Start

Basic Usage (Bulk RNA-seq)

import pandas as pd
from secactpy import secact_activity_inference

# Load your differential expression data (genes × samples)
diff_expr = pd.read_csv("diff_expression.csv", index_col=0)

# Run inference
result = secact_activity_inference(
    diff_expr,
    is_differential=True,
    sig_matrix="secact",  # or "cytosig"
    verbose=True
)

# Access results
activity = result['zscore']    # Activity z-scores
pvalues = result['pvalue']     # P-values
coefficients = result['beta']  # Regression coefficients
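
The example above expects a differential expression matrix but does not say how to build one. One common approach, sketched here under assumptions, is to log-transform the expression values and subtract a per-gene baseline (the control mean, or the mean across all samples when no controls exist). The helper to_differential, the gene names, and the sample names are hypothetical; check the SecActPy documentation for the preprocessing it actually expects.

```python
import numpy as np
import pandas as pd

def to_differential(expr, control_cols=None):
    """Turn a genes x samples expression table into log2 differences."""
    log_expr = np.log2(expr + 1.0)
    if control_cols is None:
        # No controls: center each gene on its mean across all samples.
        return log_expr.sub(log_expr.mean(axis=1), axis=0)
    # With controls: subtract the per-gene control mean from the other samples.
    baseline = log_expr[control_cols].mean(axis=1)
    return log_expr.drop(columns=control_cols).sub(baseline, axis=0)

# Toy table with two control samples and one treated sample.
expr = pd.DataFrame(
    {"ctrl_1": [3.0, 15.0], "ctrl_2": [3.0, 15.0], "treated": [7.0, 15.0]},
    index=["GENE_A", "GENE_B"],
)
diff_expr = to_differential(expr, control_cols=["ctrl_1", "ctrl_2"])
```

The resulting diff_expr keeps genes as rows and has one column per non-control sample, which matches the genes × samples layout used in the example above.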

Spatial Transcriptomics (10X Visium)

from secactpy import secact_activity_inference_st

# Spot-level analysis
result = secact_activity_inference_st(
    "path/to/visium_folder/",
    min_genes=1000,
    verbose=True
)

activity = result['zscore']  # (proteins × spots)
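
The min_genes argument is not explained in this README. Assuming it plays the usual quality-control role (dropping spots in which fewer than min_genes genes are detected, as in scanpy's filter_cells), the filter could look like the sketch below. The helper filter_spots and the toy counts are hypothetical.

```python
import numpy as np

def filter_spots(counts, min_genes):
    """Keep spots (columns) with at least min_genes genes detected.

    counts: genes x spots array of raw counts.
    """
    genes_detected = (counts > 0).sum(axis=0)  # detected genes per spot
    keep = genes_detected >= min_genes
    return counts[:, keep], keep

# Toy matrix: 3 genes x 3 spots; only the first spot has 2 or more genes.
counts = np.array([[5, 0, 1],
                   [2, 0, 0],
                   [1, 3, 0]])
filtered, keep = filter_spots(counts, min_genes=2)
```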

Spatial Transcriptomics with Cell Type Resolution

import anndata as ad
from secactpy import secact_activity_inference_st

# Load annotated spatial data
adata = ad.read_h5ad("spatial_annotated.h5ad")

# Cell-type resolution (pseudo-bulk by cell type)
result = secact_activity_inference_st(
    adata,
    cell_type_col="cell_type",  # Column in adata.obs
    is_spot_level=False,        # Aggregate by cell type
    verbose=True
)

activity = result['zscore']  # (proteins × cell_types)

scRNA-seq Analysis

import anndata as ad
from secactpy import secact_activity_inference_scrnaseq

adata = ad.read_h5ad("scrnaseq_data.h5ad")

# Pseudo-bulk by cell type
result = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=False,
    verbose=True
)

# Single-cell level
result_sc = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=True,
    verbose=True
)
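
To make "pseudo-bulk by cell type" concrete, the sketch below sums raw counts over the cells of each type, rescales each profile to a fixed total, and log-transforms. This is an assumption about the general idea, not a description of SecActPy's internals: whether the package sums or averages, and whether it uses the 1e5 scale factor here, is not stated in this README. The helper pseudo_bulk and the toy labels are hypothetical.

```python
import numpy as np
import pandas as pd

def pseudo_bulk(counts, cell_types, scale_factor=1e5):
    """Aggregate a genes x cells count table into genes x cell types."""
    labels = pd.Series(list(cell_types), index=counts.columns)
    summed = counts.T.groupby(labels).sum().T          # genes x cell types
    normalized = summed / summed.sum(axis=0) * scale_factor
    return np.log2(normalized + 1.0)

# Toy data: two T cells and one B cell.
counts = pd.DataFrame(
    [[1, 3, 2], [1, 3, 6]],
    index=["G1", "G2"],
    columns=["cell_1", "cell_2", "cell_3"],
)
pb = pseudo_bulk(counts, ["T", "T", "B"])
```

Each column of pb is one cell type, so the matrix can be treated like a small bulk dataset.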

Large-Scale Batch Processing

from secactpy import (
    ridge_batch,
    precompute_population_stats,
    precompute_projection_components,
    ridge_batch_sparse_preserving
)

# Standard batch processing
# (assumed here: X is the signature matrix and Y the expression matrix,
#  both with genes as rows)
result = ridge_batch(
    X, Y,
    batch_size=5000,
    n_rand=1000,
    backend='cupy',  # Use GPU
    verbose=True
)

# Sparse-preserving for million-cell datasets
# (Y_sparse: the expression matrix kept in SciPy sparse format)
stats = precompute_population_stats(Y_sparse)
proj = precompute_projection_components(X, lambda_=5e5)

result = ridge_batch_sparse_preserving(
    proj, Y_sparse, stats,
    n_rand=1000,
    use_cache=True,
    verbose=True
)
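
The reason batching works is that the ridge projection depends only on the signature matrix, so it can be computed once and applied to any block of samples independently. The sketch below shows this with plain NumPy; ridge_in_batches is a hypothetical helper, not the ridge_batch function above, and it omits the permutation step.

```python
import numpy as np

def ridge_in_batches(X, Y, lambda_=5e5, batch_size=5000):
    """Yield (start_column, beta_block) for consecutive column blocks of Y."""
    n_features = X.shape[1]
    # The projection is shared by every batch, so compute it once.
    T = np.linalg.solve(X.T @ X + lambda_ * np.eye(n_features), X.T)
    for start in range(0, Y.shape[1], batch_size):
        yield start, T @ Y[:, start:start + batch_size]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))    # genes x proteins
Y = rng.normal(size=(50, 23))   # genes x samples

# Batched coefficients, stitched back together.
batched = np.hstack(
    [block for _, block in ridge_in_batches(X, Y, lambda_=10.0, batch_size=5)]
)
# Same computation in a single pass, for comparison.
full = np.linalg.solve(X.T @ X + 10.0 * np.eye(4), X.T @ Y)
```

Because the function is a generator, each block can be written to disk as it is produced instead of being held in memory, which is the idea behind streaming output.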

API Reference

High-Level Functions

Function                              Description
secact_activity_inference()           Bulk RNA-seq inference
secact_activity_inference_st()        Spatial transcriptomics inference
secact_activity_inference_scrnaseq()  scRNA-seq inference
load_signature(name='secact')         Load built-in signature matrix

Key Parameters

Parameter   Default   Description
sig_matrix  "secact"  Signature: "secact", "cytosig", or DataFrame
lambda_     5e5       Ridge regularization parameter
n_rand      1000      Number of permutations
seed        0         Random seed for reproducibility
backend     'auto'    'auto', 'numpy', or 'cupy'
use_cache   False     Cache permutation tables to disk
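
The seed and use_cache parameters interact: a permutation table is fully determined by the number of genes, n_rand, and seed, so it can be generated once and reloaded later. The sketch below illustrates that idea only. The helper get_permutation_table and the cache file naming are hypothetical, and it uses NumPy's generator rather than the package's GSL-compatible RNG, so its tables will not match SecActPy's.

```python
import os
import tempfile
import numpy as np

def get_permutation_table(n_genes, n_rand=1000, seed=0, cache_dir=None):
    """Return an (n_rand, n_genes) table of permutations, cached if asked."""
    path = None
    if cache_dir is not None:
        # Hypothetical naming scheme: one file per (n_genes, n_rand, seed).
        path = os.path.join(cache_dir, f"perm_{n_genes}_{n_rand}_{seed}.npy")
        if os.path.exists(path):
            return np.load(path)
    rng = np.random.default_rng(seed)
    table = np.stack([rng.permutation(n_genes) for _ in range(n_rand)])
    if path is not None:
        np.save(path, table)
    return table

with tempfile.TemporaryDirectory() as cache_dir:
    first = get_permutation_table(100, n_rand=20, seed=0, cache_dir=cache_dir)
    cached = get_permutation_table(100, n_rand=20, seed=0, cache_dir=cache_dir)
    other_seed = get_permutation_table(100, n_rand=20, seed=1, cache_dir=cache_dir)
```

The second call reads the table from disk instead of regenerating it, while a different seed produces (and caches) a different table.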

ST-Specific Parameters

Parameter      Default  Description
cell_type_col  None     Column in AnnData.obs for cell type
is_spot_level  True     If False, aggregate by cell type
scale_factor   1e5      Normalization scale factor

GPU Acceleration

from secactpy import secact_activity_inference, CUPY_AVAILABLE

print(f"GPU available: {CUPY_AVAILABLE}")

# `expression` is a genes × samples DataFrame, loaded as in Quick Start

# Auto-detect GPU
result = secact_activity_inference(expression, backend='auto')

# Force GPU
result = secact_activity_inference(expression, backend='cupy')
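
For readers wondering how an 'auto' backend choice can work, the sketch below resolves a backend name by checking whether CuPy is importable. It is an assumption about the general pattern, not SecActPy's actual logic (which may also check that a CUDA device is usable); resolve_backend is a hypothetical helper.

```python
import importlib.util

def resolve_backend(backend="auto"):
    """Map 'auto', 'numpy', or 'cupy' to a concrete backend name."""
    if backend not in ("auto", "numpy", "cupy"):
        raise ValueError(f"unknown backend: {backend!r}")
    # find_spec returns None when the package is not installed.
    has_cupy = importlib.util.find_spec("cupy") is not None
    if backend == "auto":
        return "cupy" if has_cupy else "numpy"
    if backend == "cupy" and not has_cupy:
        raise RuntimeError("backend='cupy' requested but CuPy is not installed")
    return backend

chosen = resolve_backend("auto")
```

Note that an installed CuPy does not guarantee a working GPU, so a production check would also need to confirm that a device is available.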

Performance

Dataset               CPU    GPU   Speedup
Bulk (1k samples)     1.5s   0.3s  5x
scRNA-seq (5k cells)  6.4s   1.2s  5.3x
ST (10k spots)        13.9s  2.5s  5.6x
CosMx (100k cells)    120s   18s   6.7x

Reproducibility

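With the default GSL-compatible RNG (use_gsl_rng=True) and matching lambda_, n_rand, and seed values, SecActPy produces results identical to R SecAct/RidgeR: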

result = secact_activity_inference(
    expression,
    is_differential=True,
    sig_matrix="secact",
    lambda_=5e5,
    n_rand=1000,
    seed=0,
    use_gsl_rng=True  # Default: R-compatible RNG
)

For faster inference when R compatibility is not needed:

result = secact_activity_inference(
    expression,
    use_gsl_rng=False,  # ~70x faster permutation generation
)

Requirements

  • Python ≥ 3.9
  • NumPy ≥ 1.20
  • Pandas ≥ 1.3
  • SciPy ≥ 1.7
  • h5py ≥ 3.0
  • anndata ≥ 0.8
  • scanpy ≥ 1.9

Optional: CuPy ≥ 10.0 (GPU acceleration)

Citation

If you use SecActPy in your research, please cite:

Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. [Link]

License

MIT License - see LICENSE for details.

Changelog

v0.1.1

  • Added use_cache parameter to all inference functions (default: False)
  • Added cell type resolution for spatial transcriptomics (cell_type_col, is_spot_level)
  • Simplified installation (base includes all common dependencies)

v0.1.0

  • Initial release with bulk, scRNA-seq, and ST support
  • GPU acceleration, batch processing, sparse-preserving mode
  • GSL-compatible RNG for R/RidgeR reproducibility
