Secreted Protein Activity Inference using Ridge Regression
SecActPy
SecActPy is a Python package for inferring the activity of secreted proteins (e.g., cytokines and chemokines) from gene expression data, using ridge regression with permutation-based significance testing.
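The core idea can be sketched in a few lines of NumPy. The ridge projection T = (XᵀX + λI)⁻¹Xᵀ is computed once. Each sample is regressed on the signature matrix, and each coefficient is compared against coefficients obtained after randomly permuting the gene labels of the expression matrix.

This is a minimal sketch under stated assumptions, not SecActPy's implementation:
- The function name `ridge_permutation_test` is hypothetical.
- NumPy's default generator is used in place of the GSL-compatible RNG.
- The p-value formula `(count + 1) / (n_rand + 1)` is assumed.
- Unlike the real package, this sketch does not scale the inputs.

```python
import numpy as np

def ridge_permutation_test(X, Y, lam=5e5, n_rand=1000, seed=0):
    """Ridge regression of Y on X with a gene-permutation null (sketch).

    X: (n_genes, n_proteins) signature matrix
    Y: (n_genes, n_samples) expression matrix, rows aligned with X
    Returns beta, z-scores and two-sided empirical p-values,
    each of shape (n_proteins, n_samples).
    """
    n_genes, n_prot = X.shape
    # Projection T = (X^T X + lam I)^-1 X^T, computed once and reused
    T = np.linalg.solve(X.T @ X + lam * np.eye(n_prot), X.T)
    beta = T @ Y

    rng = np.random.default_rng(seed)  # assumption: NumPy RNG, not GSL
    s1 = np.zeros_like(beta)       # running sum of permuted betas
    s2 = np.zeros_like(beta)       # running sum of squared permuted betas
    exceed = np.zeros_like(beta)   # how often |beta_rand| >= |beta|
    for _ in range(n_rand):
        perm = rng.permutation(n_genes)
        beta_rand = T @ Y[perm, :]  # shuffle gene labels of Y
        s1 += beta_rand
        s2 += beta_rand ** 2
        exceed += np.abs(beta_rand) >= np.abs(beta)

    mean = s1 / n_rand
    sd = np.sqrt(np.maximum(s2 / n_rand - mean ** 2, 0.0))
    zscore = (beta - mean) / sd
    pvalue = (exceed + 1) / (n_rand + 1)  # assumed p-value convention
    return beta, zscore, pvalue

# Synthetic example: protein 0 is active, protein 2 is repressed
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Y = X @ np.array([[2.0], [0.0], [-1.0]]) + rng.normal(scale=0.5, size=(200, 1))
beta, z, p = ridge_permutation_test(X, Y, lam=1.0, n_rand=200, seed=0)
print(np.round(z.ravel(), 1))
```

The permutation null preserves the signature matrix and each sample's expression distribution. It breaks only the pairing between genes, so a large |z| means the sample's expression aligns with that protein's signature more than chance would predict.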
Key Features:
- 🎯 SecAct Compatible: Produces identical results to the R SecAct/RidgeR package
- 🚀 GPU Acceleration: Optional CuPy backend for large-scale analysis
- 📊 Million-Sample Scale: Batch processing with streaming output for massive datasets
- 🔬 Built-in Signatures: Includes SecAct and CytoSig signature matrices
- 🧬 Multi-Platform Support: Bulk RNA-seq, scRNA-seq, and Spatial Transcriptomics (Visium, CosMx)
- 💾 Smart Caching: Optional permutation table caching for faster repeated analyses
- 🧮 Sparse-Preserving: Memory-efficient processing for sparse single-cell data
Installation
CPU Only
```bash
pip install git+https://github.com/psychemistz/SecActPy.git
```
With GPU Support (CUDA 11.x)
```bash
pip install "secactpy[gpu] @ git+https://github.com/psychemistz/SecActPy.git"
```
Note: For CUDA 12.x, install CuPy separately:
```bash
pip install cupy-cuda12x
```
Development Installation
```bash
git clone https://github.com/psychemistz/SecActPy.git
cd SecActPy
pip install -e ".[dev]"
```
Quick Start
Basic Usage (Bulk RNA-seq)
```python
import pandas as pd
from secactpy import secact_activity_inference

# Load your differential expression data (genes × samples)
diff_expr = pd.read_csv("diff_expression.csv", index_col=0)

# Run inference
result = secact_activity_inference(
    diff_expr,
    is_differential=True,   # input is already differential expression
    sig_matrix="secact",    # or "cytosig"
    verbose=True
)

# Access results (proteins × samples)
activity = result['zscore']     # Activity z-scores
pvalues = result['pvalue']      # P-values
coefficients = result['beta']   # Regression coefficients
```
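This sketch shows how to pull out the significant activities for one sample. It assumes the entries in `result` are pandas DataFrames indexed by protein, with one column per sample (proteins × samples). `top_significant` is a hypothetical helper, and the values below are made-up stand-ins for real output.

```python
import pandas as pd

def top_significant(zscore: pd.DataFrame, pvalue: pd.DataFrame,
                    sample: str, alpha: float = 0.05) -> pd.Series:
    """Z-scores of proteins with p < alpha in one sample, highest first."""
    mask = pvalue[sample] < alpha
    return zscore.loc[mask, sample].sort_values(ascending=False)

# Hypothetical stand-ins for result['zscore'] and result['pvalue']
proteins = ["IL6", "TNF", "TGFB1", "IFNG"]
zscore = pd.DataFrame({"s1": [4.1, -0.3, -3.2, 1.0]}, index=proteins)
pvalue = pd.DataFrame({"s1": [0.001, 0.80, 0.004, 0.30]}, index=proteins)

hits = top_significant(zscore, pvalue, "s1")
print(hits)
```

Positive z-scores indicate higher inferred signaling activity and negative ones lower. Sorting keeps both tails visible.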
Spatial Transcriptomics (10X Visium)
```python
from secactpy import secact_activity_inference_st

# Spot-level analysis
result = secact_activity_inference_st(
    "path/to/visium_folder/",
    min_genes=1000,
    verbose=True
)
activity = result['zscore']  # (proteins × spots)
```
Spatial Transcriptomics with Cell Type Resolution
```python
import anndata as ad
from secactpy import secact_activity_inference_st

# Load annotated spatial data
adata = ad.read_h5ad("spatial_annotated.h5ad")

# Cell-type resolution (pseudo-bulk by cell type)
result = secact_activity_inference_st(
    adata,
    cell_type_col="cell_type",  # Column in adata.obs
    is_spot_level=False,        # Aggregate by cell type
    verbose=True
)
activity = result['zscore']  # (proteins × cell_types)
```
scRNA-seq Analysis
```python
import anndata as ad
from secactpy import secact_activity_inference_scrnaseq

adata = ad.read_h5ad("scrnaseq_data.h5ad")

# Pseudo-bulk by cell type
result = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=False,
    verbose=True
)

# Single-cell level
result_sc = secact_activity_inference_scrnaseq(
    adata,
    cell_type_col="cell_type",
    is_single_cell_level=True,
    verbose=True
)
```
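The pseudo-bulk modes (`is_single_cell_level=False` for scRNA-seq, `is_spot_level=False` for spatial data) group cells by their label before inference. Below is a minimal sketch of that grouping, assuming pseudo-bulk means summing raw counts over the cells of each type. `pseudo_bulk` is a hypothetical helper, and SecActPy may normalize differently before or after aggregation.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

def pseudo_bulk(counts, genes, cell_types):
    """Sum counts over the cells of each type (sketch).

    counts: (n_cells, n_genes) matrix, dense or scipy.sparse (AnnData .X layout)
    genes: gene names, length n_genes
    cell_types: per-cell labels, length n_cells
    Returns a genes x cell_types DataFrame.
    """
    labels = pd.Categorical(cell_types)
    n_cells = len(labels)
    # Sparse one-hot indicator (n_types, n_cells): row t marks cells of type t
    indicator = sp.csr_matrix(
        (np.ones(n_cells), (labels.codes, np.arange(n_cells))),
        shape=(len(labels.categories), n_cells),
    )
    summed = indicator @ counts  # (n_types, n_genes); stays sparse if counts is
    if sp.issparse(summed):
        summed = summed.toarray()
    return pd.DataFrame(np.asarray(summed).T, index=genes,
                        columns=labels.categories)

counts = sp.csr_matrix([[1, 0, 2],
                        [3, 1, 0],
                        [0, 4, 1]])
pb = pseudo_bulk(counts, ["g1", "g2", "g3"], ["T", "B", "T"])
print(pb)
```

Using a sparse indicator matrix keeps the aggregation a single sparse product, so the cell × gene matrix is never densified.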
Large-Scale Batch Processing
```python
from secactpy import (
    ridge_batch,
    precompute_population_stats,
    precompute_projection_components,
    ridge_batch_sparse_preserving
)

# X: signature matrix (genes × proteins)
# Y: expression matrix (genes × samples) with the same gene rows as X

# Standard batch processing
result = ridge_batch(
    X, Y,
    batch_size=5000,
    n_rand=1000,
    backend='cupy',  # Use GPU
    verbose=True
)

# Sparse-preserving for million-cell datasets
# (Y_sparse: the expression matrix stored as a scipy.sparse matrix)
stats = precompute_population_stats(Y_sparse)
proj = precompute_projection_components(X, lambda_=5e5)
result = ridge_batch_sparse_preserving(
    proj, Y_sparse, stats,
    n_rand=1000,
    use_cache=True,
    verbose=True
)
```
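Here is one way a sparse-preserving mode can avoid densifying a huge count matrix. If each sample is z-scored across genes before regression, the centering can be moved outside the matrix product: T((Y − 1μᵀ)/σ) = (TY − (T1)μᵀ)/σ. Only the small proteins × batch result is then ever dense.

The sketch assumes per-column (per-sample) standardization with `ddof=1`. `column_stats` and `projected_zscored` are hypothetical names, not SecActPy's `precompute_population_stats` or `ridge_batch_sparse_preserving`, whose internals may differ.

```python
import numpy as np
import scipy.sparse as sp

def column_stats(Y):
    """Per-column mean and sample std (ddof=1) of a sparse genes x cells matrix."""
    n = Y.shape[0]
    mean = np.asarray(Y.mean(axis=0)).ravel()
    sq_mean = np.asarray(Y.multiply(Y).mean(axis=0)).ravel()
    var = (sq_mean - mean ** 2) * n / (n - 1)
    return mean, np.sqrt(var)

def projected_zscored(T, Y, mean, sd, batch_size=1000):
    """Compute T @ ((Y - mean) / sd) column batch by column batch.

    Uses (T @ Y - outer(T @ 1, mean)) / sd, so Y itself stays sparse.
    """
    Y = sp.csc_matrix(Y)               # cheap column slicing
    t_rowsum = T.sum(axis=1)           # T @ 1, shape (n_proteins,)
    out = np.empty((T.shape[0], Y.shape[1]))
    for start in range(0, Y.shape[1], batch_size):
        stop = min(start + batch_size, Y.shape[1])
        # sparse @ dense -> dense (batch, n_proteins); transpose back
        TY = np.asarray(Y[:, start:stop].T @ T.T).T
        out[:, start:stop] = (TY - np.outer(t_rowsum, mean[start:stop])) / sd[start:stop]
    return out

rng = np.random.default_rng(0)
n_genes, n_prot, n_cells = 500, 4, 2500
X = rng.normal(size=(n_genes, n_prot))
T = np.linalg.solve(X.T @ X + 10.0 * np.eye(n_prot), X.T)  # ridge projection
dense = rng.poisson(0.1, size=(n_genes, n_cells)).astype(float)  # ~90% zeros
Y = sp.csc_matrix(dense)

mean, sd = column_stats(Y)
fast = projected_zscored(T, Y, mean, sd, batch_size=1000)

# Reference: densify, z-score each column, then project
reference = T @ ((dense - dense.mean(axis=0)) / dense.std(axis=0, ddof=1))
print(np.allclose(fast, reference))
```

Because T depends only on the signature and λ, it can be computed once and reused for every batch. This is the same separation suggested by `precompute_projection_components` above.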
API Reference
High-Level Functions
| Function | Description |
|---|---|
| `secact_activity_inference()` | Bulk RNA-seq inference |
| `secact_activity_inference_st()` | Spatial transcriptomics inference |
| `secact_activity_inference_scrnaseq()` | scRNA-seq inference |
| `load_signature(name='secact')` | Load a built-in signature matrix |
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| `sig_matrix` | `"secact"` | Signature: `"secact"`, `"cytosig"`, or a DataFrame |
| `lambda_` | `5e5` | Ridge regularization parameter |
| `n_rand` | `1000` | Number of permutations |
| `seed` | `0` | Random seed for reproducibility |
| `backend` | `'auto'` | `'auto'`, `'numpy'`, or `'cupy'` |
| `use_cache` | `False` | Cache permutation tables to disk |
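`lambda_` controls how strongly the regression coefficients are shrunk toward zero. Because the penalty is added to XᵀX, its effect depends on the scale of the signature matrix and the number of genes. The sketch below uses plain closed-form ridge on synthetic data (not SecActPy code) to show the coefficient norm falling as the penalty grows.

```python
import numpy as np

def ridge_beta(X, y, lam):
    """Closed-form ridge coefficients (X^T X + lam I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(size=1000)

# Coefficient norm for no penalty, a moderate penalty, and the default-sized one
norms = {lam: np.linalg.norm(ridge_beta(X, y, lam)) for lam in (0.0, 1e3, 5e5)}
for lam, n in norms.items():
    print(f"lambda={lam:g}: ||beta|| = {n:.4f}")
```

Here XᵀX is roughly 1000·I. A penalty of 5e5 therefore dominates it and shrinks the coefficients heavily, while the relative ordering of the effects is preserved.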
ST-Specific Parameters
| Parameter | Default | Description |
|---|---|---|
| `cell_type_col` | `None` | Column in `AnnData.obs` for cell type |
| `is_spot_level` | `True` | If `False`, aggregate by cell type |
| `scale_factor` | `1e5` | Normalization scale factor |
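A scale factor like this is typically used for library-size normalization. Each spot's counts are divided by its total and multiplied by `scale_factor`, then log-transformed.

This is a sketch under assumptions: the `log2(x + 1)` transform and the helper name `normalize_counts` are not taken from SecActPy, whose exact preprocessing may differ.

```python
import numpy as np

def normalize_counts(counts, scale_factor=1e5):
    """Library-size normalize a (genes x spots) count matrix, then log2(x + 1).

    Assumption: log2(x + 1) after scaling; SecActPy's transform may differ.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty spots at zero instead of dividing by 0
    return np.log2(counts / totals * scale_factor + 1.0)

norm = normalize_counts([[1, 0],
                         [3, 5]])
print(norm)
```

After scaling, every spot sums to `scale_factor` before the log. Differences in sequencing depth between spots therefore do not masquerade as differences in expression.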
GPU Acceleration
```python
from secactpy import secact_activity_inference, CUPY_AVAILABLE

print(f"GPU available: {CUPY_AVAILABLE}")

# expression: genes × samples DataFrame

# Auto-detect GPU
result = secact_activity_inference(expression, backend='auto')

# Force GPU
result = secact_activity_inference(expression, backend='cupy')
```
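The `backend='auto'` behavior can be approximated with the common array-module pattern: use CuPy when it imports and a GPU is visible, otherwise fall back to NumPy. This is a sketch, not SecActPy's code. `get_array_module` and `ridge_projection` are hypothetical names, and only standard NumPy and CuPy calls are used (`cupy.cuda.is_available()`, `ndarray.get()`).

```python
import numpy as np

def get_array_module(backend="auto"):
    """Return (module, name) for backend 'auto', 'numpy', or 'cupy'."""
    if backend not in ("auto", "numpy", "cupy"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "numpy":
        return np, "numpy"
    try:
        import cupy as cp
        # False when CuPy is installed but no CUDA device is usable
        if cp.cuda.is_available():
            return cp, "cupy"
    except ImportError:
        pass
    if backend == "cupy":
        raise RuntimeError("backend='cupy' requested but CuPy/GPU is unavailable")
    return np, "numpy"

def ridge_projection(X, lam, backend="auto"):
    """(X^T X + lam I)^-1 X^T on the chosen backend, returned as a NumPy array."""
    xp, name = get_array_module(backend)
    Xd = xp.asarray(X)
    T = xp.linalg.solve(Xd.T @ Xd + lam * xp.eye(Xd.shape[1]), Xd.T)
    return T.get() if name == "cupy" else T  # copy GPU result back to host

T = ridge_projection(np.eye(3), 1.0, backend="numpy")
print(T)
```

Writing the numerical code once against `xp` keeps the CPU and GPU paths identical. Only the array module and the final host transfer differ.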
Performance
| Dataset | CPU | GPU | Speedup |
|---|---|---|---|
| Bulk (1k samples) | 1.5s | 0.3s | 5x |
| scRNA-seq (5k cells) | 6.4s | 1.2s | 5.3x |
| ST (10k spots) | 13.9s | 2.5s | 5.6x |
| CosMx (100k cells) | 120s | 18s | 6.7x |
Reproducibility
With the same parameters and the default R-compatible (GSL) random number generator, SecActPy produces identical results to R SecAct/RidgeR:
```python
result = secact_activity_inference(
    expression,
    is_differential=True,
    sig_matrix="secact",
    lambda_=5e5,
    n_rand=1000,
    seed=0,
    use_gsl_rng=True  # Default: R-compatible RNG
)
```
For faster inference when R compatibility is not needed:
```python
result = secact_activity_inference(
    expression,
    use_gsl_rng=False,  # ~70x faster permutation generation
)
```
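Reproducibility and `use_cache` both rest on the fact that a permutation table depends only on the number of genes, `n_rand`, and `seed`. It can therefore be regenerated identically or stored once and reused.

This minimal sketch uses NumPy's default generator, which does not match the GSL stream used for R compatibility. The helper name `permutation_table` and the `.npy` cache layout are hypothetical, not SecActPy's cache format.

```python
import os
import tempfile
import numpy as np

def permutation_table(n_genes, n_rand=1000, seed=0, cache_dir=None):
    """Return an (n_rand, n_genes) array of row permutations.

    The table depends only on (n_genes, n_rand, seed), so it is cached
    under a file name built from those values.
    """
    path = None
    if cache_dir is not None:
        path = os.path.join(cache_dir, f"perm_{n_genes}_{n_rand}_{seed}.npy")
        if os.path.exists(path):
            return np.load(path)  # reuse a previously generated table
    rng = np.random.default_rng(seed)  # NumPy PCG64, not GSL
    table = np.stack([rng.permutation(n_genes) for _ in range(n_rand)])
    if path is not None:
        np.save(path, table)
    return table

cache = tempfile.mkdtemp()
first = permutation_table(500, n_rand=100, seed=0, cache_dir=cache)   # generated and saved
second = permutation_table(500, n_rand=100, seed=0, cache_dir=cache)  # loaded from disk
other = permutation_table(500, n_rand=100, seed=1)                    # different seed
print(np.array_equal(first, second), np.array_equal(first, other))
```

Caching pays off when the same gene set is analyzed repeatedly. Generating many permutations can then cost more than loading a stored table.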
Requirements
- Python ≥ 3.9
- NumPy ≥ 1.20
- Pandas ≥ 1.3
- SciPy ≥ 1.7
- h5py ≥ 3.0
- anndata ≥ 0.8
- scanpy ≥ 1.9
Optional: CuPy ≥ 10.0 (GPU acceleration)
Citation
If you use SecActPy in your research, please cite:
Beibei Ru, Lanqi Gong, Emily Yang, Seongyong Park, George Zaki, Kenneth Aldape, Lalage Wakefield, Peng Jiang. Inference of secreted protein activities in intercellular communication. [Link]
License
MIT License - see LICENSE for details.
Changelog
v0.1.1
- Added `use_cache` parameter to all inference functions (default: `False`)
- Added cell type resolution for spatial transcriptomics (`cell_type_col`, `is_spot_level`)
- Simplified installation (base install includes all common dependencies)
v0.1.0
- Initial release with bulk, scRNA-seq, and ST support
- GPU acceleration, batch processing, sparse-preserving mode
- GSL-compatible RNG for R/RidgeR reproducibility