Cell type proportion analysis for single-cell and spatial transcriptomics data
CellProportion
A Python package for comparing cell type proportions between experimental groups in single-cell RNA-seq and spatial transcriptomics data.
🚀 Features
- Flexible Input: Works directly with AnnData objects or Pandas DataFrames
- Multiple Statistical Methods:
  - signed_r2: Signed R² from regression (captures direction + fit strength)
  - mean_diff: Simple difference in mean proportions
  - log2_fc: Log₂ fold-change for multiplicative differences
  - corr: Pearson correlation with group labels
- Spatial Analysis: Compare proportions within tissue regions/spatial domains
- Statistical Testing: Mann-Whitney U tests with significance categorization
- Visualization Ready: Built-in colour schemes and customizable colour mapping
- Robust Error Handling: Comprehensive validation and informative warnings
📦 Installation
pip install cellproportion
🔧 Quick Start
Single-Cell Analysis
import pandas as pd
import scanpy as sc
from cellproportion import cell_type_abundance
# Load your single-cell data
adata = sc.read_h5ad("your_data.h5ad")
# Compare cell type proportions between conditions
results = cell_type_abundance(
    adata,                     # AnnData object
    annotation="cell_type",    # Column with cell type labels
    sample_types="condition",  # Column with experimental conditions
    sample_ID="patient_id",    # Column with sample/patient IDs
    sample_types_1="tumor",    # First group for comparison
    sample_types_2="normal",   # Second group for comparison
    method="signed_r2",        # Statistical method
    signed_r2_cutoff=0.15,     # Optional: cutoff for significance
    explain=True               # Print method explanation
)
print(results.head())
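The comparison is made on per-sample proportions of each cell type. How the package derives them internally is not shown here; as a minimal sketch, assuming one row per cell and the same column names as above (the toy values are made up), the proportions could be computed with `pandas.crosstab`:

```python
import pandas as pd

# Toy per-cell metadata (hypothetical values), one row per cell.
obs = pd.DataFrame({
    "patient_id": ["P1"] * 4 + ["P2"] * 4,
    "condition":  ["tumor"] * 4 + ["normal"] * 4,
    "cell_type":  ["T_cell", "T_cell", "T_cell", "B_cell",
                   "T_cell", "B_cell", "B_cell", "B_cell"],
})

# Rows: samples, columns: cell types, values: fraction of that sample's cells.
proportions = pd.crosstab(obs["patient_id"], obs["cell_type"], normalize="index")

# Attach each sample's group label so the two groups can be compared.
groups = obs.drop_duplicates("patient_id").set_index("patient_id")["condition"]
print(proportions.join(groups))
```

With an AnnData object, the equivalent per-cell table is `adata.obs`.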
Spatial Transcriptomics Analysis
from cellproportion.spatial import spatial_cell_type_abundance
# Analyze proportions within each spatial region
spatial_results = spatial_cell_type_abundance(
    adata,                       # AnnData with spatial information
    region_col="tissue_region",  # Column with spatial region labels
    annotation="cell_type",
    sample_types="condition",
    sample_ID="patient_id",
    sample_types_1="tumor",
    sample_types_2="normal",
    method="signed_r2"
)
print(f"Analyzed {spatial_results['region'].nunique()} spatial regions")
print(spatial_results.head())
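Comparing within regions means proportions are normalized per sample and per region rather than per sample alone. The following is only an illustrative sketch of that idea with pandas, not the package's implementation; the toy table and its values are hypothetical:

```python
import pandas as pd

# Toy per-cell metadata (hypothetical values).
obs = pd.DataFrame({
    "tissue_region": ["core"] * 4 + ["margin"] * 4,
    "patient_id":    ["P1", "P1", "P2", "P2"] * 2,
    "cell_type":     ["T_cell", "B_cell", "T_cell", "T_cell",
                      "B_cell", "B_cell", "T_cell", "B_cell"],
})

# Count cells per (region, sample, cell type); absent combinations become 0.
counts = (
    obs.groupby(["tissue_region", "patient_id", "cell_type"])
       .size()
       .unstack("cell_type", fill_value=0)
)

# Divide by the number of cells each sample contributes to each region.
region_props = counts.div(counts.sum(axis=1), axis=0)
print(region_props)
```

Each row of `region_props` sums to 1, so a cell type's proportion in one region is unaffected by how many cells the same sample has in other regions.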
Using DataFrames
# Works with any DataFrame containing the required columns
metadata_df = pd.DataFrame({
    'cell_type': ['T_cell', 'B_cell', 'Macrophage'] * 100,
    'condition': ['tumor', 'normal'] * 150,
    'patient_id': ['P1', 'P2', 'P3'] * 100,
    # ... other columns
})
results = cell_type_abundance(
    metadata_df,
    annotation="cell_type",
    sample_types="condition",
    sample_ID="patient_id",
    sample_types_1="tumor",
    sample_types_2="normal"
)
📊 Understanding the Results
The output DataFrame contains:
- anno: Cell type annotation
- stat_values: Statistical metric value (depends on method chosen)
- p_values: P-value from Mann-Whitney U test
- sig_p: Significance category (p<0.01, p<0.05, p<0.1, p<0.5, p>0.5)
- colour: Colour code for visualization
# Example output
print(results[['anno', 'stat_values', 'p_values', 'sig_p']].head())
#       anno  stat_values  p_values   sig_p
# 0   T_cell     0.234567    0.0123  p<0.05
# 1   B_cell    -0.123456    0.2341   p<0.5
# 2  NK_cell     0.456789    0.0001  p<0.01
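The sig_p labels bin each p-value into a coarse category. A minimal sketch of such a binning is given below; the function name is hypothetical, and treating each threshold as a strict upper bound is an assumption, since the package's handling of exact boundary values is not documented here.

```python
def significance_category(p: float) -> str:
    """Map a p-value to one of the coarse sig_p labels.

    Assumption: thresholds are strict upper bounds, checked from the
    smallest to the largest.
    """
    for threshold in (0.01, 0.05, 0.1, 0.5):
        if p < threshold:
            return f"p<{threshold}"
    return "p>0.5"


# The p-values from the example output above, plus one non-significant value.
for p in (0.0001, 0.0123, 0.2341, 0.8):
    print(p, significance_category(p))
```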
🎨 Custom Colour Mapping
Create a tab-separated (TSV) file with an annotation column and a colour column:
annotation colour
T_cell #E41A1C
B_cell #377EB8
NK_cell #4DAF4A
Macrophage #984EA3
results = cell_type_abundance(
    adata,
    # ... other parameters
    colours_file="my_colours.tsv"
)
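To check a colours file before passing it in, it can be read into a plain dictionary. This sketch uses only the standard library and assumes the tab-separated layout shown above; the `colour_for` helper and its grey fallback are hypothetical and not part of the package.

```python
import csv
import io

# Stand-in for open("my_colours.tsv"); columns are separated by a tab.
tsv_text = "annotation\tcolour\nT_cell\t#E41A1C\nB_cell\t#377EB8\n"

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
colour_map = {row["annotation"]: row["colour"] for row in reader}


def colour_for(cell_type, default="#999999"):
    """Return the mapped colour, or a neutral fallback (hypothetical choice)."""
    return colour_map.get(cell_type, default)


print(colour_for("T_cell"), colour_for("NK_cell"))
```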
📈 Statistical Methods Explained
from cellproportion.methods import explain_stat_values
# Get detailed explanations of all methods
explanations = explain_stat_values()
for method, info in explanations.items():
    print(f"\n{method.upper()}:")
    for key, value in info.items():
        print(f"  {key}: {value}")
Method Comparison
| Method | Best For | Pros | Cons |
|---|---|---|---|
| signed_r2 | Linear relationships | Direction + fit strength | Assumes linearity |
| mean_diff | Simple comparisons | Easy interpretation | Ignores variance |
| log2_fc | Multiplicative changes | Ratio-based | Sensitive to low values |
| corr | Association strength | Scale-invariant | Only linear association |
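To make the table concrete, the sketch below computes all four statistics for one cell type from per-sample proportions with NumPy. It is not the package's code. It assumes the regression is of proportion on a binary group label (in which case R² equals the squared Pearson correlation), and the pseudocount in the fold-change is a hypothetical guard against division by zero.

```python
import numpy as np


def compare_proportions(group1, group2, pseudocount=1e-6):
    """Illustrative versions of the four statistics for one cell type."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    values = np.concatenate([g1, g2])
    # Binary group labels: 1 for the first group, 0 for the second.
    labels = np.concatenate([np.ones(len(g1)), np.zeros(len(g2))])

    # Pearson correlation between group label and proportion.
    r = np.corrcoef(labels, values)[0, 1]
    return {
        "mean_diff": g1.mean() - g2.mean(),
        "log2_fc": np.log2((g1.mean() + pseudocount) / (g2.mean() + pseudocount)),
        "corr": r,
        # For a single binary predictor, R² = r², signed by the slope direction.
        "signed_r2": np.sign(r) * r ** 2,
    }


# Hypothetical per-sample proportions of one cell type in each group.
stats = compare_proportions([0.30, 0.35, 0.40], [0.10, 0.15, 0.20])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Swapping the two groups flips the sign of every statistic, which is what lets the results be read as enrichment in one group versus the other.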
🔬 Advanced Usage
Batch Processing Multiple Datasets
datasets = ["dataset1.h5ad", "dataset2.h5ad", "dataset3.h5ad"]
all_results = []
for dataset_path in datasets:
    adata = sc.read_h5ad(dataset_path)
    results = cell_type_abundance(adata, method="signed_r2")
    results['dataset'] = dataset_path
    all_results.append(results)
combined_results = pd.concat(all_results, ignore_index=True)
Method Comparison
methods = ["signed_r2", "mean_diff", "log2_fc", "corr"]
method_comparison = {}
for method in methods:
    results = cell_type_abundance(adata, method=method)
    method_comparison[method] = results
# Compare results across methods
comparison_df = pd.DataFrame({
    method: method_comparison[method].set_index('anno')['stat_values']
    for method in methods
})
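Because the four statistics are on different scales, one way to compare them is by how they rank the cell types and whether they agree on the direction of change. The sketch below uses a small made-up table in place of the real `comparison_df`; the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for comparison_df (values are made up).
comparison_df = pd.DataFrame(
    {
        "signed_r2": [0.40, -0.10, 0.05],
        "mean_diff": [0.20, -0.05, 0.01],
        "log2_fc":   [1.20, -0.60, 0.10],
        "corr":      [0.63, -0.32, 0.22],
    },
    index=["T_cell", "B_cell", "NK_cell"],
)

# Pearson correlation of the ranks, i.e. Spearman rank agreement per method pair.
rank_agreement = comparison_df.rank().corr()

# Cell types whose direction of change differs between methods.
signs = comparison_df > 0
inconsistent = signs.index[signs.nunique(axis=1) > 1].tolist()

print(rank_agreement)
print("Inconsistent direction:", inconsistent)
```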
📝 Citation
If you use CellProportion in your research, please cite:
@software{cellproportion2024,
  author = {Patel, Ankit},
  title = {CellProportion: Cell type proportion analysis for single-cell and spatial transcriptomics},
  url = {https://github.com/avpatel18/cellproportion},
  version = {1.0.0},
  year = {2025}
}
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🐛 Bug Reports
If you encounter any bugs or have feature requests, please file an issue on GitHub Issues.
📧 Contact
- Author: Ankit Patel
- Email: ankit.patel@qmul.ac.uk
- GitHub: @avpatel18