TargetSage: computational pipeline for CRISPR/RNAi screening analysis
TargetSage
Computational pipeline for CRISPR/RNAi screening analysis — arrayed and pooled.
TargetSage is a Python library and command-line toolkit for analyzing high-throughput CRISPR and RNAi screening data. It supports both:
- Arrayed screens — well-level readouts from plate readers (384/96-well format)
- Pooled screens — guide-level readouts from sequencing (dropout/enrichment)
The pipeline covers the full analysis workflow: data validation → normalization → QC → hit calling → gene aggregation → pathway enrichment → visualization.
📦 Installation
pip install targetsage
Requirements: Python ≥3.11
Optional dependencies for full functionality:
pip install "targetsage[notebooks]"  # Jupyter support
pip install "targetsage[plots]"      # Enhanced plotting backends
🚀 Quick Start
Arrayed Screen — One Function
import pandas as pd
from targetsage.stats.batch_correction import run_arrayed_analysis

df = pd.read_csv("my_arrayed_screen.csv")
results = run_arrayed_analysis(
    df,
    rep1_col="Raw_rep1",
    rep2_col="Raw_rep2",
    plate_col="Plate",
    control_type_col="SpotType",
    norm_method="genes and all Non-targeting",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# results is a DataFrame with well-level hits, fold-changes, SSMD, p-values
hits = results[results["is_hit"] == True]
print(f"Found {len(hits)} hits")
Arrayed Screen — Step-by-Step Workflow
from targetsage.pipeline.arrayed_workflow import ArrayedScreenWorkflow

# df is the arrayed screen DataFrame loaded in the one-function example
wf = ArrayedScreenWorkflow(
    df,
    screen_name="my_screen",
    rep_cols=["Raw_rep1", "Raw_rep2"],
    plate_col="Plate",
    well_col="Well",
    gene_col="gene_symbol",
    norm_method="genes and all Non-targeting",
    gene_hit_method="moderated_t",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# Run the full pipeline
wf.run_all()

# Access results
print(wf.results_summary())

# Gene-level hits
gene_hits = wf.get_step("hit_calling")
Pooled Screen
import pandas as pd
from targetsage.pipeline.pooled_intensity_workflow import PooledIntensityWorkflow

# Assumed input path, mirroring the CLI example; substitute your own file
df = pd.read_csv("data/my_pooled_screen.csv")

wf = PooledIntensityWorkflow(
    df,
    screen_name="my_pooled_screen",
    reference_cols=["Baseline_R1", "Baseline_R2", "Baseline_R3"],
    treatment_cols=["Treatment_R1", "Treatment_R2", "Treatment_R3"],
    gene_col="gene_symbol",
    p_cutoff=0.05,
    log2fc_cutoff=0.3,
)
wf.run_all()
🖥️ Command Line Interface
Three entry points are installed:
# Main dispatcher
targetsage array <command> [options]
targetsage pool <command> [options]
# Direct aliases
targetsage-array <command> [options]
targetsage-pool <command> [options]
Arrayed CLI Example
targetsage array run-all \
    -i data/my_screen.csv \
    -o results/arrayed_out \
    --rep1-col Raw_rep1 \
    --rep2-col Raw_rep2 \
    --plate-col Plate \
    --gene-col gene_symbol \
    --norm-method "genes and all Non-targeting" \
    --gene-hit-method moderated_t
Pooled CLI Example
targetsage pool run-all \
    -i data/my_pooled_screen.csv \
    -o results/pooled_out \
    --reference-cols Baseline_R1 Baseline_R2 Baseline_R3 \
    --treatment-cols Treatment_R1 Treatment_R2 Treatment_R3 \
    --gene-col gene_symbol \
    --pvalue-method welch
Get help for any command:
targetsage array --help
targetsage array run-all --help
📊 Analysis Methods
Normalization
| Method | Description | Best For |
|---|---|---|
| genes and all Non-targeting | Gene wells + all NTC wells as baseline | Standard screens |
| genes and own negative controls | Per-gene own NTC scaling | Screens with matched controls |
| all negative controls | NTC-only baseline | Small screens |
| B-score | Median polish (row/column correction) | Low hit-rate plates (<20%) |
| LOESS | Local regression spatial correction | High hit-rate plates |
| Z-score plate | Per-plate Z-score | Quick normalization |
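To make the B-score row more concrete, here is a minimal pure-Python sketch of the idea: Tukey's median polish removes row and column effects from one plate, and the residuals are then scaled by their median absolute deviation (MAD). This is an illustration only, not TargetSage's implementation; the function names `median_polish` and `b_score`, the fixed iteration count, and the 1.4826 consistency factor are assumptions made for the example, and the plate values are synthetic.

```python
import math
from statistics import median


def median_polish(plate, n_iter=10):
    """Return residuals after repeatedly subtracting row and column medians."""
    resid = [row[:] for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        # Row sweep: remove each row's median
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
        # Column sweep: remove each column's median
        for j in range(n_cols):
            m = median(resid[r][j] for r in range(n_rows))
            for r in range(n_rows):
                resid[r][j] -= m
    return resid


def b_score(plate, n_iter=10):
    """Scale median-polish residuals by the plate MAD (hypothetical helper)."""
    resid = median_polish(plate, n_iter)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = 1.4826 * median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD is zero; residuals carry no spread to scale by")
    return [[v / mad for v in row] for row in resid]


# Synthetic 4x6 plate: additive row/column gradients plus small noise
plate = [
    [100 + 10 * i + 5 * j + math.sin(1.7 * i + 2.3 * j) for j in range(6)]
    for i in range(4)
]
plate[1][2] -= 50  # one strong "hit" well

scores = b_score(plate)
print(f"B-score of the spiked well: {scores[1][2]:.1f}")
```

Because medians resist outliers, the spiked well barely shifts the fitted row and column effects, which is why this approach is usually recommended only when most wells on a plate are inactive.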
Hit Calling (Arrayed)
| Method | Description |
|---|---|
| moderated_t | Moderated t-test with shrinkage (limma-style) |
| t_test | Standard Welch's t-test vs NTC |
| mann_whitney | Non-parametric Mann-Whitney U |
| rank_product | Rank product for replicate concordance |
| rsa | Redundant siRNA Analysis (König-style) |
| lme | Linear Mixed Effects: random plate intercepts for multi-plate designs |
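As an illustration of the rank product row, the sketch below ranks genes within each replicate and takes the geometric mean of each gene's ranks, so genes that rank consistently near the top across replicates get the smallest values. It is a simplified standalone example, not the TargetSage code path: the function name `rank_product`, the input layout (one dict of scores per replicate), the gene labels, and the "lower score is stronger" convention are all assumptions, ties are broken by input order, and the permutation step normally used to obtain p-values is omitted.

```python
from math import prod


def rank_product(scores_by_replicate):
    """Geometric mean of per-replicate ranks (rank 1 = lowest score)."""
    genes = list(scores_by_replicate[0])
    k = len(scores_by_replicate)
    ranks = {g: [] for g in genes}
    for rep in scores_by_replicate:
        ordered = sorted(genes, key=lambda g: rep[g])
        for rank, gene in enumerate(ordered, start=1):
            ranks[gene].append(rank)
    return {g: prod(r) ** (1 / k) for g, r in ranks.items()}


# Made-up normalized scores for three hypothetical genes in two replicates
rep1 = {"GENE_A": -3.0, "GENE_B": -0.1, "GENE_C": 0.5}
rep2 = {"GENE_A": -2.5, "GENE_B": 0.2, "GENE_C": -0.3}

for gene, rp in sorted(rank_product([rep1, rep2]).items(), key=lambda kv: kv[1]):
    print(f"{gene}: rank product = {rp:.3f}")
```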
Hit Calling (Pooled)
| Method | Description |
|---|---|
| RRA | Robust Rank Aggregation (MAGeCK-style) |
| NB GLM | Negative Binomial generalized linear model |
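For readers unfamiliar with Robust Rank Aggregation, the following sketch computes the basic rho score described by Kolde et al. for a single gene. Each guide's rank is normalized to (0, 1]; under the null the normalized ranks are uniform, so the k-th smallest of n follows a Beta(k, n - k + 1) distribution, and rho is the smallest of those tail probabilities. This is a bare-bones illustration under those assumptions, not TargetSage's or MAGeCK's implementation (MAGeCK applies further modifications that are not shown); `beta_order_cdf` and `rra_rho` are names invented for the example.

```python
from math import comb


def beta_order_cdf(x, k, n):
    """P(k-th smallest of n Uniform(0,1) draws <= x), via the binomial tail."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks):
    """Minimum order-statistic probability over a gene's guide ranks."""
    ordered = sorted(normalized_ranks)
    n = len(ordered)
    return min(beta_order_cdf(x, k, n) for k, x in enumerate(ordered, start=1))


# Made-up normalized ranks (rank / total guides) for two hypothetical genes
consistent_gene = [0.01, 0.02, 0.9]   # two guides near the top of the list
scattered_gene = [0.4, 0.5, 0.6]      # guides spread through the middle

print(f"rho (consistent) = {rra_rho(consistent_gene):.6f}")
print(f"rho (scattered)  = {rra_rho(scattered_gene):.6f}")
```

A small rho means more guides sit near the top of the ranking than chance would predict; in practice rho is converted to a p-value by a multiplicity correction or permutation.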
QC Metrics
- SSMD — Strictly Standardized Mean Difference (well-level effect size)
- Z′-factor — Plate quality score
- Replicate correlation — Pearson r between replicates (configurable threshold)
- Hit-rate estimation — Per-plate hit-rate for B-score guardrails
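The first three metrics above have short closed forms, sketched here with the standard library. These are the textbook definitions (Z′ = 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|, and SSMD for two independent groups as the mean difference over the standard deviation of that difference), written as a standalone illustration; TargetSage's own estimators, for example a paired or robust SSMD, may differ. The helper names and the control values are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z'-factor from positive- and negative-control readouts."""
    return 1 - 3 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))


def ssmd(sample, reference):
    """SSMD for two independent groups."""
    return (mean(sample) - mean(reference)) / sqrt(
        stdev(sample) ** 2 + stdev(reference) ** 2
    )


def pearson_r(x, y):
    """Pearson correlation between two replicate vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up control readouts, for illustration only
positive_controls = [10.0, 12.0, 14.0]
negative_controls = [98.0, 100.0, 102.0]

print(f"Z' = {z_prime(positive_controls, negative_controls):.3f}")
print(f"SSMD = {ssmd(positive_controls, negative_controls):.2f}")
print(f"r = {pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]):.3f}")
```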
Pathway Enrichment
- Over-Representation Analysis (ORA) via gseapy/Enrichr
- Custom GMT gene-set support
- Cached offline mode for reproducibility
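The statistic behind over-representation analysis can be sketched without any external service. The example below parses gene sets from GMT-formatted lines (set name, description, then gene symbols, tab-separated) and computes a one-sided hypergeometric p-value for the overlap between a hit list and each set. It is a standalone illustration, not the gseapy/Enrichr call that TargetSage wraps; `parse_gmt`, `ora_pvalue`, the gene symbols, and the set names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def parse_gmt(lines):
    """Map gene-set name to a set of genes from GMT-formatted lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def ora_pvalue(hits, gene_set, background):
    """P(overlap >= observed) when drawing len(hits) genes from the background."""
    background = set(background)
    hits = set(hits) & background
    gene_set = set(gene_set) & background
    n_bg, n_set, n_hits = len(background), len(gene_set), len(hits)
    overlap = len(hits & gene_set)
    tail = sum(
        comb(n_set, i) * comb(n_bg - n_set, n_hits - i)
        for i in range(overlap, min(n_set, n_hits) + 1)
    )
    return tail / comb(n_bg, n_hits)


# Toy inputs: ten background genes, two gene sets, three hits
gmt_lines = ["SET_A\tna\tG0\tG1\tG2\n", "SET_B\tna\tG7\tG8\n"]
background = [f"G{i}" for i in range(10)]
hit_genes = ["G0", "G1", "G5"]

for name, members in parse_gmt(gmt_lines).items():
    print(f"{name}: p = {ora_pvalue(hit_genes, members, background):.4f}")
```

The choice of background matters: restricting it to the genes actually screened, rather than the whole genome, avoids inflating significance for focused libraries.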
📓 Notebooks & Examples
Example notebooks are available in the notebooks/ directory:
| Notebook | Description |
|---|---|
| targetsage_package_example.ipynb | Quick-start package API walkthrough |
| arrayed_screen_walkthrough.ipynb | Step-by-step arrayed screen analysis |
| arrayed_screen_walkthrough_executed.ipynb | Same with executed outputs |
| tss_crispri_384_pipeline.ipynb | Real 384-well CRISPRi example |
| rnaither_drosophila_walkthrough.ipynb | RNAi screen example (Drosophila) |
| targetsage_crispra_array_step_by_step.ipynb | CRISPRa activation screen |
| arrayed_screen_data_simulator.ipynb | Generate synthetic test data |
Run locally:
git clone https://github.com/your-org/targetsage.git
cd targetsage
pip install -e ".[notebooks]"
jupyter notebook notebooks/
🧪 Testing
# Clone the repository
git clone https://github.com/your-org/targetsage.git
cd targetsage
# Install in development mode
pip install -e ".[dev]"
# Run the test suite
pytest
🏗️ Architecture
targetsage/
├── data/ # Data loaders, schema, validation
├── normalization/ # Normalization methods (B-score, LOESS, Z-score, etc.)
├── hits/ # Hit scoring (well-level + gene-level aggregation)
├── stats/ # Statistical tests, QC, batch correction
├── pipeline/ # Workflow runners (ArrayedScreenWorkflow, PooledIntensityWorkflow)
│ ├── steps/ # Individual step implementations
│ ├── config.py # Configuration dataclasses
│ ├── arrayed_workflow.py
│ └── pooled_intensity_workflow.py
├── qc/ # QC engine and report generation
├── utils/ # DataFrame helpers, well coordinates, etc.
├── visualization/ # Plotting utilities
├── network/ # Network analysis and visualization
└── cli.py # Command-line entry points
🔬 Citation
If you use TargetSage in your research, please cite:
TargetSage: A computational pipeline for CRISPR/RNAi screening analysis. Package version X.Y.Z, https://pypi.org/project/targetsage/
Key methods implemented in TargetSage are based on established literature:
- B-score normalization: Brideau et al., J Biomol Screen 2003
- LOESS normalization: Cleveland, J Am Stat Assoc 1979
- LME for arrayed CRISPR: PLOS ONE 2024 (simulation-guided method selection)
- SSMD: Zhang et al., J Biomol Screen 2007
- RRA: Kolde et al., Bioinformatics 2012
📄 License
MIT License — see LICENSE file.
🤝 Contributing
Contributions are welcome! Please open an issue or pull request on GitHub.
For the full-stack web application (TargetSage Cloud), see the separate documentation.