TargetSage: computational pipeline for CRISPR/RNAi screening analysis

Computational pipeline for CRISPR/RNAi screening analysis — arrayed and pooled.

TargetSage is a Python library and command-line toolkit for analyzing high-throughput CRISPR and RNAi screening data. It supports both:

  • Arrayed screens — well-level readouts from plate readers (384/96-well format)
  • Pooled screens — guide-level readouts from sequencing (dropout/enrichment)

The pipeline covers the full analysis workflow: data validation → normalization → QC → hit calling → gene aggregation → pathway enrichment → visualization.


📦 Installation

pip install targetsage

Requirements: Python ≥3.11

Optional dependencies for full functionality:

pip install "targetsage[notebooks]"   # Jupyter support
pip install "targetsage[plots]"       # Enhanced plotting backends

🚀 Quick Start

Arrayed Screen — One Function

import pandas as pd
from targetsage.stats.batch_correction import run_arrayed_analysis

df = pd.read_csv("my_arrayed_screen.csv")

results = run_arrayed_analysis(
    df,
    rep1_col="Raw_rep1",
    rep2_col="Raw_rep2",
    plate_col="Plate",
    control_type_col="SpotType",
    norm_method="genes and all Non-targeting",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# results is a DataFrame with well-level hits, fold-changes, SSMD, p-values
hits = results[results["is_hit"]]  # assumes is_hit is a boolean column
print(f"Found {len(hits)} hits")

Arrayed Screen — Step-by-Step Workflow

from targetsage.pipeline.arrayed_workflow import ArrayedScreenWorkflow

# df is the well-level DataFrame loaded with pd.read_csv in the previous example
wf = ArrayedScreenWorkflow(
    df,
    screen_name="my_screen",
    rep_cols=["Raw_rep1", "Raw_rep2"],
    plate_col="Plate",
    well_col="Well",
    gene_col="gene_symbol",
    norm_method="genes and all Non-targeting",
    gene_hit_method="moderated_t",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# Run the full pipeline
wf.run_all()

# Access results
print(wf.results_summary())

# Gene-level hits
gene_hits = wf.get_step("hit_calling")

Pooled Screen

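import pandas as pd
from targetsage.pipeline.pooled_intensity_workflow import PooledIntensityWorkflow

# Guide-level table containing the reference and treatment columns named below
df = pd.read_csv("my_pooled_screen.csv")
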
wf = PooledIntensityWorkflow(
    df,
    screen_name="my_pooled_screen",
    reference_cols=["Baseline_R1", "Baseline_R2", "Baseline_R3"],
    treatment_cols=["Treatment_R1", "Treatment_R2", "Treatment_R3"],
    gene_col="gene_symbol",
    p_cutoff=0.05,
    log2fc_cutoff=0.3,
)

wf.run_all()
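
The log2fc_cutoff above is compared against a log2 fold change between treatment and reference replicates. As a rough, library-independent sketch of that quantity: the helper name log2_fold_change, the pseudocount, the use of replicate means, and the example readouts are all assumptions made for illustration, not TargetSage's documented implementation.

```python
import math
from statistics import mean


def log2_fold_change(reference, treatment, pseudocount=1.0):
    """log2 ratio of mean treatment signal to mean reference signal.

    The pseudocount (an assumption here) keeps the ratio finite when a
    guide drops out completely.
    """
    ref_mean = mean(reference) + pseudocount
    trt_mean = mean(treatment) + pseudocount
    return math.log2(trt_mean / ref_mean)


# Hypothetical readouts for one guide across three replicates per arm.
baseline = [399.0, 399.0, 399.0]
treatment = [99.0, 99.0, 99.0]

lfc = log2_fold_change(baseline, treatment)
passes_cutoff = abs(lfc) >= 0.3  # same magnitude as log2fc_cutoff=0.3 above
print(f"log2FC = {lfc:.2f}, passes cutoff: {passes_cutoff}")
```

A negative value indicates dropout relative to baseline and a positive value indicates enrichment.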

🖥️ Command Line Interface

Three entry points are installed:

# Main dispatcher
targetsage array <command> [options]
targetsage pool  <command> [options]

# Direct aliases
targetsage-array <command> [options]
targetsage-pool  <command> [options]

Arrayed CLI Example

targetsage array run-all \
    -i data/my_screen.csv \
    -o results/arrayed_out \
    --rep1-col Raw_rep1 \
    --rep2-col Raw_rep2 \
    --plate-col Plate \
    --gene-col gene_symbol \
    --norm-method "genes and all Non-targeting" \
    --gene-hit-method moderated_t

Pooled CLI Example

targetsage pool run-all \
    -i data/my_pooled_screen.csv \
    -o results/pooled_out \
    --reference-cols Baseline_R1 Baseline_R2 Baseline_R3 \
    --treatment-cols Treatment_R1 Treatment_R2 Treatment_R3 \
    --gene-col gene_symbol \
    --pvalue-method welch

Get help for any command:

targetsage array --help
targetsage array run-all --help

📊 Analysis Methods

Normalization

Method                          | Description                            | Best For
--------------------------------|----------------------------------------|------------------------------
genes and all Non-targeting     | Gene wells + all NTC wells as baseline | Standard screens
genes and own negative controls | Per-gene own NTC scaling               | Screens with matched controls
all negative controls           | NTC-only baseline                      | Small screens
B-score                         | Median polish (row/column correction)  | Low hit-rate plates (<20%)
LOESS                           | Local regression spatial correction    | High hit-rate plates
Z-score plate                   | Per-plate Z-score                      | Quick normalization
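
To make the B-score row concrete, here is a minimal standalone sketch of median polish followed by MAD scaling. It is not TargetSage's implementation: the function names, the fixed iteration count, the 1.4826 scaling constant, and the synthetic plate are assumptions chosen for illustration.

```python
import random
from statistics import median


def median_polish(plate, n_iter=10):
    """Return residuals after repeatedly removing row and column medians."""
    resid = [list(row) for row in plate]
    n_cols = len(resid[0])
    for _ in range(n_iter):
        for row in resid:
            row_med = median(row)
            for j in range(n_cols):
                row[j] -= row_med
        for j in range(n_cols):
            col_med = median(r[j] for r in resid)
            for r in resid:
                r[j] -= col_med
    return resid


def b_scores(plate):
    """Scale median-polish residuals by 1.4826 * MAD of the residuals."""
    resid = median_polish(plate)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD of residuals is zero; B-score is undefined")
    scale = 1.4826 * mad
    return [[v / scale for v in row] for row in resid]


# Synthetic 4x6 plate: row gradient + column gradient + noise + one strong hit.
rng = random.Random(0)
plate = [[100 + 5 * i + 2 * j + rng.gauss(0, 1) for j in range(6)] for i in range(4)]
plate[2][3] -= 30  # simulated hit well

scores = b_scores(plate)
print(f"B-score of simulated hit: {scores[2][3]:.1f}")
```

Because medians are used throughout, the row and column gradients are removed without the single hit well distorting the fit. The same robustness breaks down when many wells are hits, which is consistent with the table recommending B-score only for low hit-rate plates.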

Hit Calling (Arrayed)

Method       | Description
-------------|----------------------------------------------------------------------------
moderated_t  | Moderated t-test with shrinkage (limma-style)
t_test       | Standard Welch's t-test vs NTC
mann_whitney | Non-parametric Mann-Whitney U
rank_product | Rank product for replicate concordance
rsa          | Redundant siRNA Analysis (König-style)
lme          | Linear mixed-effects model with random plate intercepts for multi-plate designs
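
As an illustration of the t_test row, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for one gene's wells against NTC wells. The function name and the example values are hypothetical, and the p-value step is left out because it needs a t-distribution CDF (for example from scipy.stats.ttest_ind with equal_var=False). How TargetSage itself performs the test is not shown here.

```python
import math
from statistics import mean, variance


def welch_t(sample, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample), len(control)
    v1, v2 = variance(sample), variance(control)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2
    t_stat = (mean(sample) - mean(control)) / math.sqrt(se2)
    dof = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t_stat, dof


# Hypothetical normalized readouts: wells for one gene vs non-targeting controls.
gene_wells = [1.0, 2.0, 3.0]
ntc_wells = [4.0, 5.0, 6.0]

t_stat, dof = welch_t(gene_wells, ntc_wells)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Unlike Student's t-test, this form does not assume that the gene wells and the control wells share a variance.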

Hit Calling (Pooled)

Method | Description
-------|--------------------------------------------
RRA    | Robust Rank Aggregation (MAGeCK-style)
NB GLM | Negative Binomial generalized linear model
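
The core of Robust Rank Aggregation can be sketched in a few lines: each gene's guides are given normalized ranks in [0, 1], and the score is the smallest order-statistic p-value under the null hypothesis that the ranks are uniform. The sketch below follows the rho score described by Kolde et al. It omits multiple-testing correction and any MAGeCK-specific modifications, and its function names and example ranks are assumptions, so it should not be read as TargetSage's implementation.

```python
from math import comb


def beta_order_cdf(x, k, n):
    """P(k-th smallest of n uniform(0, 1) draws <= x): the Beta(k, n-k+1) CDF."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks):
    """Minimum order-statistic p-value over the sorted normalized ranks."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    return min(beta_order_cdf(r, k, n) for k, r in enumerate(ranks, start=1))


# Hypothetical normalized ranks (rank / total guides) for the guides of two genes.
consistent_gene = [0.01, 0.02, 0.03, 0.05]  # all guides near the top
scattered_gene = [0.02, 0.40, 0.65, 0.90]   # one strong guide, rest unremarkable

print(rra_rho(consistent_gene), rra_rho(scattered_gene))
```

A gene whose guides all rank near the top gets a much smaller score than a gene with one strong guide, which is the behavior that makes rank aggregation robust to single-guide outliers.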

QC Metrics

  • SSMD — Strictly Standardized Mean Difference (well-level effect size)
  • Z′-factor — Plate quality score
  • Replicate correlation — Pearson r between replicates (configurable threshold)
  • Hit-rate estimation — Per-plate hit-rate for B-score guardrails

Pathway Enrichment

  • Over-Representation Analysis (ORA) via gseapy/Enrichr
  • Custom GMT gene-set support
  • Cached offline mode for reproducibility
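
Over-representation analysis is commonly scored with a one-sided hypergeometric test: given a universe of N genes, a gene set of size K, and n hits, how likely is an overlap of at least k by chance? The standalone sketch below shows that calculation. It does not call gseapy or Enrichr, and the function names and gene identifiers are placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from a universe of N with K set members."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def ora_pvalue(hits, gene_set, universe):
    """One-sided enrichment p-value, restricted to genes in the universe."""
    universe = set(universe)
    hits = set(hits) & universe
    gene_set = set(gene_set) & universe
    overlap = len(hits & gene_set)
    return hypergeom_upper_tail(overlap, len(universe), len(gene_set), len(hits))


# Placeholder identifiers: 20 screened genes, a 5-gene set, 4 hits (3 in the set).
universe = [f"G{i}" for i in range(1, 21)]
gene_set = ["G1", "G2", "G3", "G4", "G5"]
hits = ["G1", "G2", "G3", "G10"]

p = ora_pvalue(hits, gene_set, universe)
print(f"ORA p-value = {p:.4f}")
```

Restricting both lists to the screened universe matters: using a genome-wide background for a focused library would overstate enrichment.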

📓 Notebooks & Examples

Example notebooks are available in the notebooks/ directory:

Notebook                                    | Description
--------------------------------------------|--------------------------------------
targetsage_package_example.ipynb            | Quick-start package API walkthrough
arrayed_screen_walkthrough.ipynb            | Step-by-step arrayed screen analysis
arrayed_screen_walkthrough_executed.ipynb   | Same with executed outputs
tss_crispri_384_pipeline.ipynb              | Real 384-well CRISPRi example
rnaither_drosophila_walkthrough.ipynb       | RNAi screen example (Drosophila)
targetsage_crispra_array_step_by_step.ipynb | CRISPRa activation screen
arrayed_screen_data_simulator.ipynb         | Generate synthetic test data

Run locally:

git clone https://github.com/your-org/targetsage.git
cd targetsage
pip install -e ".[notebooks]"
jupyter notebook notebooks/

🧪 Testing

# Clone the repository
git clone https://github.com/your-org/targetsage.git
cd targetsage

# Install in development mode
pip install -e ".[dev]"

# Run the test suite
pytest

🏗️ Architecture

targetsage/
├── data/              # Data loaders, schema, validation
├── normalization/     # Normalization methods (B-score, LOESS, Z-score, etc.)
├── hits/              # Hit scoring (well-level + gene-level aggregation)
├── stats/             # Statistical tests, QC, batch correction
├── pipeline/          # Workflow runners (ArrayedScreenWorkflow, PooledIntensityWorkflow)
│   ├── steps/         # Individual step implementations
│   ├── config.py      # Configuration dataclasses
│   ├── arrayed_workflow.py
│   └── pooled_intensity_workflow.py
├── qc/                # QC engine and report generation
├── utils/             # DataFrame helpers, well coordinates, etc.
├── visualization/     # Plotting utilities
├── network/           # Network analysis and visualization
└── cli.py             # Command-line entry points

🔬 Citation

If you use TargetSage in your research, please cite:

TargetSage: A computational pipeline for CRISPR/RNAi screening analysis. Package version X.Y.Z, https://pypi.org/project/targetsage/

Key methods implemented in TargetSage are based on established literature:

  • B-score normalization: Brideau et al., J Biomol Screen 2003
  • LOESS normalization: Cleveland, J Am Stat Assoc 1979
  • LME for arrayed CRISPR: PLOS ONE 2024 (simulation-guided method selection)
  • SSMD: Zhang et al., J Biomol Screen 2007
  • RRA: Kolde et al., Bioinformatics 2012

📄 License

MIT License — see LICENSE file.


🤝 Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

For the full-stack web application (TargetSage Cloud), see the separate documentation.
