
CRISPR gRNA design pipeline for SpCas9 (NGG PAM)


CRISPRDesigner2

A CRISPR gRNA design tool for SpCas9 (NGG PAM) targeting hg38 and mm10 genomes. Available as both a Python package for programmatic use and a Snakemake workflow for batch processing.

Overview

The pipeline designs CRISPR guide RNAs (gRNAs) for SpCas9 at NGG PAM sites in five steps:

  1. Extracts candidate gRNAs from input genomic regions
  2. Filters guides based on sequence quality criteria
  3. Scores on-target efficiency using the RS3 model
  4. Scores off-target specificity using GuideScan2
  5. Outputs scored guides in standard formats
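
Step 1 can be illustrated with a minimal, dependency-free sketch that scans both strands of a sequence for 20-nt protospacers followed by an NGG PAM. This is not the package's implementation: `find_ngg_guides` and `revcomp` are hypothetical names used only here, and the 20-nt protospacer length is an assumption based on the 23-nt `GuideSequenceWithPAM` output column.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_ngg_guides(seq: str, protospacer_len: int = 20):
    """Yield (start, end, strand, guide_with_pam) for each NGG site in seq.

    Coordinates are 0-based, end-exclusive, relative to the forward strand,
    and span the protospacer plus the 3-nt PAM.
    """
    seq = seq.upper()
    n = len(seq)
    site_len = protospacer_len + 3
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - site_len + 1):
            site = s[i:i + site_len]
            # PAM is NGG: any base followed by GG; skip sites with unknown bases
            if site.endswith("GG") and "N" not in site:
                if strand == "+":
                    start, end = i, i + site_len
                else:
                    # Map reverse-strand offsets back to forward coordinates
                    start, end = n - (i + site_len), n - i
                yield start, end, strand, site


plus_hits = list(find_ngg_guides("A" * 20 + "TGG"))
minus_hits = list(find_ngg_guides("CCA" + "T" * 20))
```

The minus-strand guide is reported as its own 5'→3' sequence, while its coordinates stay on the forward strand, which matches the 0-based, end-exclusive convention of the output table.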

Installation

Option 1: Python package (recommended for integration)

Install as an editable package for use in other projects:

# Clone the repository
git clone https://github.com/EngreitzLab/CRISPRDesigner2.git
cd CRISPRDesigner2

# Create conda environment
conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2

# Install as Python package
pip install -e .

Option 2: Snakemake workflow only

If you only need the Snakemake workflow:

conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2

Data downloads

Genome FASTA

# hg38
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz

# mm10
wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz
gunzip mm10.fa.gz

GuideScan2 index

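Download a pre-built index from guidescan.com/downloads, then create a BAM index with samtools:

# On clusters that use environment modules, load samtools first.
# The module name is site-specific; skip this line if samtools is already on your PATH.
module load biology samtools/1.16.1
samtools index path/to/guidescan2.bam.sorted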

RS3 model weights

No download required. The rs3 package bundles model weights and is installed automatically.

Usage

As a Python package

import pandas as pd

from crisprdesigner2 import (
    extract_guides_from_bed,
    apply_filters,
    get_passing_guides,
    score_guides_rs3,
    score_guides_guidescan,
    write_outputs,
)

# Extract guides from regions
guides = extract_guides_from_bed("regions.bed", "genome.fa")
guides_df = pd.DataFrame(guides)

# Filter by sequence quality
guides_df = apply_filters(guides_df)
passed = get_passing_guides(guides_df)

# Score on-target efficiency
scored = score_guides_rs3(passed, tracr="Chen2013")

# Score off-target specificity (optional)
scored = score_guides_guidescan(scored, "guidescan2.bam.sorted", "hg38")

# Write outputs
write_outputs(scored, "output_dir")

Public API

from crisprdesigner2 import (
    # Extraction
    extract_guides,              # Extract from single region
    extract_guides_from_bed,     # Extract from BED file
    reverse_complement,
    CONTEXT_PADDING,

    # Filtering
    apply_filters,               # Apply all sequence filters
    get_passing_guides,          # Get guides passing all filters
    is_valid_guide,              # Check single guide
    FILTERS,                     # Dict of filter functions

    # Scoring
    score_guides_rs3,            # RS3 on-target scoring (batch)
    score_single_guide_rs3,      # RS3 scoring (single guide)
    score_guides_guidescan,      # GuideScan2 off-target scoring

    # Output
    write_outputs,               # Write .txt and .bed files
    read_design_guides_txt,      # Read existing output

    # Caching
    load_predesigned_guides,     # Load cached guides
    merge_with_predesigned,      # Merge new + cached guides
)

As a Snakemake workflow

Configure config/config.yaml:

genome: "hg38"
regions: "path/to/regions.bed"
genome_fasta: "path/to/genome.fa"
guidescan2_index: "path/to/guidescan2.bam.sorted"
output_dir: "results/GuideDesign"
predesigned_guides: ""  # Optional: path to cached guides
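
Before launching a run, it can help to sanity-check the configuration. The sketch below is a hypothetical helper, not part of the workflow: it assumes the five non-optional keys shown above are all required and that only hg38 and mm10 are accepted, which the workflow itself may or may not enforce.

```python
from pathlib import Path

# Assumption: every key except predesigned_guides is required.
REQUIRED_KEYS = ("genome", "regions", "genome_fasta", "guidescan2_index", "output_dir")
SUPPORTED_GENOMES = ("hg38", "mm10")


def check_config(config: dict, check_paths: bool = False) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing or empty key: {key}")
    genome = config.get("genome")
    if genome and genome not in SUPPORTED_GENOMES:
        problems.append(f"unsupported genome: {genome}")
    if check_paths:
        # Only input files are checked; output_dir may not exist yet
        for key in ("regions", "genome_fasta", "guidescan2_index"):
            path = config.get(key)
            if path and not Path(path).exists():
                problems.append(f"{key} does not exist: {path}")
    return problems


example_config = {
    "genome": "hg38",
    "regions": "path/to/regions.bed",
    "genome_fasta": "path/to/genome.fa",
    "guidescan2_index": "path/to/guidescan2.bam.sorted",
    "output_dir": "results/GuideDesign",
    "predesigned_guides": "",
}
problems = check_config(example_config)
```

With `check_paths=True` the helper also verifies that the input files exist on disk, which catches typos before Snakemake builds its job graph.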

Run the workflow:

# Dry run (validation)
snakemake -n --configfile config/config.yaml

# Full run
snakemake --configfile config/config.yaml --cores 4

Output files

designGuides.txt

Tab-separated file with scored guides:

Column                  Description
chr                     Chromosome
start                   Start position (0-based)
end                     End position (exclusive)
locus                   Formatted locus string
strand                  Strand (+ or -)
GuideSequenceWithPAM    23-nt sequence (protospacer + PAM)
guideSet                Region name from input BED
RS3_score               On-target efficiency (log-odds, typically -2 to +2)
specificity_score       Off-target specificity (0-1, higher = better)
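
The package exposes `read_design_guides_txt` for reading this file. For environments without the package, the layout above can also be parsed with the standard library alone, as in the following sketch. The two example rows are invented for illustration, and the `locus` format shown is a guess; `read_guides` and `to_float` are hypothetical helper names.

```python
import csv
import io
import math

# Invented rows for illustration only; real files come from the pipeline.
EXAMPLE_TSV = (
    "chr\tstart\tend\tlocus\tstrand\tGuideSequenceWithPAM\tguideSet\t"
    "RS3_score\tspecificity_score\n"
    "chr1\t1000\t1023\tchr1:1000-1023\t+\tGACGTCAGCATGCAGCTAGCTGG\tregionA\t0.85\t0.62\n"
    "chr1\t1040\t1063\tchr1:1040-1063\t-\tTTGCACGATCGATCAGGCTAAGG\tregionA\t-0.40\tNaN\n"
)


def to_float(text):
    """Parse a score, mapping empty or unparseable values to NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def read_guides(handle):
    """Read a designGuides.txt-style table into a list of typed dicts."""
    guides = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["RS3_score"] = to_float(row["RS3_score"])
        row["specificity_score"] = to_float(row["specificity_score"])
        guides.append(row)
    return guides


guides = read_guides(io.StringIO(EXAMPLE_TSV))
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `open("results/GuideDesign/designGuides.txt", newline="")`.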

designGuides.bed

BED6 format for genome browser visualization.
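
For reference, BED6 has six tab-separated columns: chrom, chromStart, chromEnd, name, score, and strand. The sketch below formats one such line. Which values the pipeline writes into the name and score columns is not documented here, so the defaults in this hypothetical `to_bed6` helper are assumptions.

```python
def to_bed6(chrom, start, end, name, strand, score=0):
    """Format one BED6 line (0-based start, end-exclusive coordinates)."""
    if strand not in ("+", "-", "."):
        raise ValueError(f"invalid strand: {strand!r}")
    if not 0 <= score <= 1000:
        # The BED specification limits the score column to 0-1000
        raise ValueError(f"BED score out of range: {score}")
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {start}-{end}")
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


line = to_bed6("chr1", 1000, 1023, "regionA_guide1", "+")
```

Because BED uses the same 0-based, end-exclusive coordinates as `designGuides.txt`, the `start` and `end` columns can be copied over without adjustment.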

Interpreting scores

RS3 on-target scores

RS3 scores are log-odds values, typically ranging from -2 to +2. Higher scores indicate better predicted on-target efficiency.

GuideScan2 specificity scores

Specificity scores range from 0 to 1; higher values indicate a more specific guide with fewer predicted off-targets. The GuideScan2 paper recommends a specificity score > 0.2 as a cutoff (Perez et al., Genome Biology 2025).

Guides with NaN specificity either:

  • Are not in the reference genome
  • Have multiple perfect matches in the genome
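
One way to apply these rules when picking guides is sketched below: drop guides with NaN specificity, keep those above the 0.2 cutoff, then rank by RS3 score. Whether the pipeline applies such a cutoff itself is not stated here, so treat `select_guides` as a hypothetical post-processing helper; the example records are invented.

```python
import math


def select_guides(guides, min_specificity=0.2, top_n=None):
    """Keep guides above the specificity cutoff, best RS3 score first."""
    kept = [
        g for g in guides
        if not math.isnan(g["specificity_score"])
        and g["specificity_score"] > min_specificity
    ]
    kept.sort(key=lambda g: g["RS3_score"], reverse=True)
    return kept if top_n is None else kept[:top_n]


# Invented records for illustration only
example = [
    {"name": "g1", "RS3_score": 0.9, "specificity_score": 0.15},
    {"name": "g2", "RS3_score": 0.4, "specificity_score": 0.80},
    {"name": "g3", "RS3_score": 1.2, "specificity_score": float("nan")},
    {"name": "g4", "RS3_score": 0.7, "specificity_score": 0.35},
]
selected = [g["name"] for g in select_guides(example)]
```

Here `g1` falls below the cutoff and `g3` has no usable specificity score, so only `g4` and `g2` remain, ordered by RS3 score.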

Sequence filters

Filter                   Criterion                   Reason
Pol III termination      >1 T in last 4 nt           Early transcription termination
High T/U content         >40% T                      Reduced guide stability
T/U homopolymer          4+ consecutive T            Pol III termination
Mononucleotide repeat    5+ consecutive same base    Synthesis/sequencing issues
Low complexity           Various repeat patterns     Off-target concerns
Low GC                   ≤20% GC                     Unstable RNA structure
High GC                  ≥90% GC                     Synthesis issues
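
The thresholds in this table translate into simple string checks. The sketch below is an illustration rather than the package's `FILTERS` implementation: it assumes the criteria apply to the protospacer without the PAM, uses made-up filter names, and omits the low-complexity filter because its repeat patterns are not specified here.

```python
import re


def failed_filters(protospacer: str) -> list:
    """Return the names of the sequence filters that a protospacer fails."""
    seq = protospacer.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    gc_frac = (seq.count("G") + seq.count("C")) / n
    t_frac = seq.count("T") / n

    failed = []
    if seq[-4:].count("T") > 1:            # >1 T in last 4 nt
        failed.append("pol3_termination")
    if t_frac > 0.40:                      # >40% T
        failed.append("high_t_content")
    if "TTTT" in seq:                      # 4+ consecutive T
        failed.append("t_homopolymer")
    if re.search(r"(.)\1{4,}", seq):       # 5+ consecutive identical bases
        failed.append("mononucleotide_repeat")
    if gc_frac <= 0.20:                    # <=20% GC
        failed.append("low_gc")
    if gc_frac >= 0.90:                    # >=90% GC
        failed.append("high_gc")
    return failed


clean = failed_filters("GACGTCAGCATGCAGCTAGC")
t_tail = failed_filters("GACGTCAGCATGCAGCTTTT")
poly_g = failed_filters("G" * 20)
```

Returning the list of failed filters, rather than a single boolean, makes it easy to report why a guide was rejected.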

Project structure

CRISPRDesigner2/
├── pyproject.toml                      # Package metadata
├── src/
│   └── crisprdesigner2/                # Python package
│       ├── __init__.py                 # Public API
│       ├── extraction.py               # Guide extraction
│       ├── filters.py                  # Sequence filters
│       ├── scoring.py                  # Combined scoring exports
│       ├── _rs3_scoring.py             # RS3 implementation
│       ├── _guidescan_scoring.py       # GuideScan2 implementation
│       ├── output.py                   # Output file generation
│       └── cache.py                    # Pre-designed guide caching
├── Snakefile                           # Workflow definition
├── config/
│   └── config.yaml                     # Configuration template
├── workflow/
│   ├── envs/
│   │   └── CRISPRDesigner2.yaml        # Conda environment
│   └── scripts/
│       ├── snakemake_*.py              # Snakemake rule scripts
│       └── *.py                        # Thin wrappers (backward compat)
├── tests/
│   ├── data/                           # Test fixtures
│   └── test_*.py                       # Test modules
└── README.md

Testing

conda activate CRISPRDesigner2

# Run all tests
python -m pytest tests/ -v

# Run specific module
python -m pytest tests/test_guide_extraction.py -v

Troubleshooting

Import errors

Ensure the package is installed:

conda activate CRISPRDesigner2
pip install -e .

Missing GuideScan2

If GuideScan2 is unavailable, specificity scores will be NaN. The workflow still completes.

Slow GuideScan2 queries

Ensure the BAM index exists:

ls path/to/guidescan2.bam.sorted.bai
# If missing:
samtools index path/to/guidescan2.bam.sorted

License

See LICENSE file.

Citation

If you use this tool, please cite:

  • RS3: DeWeirdt et al., Nature Communications, 2022
  • GuideScan2: Perez et al., Genome Biology, 2025
