CRISPR gRNA design pipeline for SpCas9 (NGG PAM)
CRISPRDesigner2
A CRISPR gRNA design tool for SpCas9 (NGG PAM) targeting hg38 and mm10 genomes. Available as both a Python package for programmatic use and a Snakemake workflow for batch processing.
Overview
This tool designs CRISPR guide RNAs (gRNAs) for SpCas9 targeting NGG PAM sites:
- Extracts candidate gRNAs from input genomic regions
- Filters guides based on sequence quality criteria
- Scores on-target efficiency using the RS3 model
- Scores off-target specificity using GuideScan2
- Outputs scored guides in standard formats
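The extraction step listed above can be illustrated with a minimal sketch. This is not the package's implementation: `revcomp` and `find_ngg_guides` are hypothetical helpers, and the choice to report coordinates that span the full 23-nt site (protospacer + PAM) is an assumption made for the example.

```python
import re


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ngg_guides(seq: str):
    """List (start, end, strand, guide_with_pam) for each 20-nt protospacer + NGG.

    Coordinates are 0-based, end-exclusive, relative to `seq`, and (by
    assumption in this sketch) cover the whole 23-nt site.
    """
    seq = seq.upper()
    n = len(seq)
    # 20-nt protospacer, any base (N), then GG. The lookahead lets
    # overlapping sites be reported.
    site = re.compile(r"(?=([ACGT]{21}GG))")
    hits = []
    for m in site.finditer(seq):
        hits.append((m.start(), m.start() + 23, "+", m.group(1)))
    # Minus strand: scan the reverse complement, then map back to + coordinates.
    for m in site.finditer(revcomp(seq)):
        start = n - (m.start() + 23)
        hits.append((start, start + 23, "-", m.group(1)))
    return hits


for hit in find_ngg_guides("A" * 20 + "TGG"):
    print(hit)
```

Scanning the reverse complement keeps a single pattern for both strands; the only extra work is converting the match position back to plus-strand coordinates.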
Installation
Option 1: Python package (recommended for integration)
Install as an editable package for use in other projects:
# Clone the repository
git clone https://github.com/EngreitzLab/CRISPRDesigner2.git
cd CRISPRDesigner2
# Create conda environment
conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2
# Install as Python package
pip install -e .
Option 2: Snakemake workflow only
If you only need the Snakemake workflow:
conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2
Data downloads
Genome FASTA
# hg38
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
# mm10
wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz
gunzip mm10.fa.gz
GuideScan2 index
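Download pre-built indices from guidescan.com/downloads, then create a BAM index with samtools:
# On clusters that use environment modules, load samtools first (module names vary by site)
module load biology samtools/1.16.1
samtools index path/to/guidescan2.bam.sorted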
RS3 model weights
No download required. The rs3 package bundles model weights and is installed automatically.
Usage
As a Python package
import pandas as pd

from crisprdesigner2 import (
    extract_guides_from_bed,
    apply_filters,
    get_passing_guides,
    score_guides_rs3,
    score_guides_guidescan,
    write_outputs,
)

# Extract guides from regions and load them into a DataFrame
guides = extract_guides_from_bed("regions.bed", "genome.fa")
guides_df = pd.DataFrame(guides)

# Filter by sequence quality
guides_df = apply_filters(guides_df)
passed = get_passing_guides(guides_df)

# Score on-target efficiency
scored = score_guides_rs3(passed, tracr="Chen2013")

# Score off-target specificity (optional)
scored = score_guides_guidescan(scored, "guidescan2.bam.sorted", "hg38")

# Write outputs
write_outputs(scored, "output_dir")
Public API
from crisprdesigner2 import (
    # Extraction
    extract_guides,            # Extract from single region
    extract_guides_from_bed,   # Extract from BED file
    reverse_complement,
    CONTEXT_PADDING,
    # Filtering
    apply_filters,             # Apply all sequence filters
    get_passing_guides,        # Get guides passing all filters
    is_valid_guide,            # Check single guide
    FILTERS,                   # Dict of filter functions
    # Scoring
    score_guides_rs3,          # RS3 on-target scoring (batch)
    score_single_guide_rs3,    # RS3 scoring (single guide)
    score_guides_guidescan,    # GuideScan2 off-target scoring
    # Output
    write_outputs,             # Write .txt and .bed files
    read_design_guides_txt,    # Read existing output
    # Caching
    load_predesigned_guides,   # Load cached guides
    merge_with_predesigned,    # Merge new + cached guides
)
As a Snakemake workflow
Configure config/config.yaml:
genome: "hg38"
regions: "path/to/regions.bed"
genome_fasta: "path/to/genome.fa"
guidescan2_index: "path/to/guidescan2.bam.sorted"
output_dir: "results/GuideDesign"
predesigned_guides: "" # Optional: path to cached guides
Run the workflow:
# Dry run (validation)
snakemake -n --configfile config/config.yaml
# Full run
snakemake --configfile config/config.yaml --cores 4
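Before launching a run, it can help to sanity-check the configuration values. The sketch below works on a plain dictionary holding the keys from config/config.yaml; `config_problems` is a hypothetical helper, not part of the package, and treating `genome`, `regions`, `genome_fasta`, and `output_dir` as the required keys is an assumption (the workflow itself may validate differently).

```python
# Keys assumed to be mandatory for this sketch; guidescan2_index and
# predesigned_guides are treated as optional.
REQUIRED_KEYS = ("genome", "regions", "genome_fasta", "output_dir")
SUPPORTED_GENOMES = ("hg38", "mm10")


def config_problems(config: dict) -> list:
    """Return a list of human-readable problems found in a config dictionary."""
    problems = [
        f"missing or empty key: {key}"
        for key in REQUIRED_KEYS
        if not config.get(key)
    ]
    genome = config.get("genome")
    if genome and genome not in SUPPORTED_GENOMES:
        problems.append(f"unsupported genome: {genome}")
    return problems


example = {
    "genome": "hg38",
    "regions": "path/to/regions.bed",
    "genome_fasta": "path/to/genome.fa",
    "guidescan2_index": "path/to/guidescan2.bam.sorted",
    "output_dir": "results/GuideDesign",
    "predesigned_guides": "",
}
print(config_problems(example))
```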
Output files
designGuides.txt
Tab-separated file with scored guides:
| Column | Description |
|---|---|
| chr | Chromosome |
| start | Start position (0-based) |
| end | End position (exclusive) |
| locus | Formatted locus string |
| strand | Strand (+ or -) |
| GuideSequenceWithPAM | 23-nt sequence (protospacer + PAM) |
| guideSet | Region name from input BED |
| RS3_score | On-target efficiency (log-odds, typically -2 to +2) |
| specificity_score | Off-target specificity (0-1, higher = better) |
designGuides.bed
BED6 format for genome browser visualization.
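For readers who want to build a similar track from designGuides.txt themselves, here is a rough sketch of turning one row into a BED6 line (chrom, start, end, name, score, strand). The `to_bed6` helper is hypothetical, and both the use of the guide sequence as the name and the scaling of the specificity score to the 0-1000 BED score range are assumptions; the package's own designGuides.bed may populate these fields differently.

```python
import math


def to_bed6(row: dict) -> str:
    """Format one guide record as a tab-separated BED6 line."""
    spec = row.get("specificity_score")
    if spec is None or math.isnan(spec):
        score = 0  # no specificity available
    else:
        # BED scores are integers in 0-1000; clamp after scaling.
        score = max(0, min(1000, round(spec * 1000)))
    fields = [
        row["chr"],
        str(row["start"]),   # 0-based start
        str(row["end"]),     # exclusive end
        row["GuideSequenceWithPAM"],
        str(score),
        row["strand"],
    ]
    return "\t".join(fields)


# Illustrative record, not real output.
record = {
    "chr": "chr1",
    "start": 100,
    "end": 123,
    "strand": "+",
    "GuideSequenceWithPAM": "A" * 20 + "TGG",
    "specificity_score": 0.5,
}
print(to_bed6(record))
```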
Interpreting scores
RS3 on-target scores
RS3 scores are log-odds values, typically ranging from -2 to +2. Higher scores indicate better predicted on-target efficiency.
GuideScan2 specificity scores
Specificity scores range from 0 to 1; higher values mean a more specific guide with fewer predicted off-targets. The GuideScan2 paper recommends a specificity score > 0.2 as a cutoff (Perez et al., Genome Biology 2025).
Guides with NaN specificity either:
- Are not in the reference genome
- Have multiple perfect matches in the genome
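One way to combine the two scores is to drop guides that fail the specificity cutoff (or have no specificity score) and rank the rest by RS3. The sketch below uses only the standard library and a made-up three-column excerpt of designGuides.txt; `select_guides` and `parse_score` are hypothetical helpers, and discarding NaN guides is a choice made for this example rather than the tool's behavior.

```python
import csv
import io
import math


def parse_score(text: str) -> float:
    """Parse a score field, treating empty strings as NaN."""
    return float(text) if text.strip() else math.nan


def select_guides(tsv_text: str, min_specificity: float = 0.2) -> list:
    """Keep guides with specificity > cutoff, sorted by RS3 score (best first)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        spec = parse_score(row["specificity_score"])
        if math.isnan(spec) or spec <= min_specificity:
            continue  # unscored or below the cutoff
        kept.append(row)
    kept.sort(key=lambda r: parse_score(r["RS3_score"]), reverse=True)
    return kept


# Made-up rows for illustration only.
EXAMPLE_TSV = (
    "GuideSequenceWithPAM\tRS3_score\tspecificity_score\n"
    "AAAACCCCGGGGTTTTACGTAGG\t0.4\t0.9\n"
    "CCCCAAAAGGGGTTTTACGTTGG\t1.2\t0.1\n"
    "GGGGAAAACCCCTTTTACGTCGG\t0.9\tNaN\n"
    "TTGGAAAACCCCAGTCACGTGGG\t0.8\t0.5\n"
)

for guide in select_guides(EXAMPLE_TSV):
    print(guide["GuideSequenceWithPAM"], guide["RS3_score"])
```

Filtering on specificity first and ranking on RS3 second keeps a highly active but promiscuous guide (the second row here) out of the final list.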
Sequence filters
| Filter | Criterion | Reason |
|---|---|---|
| Pol III termination | >1 T in last 4 nt | Early transcription termination |
| High T/U content | >40% T | Reduced guide stability |
| T/U homopolymer | 4+ consecutive T | Pol III termination |
| Mononucleotide repeat | 5+ consecutive same base | Synthesis/sequencing issues |
| Low complexity | Various repeat patterns | Off-target concerns |
| Low GC | ≤20% GC | Unstable RNA structure |
| High GC | ≥90% GC | Synthesis issues |
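The thresholds in the table can be expressed as a few string checks. The following is a minimal sketch, not the package's `FILTERS` implementation: `failed_filters` and the filter labels are hypothetical, the checks are assumed to run on the 20-nt protospacer with "last 4 nt" meaning its 3' end, and the low-complexity filter is omitted because its patterns are not specified here.

```python
import re


def failed_filters(protospacer: str) -> list:
    """Return the names of the table's filters that a protospacer fails."""
    s = protospacer.upper().replace("U", "T")
    if not s:
        raise ValueError("empty sequence")
    n = len(s)
    gc = (s.count("G") + s.count("C")) / n
    failed = []
    if s[-4:].count("T") > 1:          # >1 T in last 4 nt
        failed.append("pol3_termination")
    if s.count("T") / n > 0.40:        # >40% T
        failed.append("high_t_content")
    if "TTTT" in s:                    # 4+ consecutive T
        failed.append("t_homopolymer")
    if re.search(r"(.)\1{4}", s):      # 5+ consecutive identical bases
        failed.append("mononucleotide_repeat")
    if gc <= 0.20:                     # <=20% GC
        failed.append("low_gc")
    if gc >= 0.90:                     # >=90% GC
        failed.append("high_gc")
    return failed


print(failed_filters("GACGTCAGCATGCAGCTAGC"))
print(failed_filters("GACGTCAGCATGCAGCTTTT"))
```

Returning the list of failed filter names, rather than a single boolean, makes it easy to report why a guide was rejected.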
Project structure
CRISPRDesigner2/
├── pyproject.toml                  # Package metadata
├── src/
│   └── crisprdesigner2/            # Python package
│       ├── __init__.py             # Public API
│       ├── extraction.py           # Guide extraction
│       ├── filters.py              # Sequence filters
│       ├── scoring.py              # Combined scoring exports
│       ├── _rs3_scoring.py         # RS3 implementation
│       ├── _guidescan_scoring.py   # GuideScan2 implementation
│       ├── output.py               # Output file generation
│       └── cache.py                # Pre-designed guide caching
├── Snakefile                       # Workflow definition
├── config/
│   └── config.yaml                 # Configuration template
├── workflow/
│   ├── envs/
│   │   └── CRISPRDesigner2.yaml    # Conda environment
│   └── scripts/
│       ├── snakemake_*.py          # Snakemake rule scripts
│       └── *.py                    # Thin wrappers (backward compat)
├── tests/
│   ├── data/                       # Test fixtures
│   └── test_*.py                   # Test modules
└── README.md
Testing
conda activate CRISPRDesigner2
# Run all tests
python -m pytest tests/ -v
# Run specific module
python -m pytest tests/test_guide_extraction.py -v
Troubleshooting
Import errors
Ensure the package is installed:
conda activate CRISPRDesigner2
pip install -e .
Missing GuideScan2
If GuideScan2 is unavailable, specificity scores will be NaN. The workflow still completes.
Slow GuideScan2 queries
Ensure the BAM index exists:
ls path/to/guidescan2.bam.sorted.bai
# If missing:
samtools index path/to/guidescan2.bam.sorted
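The same check can be done from Python before scoring, for example at the start of a script. This is a small sketch using only the standard library; `bam_index_path` and `has_bam_index` are hypothetical helpers, and the index is assumed to sit next to the BAM with a `.bai` suffix appended, as in the commands above.

```python
from pathlib import Path


def bam_index_path(bam_path) -> Path:
    """Return the expected index path: the BAM path with '.bai' appended."""
    return Path(str(bam_path) + ".bai")


def has_bam_index(bam_path) -> bool:
    """True if the expected .bai index file exists next to the BAM."""
    return bam_index_path(bam_path).is_file()


bam = "path/to/guidescan2.bam.sorted"
if not has_bam_index(bam):
    print(f"Index not found; run: samtools index {bam}")
```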
License
See LICENSE file.
Citation
If you use this tool, please cite:
- RS3: DeWeirdt et al., Nature Communications, 2022
- GuideScan2: Perez et al., Genome Biology, 2025