TRACE: Triple-aligner Read Analysis for CRISPR Editing
Features
- Triple-aligner consensus: Uses BWA-MEM, BBMap, and minimap2 for robust alignment
- Flexible input: Accepts DNA sequences directly or FASTA file paths
- Automatic inference: Detects PAM, cleavage site, homology arms, and edits from sequences
- Large edit support: Handles insertions of 50 bp or more, with automatic k-mer size adjustment
- K-mer classification: Fast pre-alignment HDR/WT detection (k-mers are sized automatically from the largest edit)
- Multi-nuclease support: Cas9 and Cas12a (Cpf1) with correct cleavage geometry
- Auto-detection: Library type (TruSeq/Tn5), whether reads need merging, and CRISPResso mode
- CRISPResso2 integration: Validation against a standard CRISPR analysis tool
Installation
pip (Python package only)
pip install trace-crispr
conda (includes external aligners)
conda install -c bioconda -c conda-forge trace-crispr
Development installation
git clone https://github.com/k-roy/trace.git
cd trace
pip install -e ".[dev]"
Quick Start
TRACE accepts sequences as either DNA strings or FASTA file paths.
Example 1: Using FASTA files
trace run \
--reference amplicon.fasta \
--hdr-template hdr_template.fasta \
--guide GCTGAAGCACTGCACGCCGT \
--r1 sample_R1.fastq.gz \
--r2 sample_R2.fastq.gz \
--output results/
Example 2: Using DNA sequences directly
The HDR template is typically shorter than the reference amplicon. In this example the template is about 150 bp and the reference about 250 bp, so the reference extends roughly 50 bp beyond the template on each side:
# Reference amplicon (250 bp) - includes flanking regions
REF="ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
GCTGAAGCACTGCACGCCGTNGG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG"
# HDR template (150 bp) - centered on edit site
HDR="ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
GCTGAAGCACTGCACGCCGTNGA\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG"
trace run \
-r "$REF" \
-h "$HDR" \
-g GCTGAAGCACTGCACGCCGT \
--r1 sample_R1.fastq.gz \
--output results/
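Both examples pass the same kind of value to `--reference` and `--hdr-template`: either a literal sequence or a FASTA path. A minimal sketch of how such an argument could be resolved is shown below. The function name `resolve_sequence`, the accepted alphabet (ACGTN), and the choice to read only the first FASTA record are assumptions made for illustration, not TRACE's actual implementation.

```python
import os
import re

# Assumption: only A, C, G, T and N are accepted as sequence characters.
DNA_PATTERN = re.compile(r"[ACGTN]+")


def resolve_sequence(value: str) -> str:
    """Return an uppercase DNA string from a literal sequence or a FASTA path."""
    if os.path.isfile(value):
        parts = []
        headers_seen = 0
        with open(value) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    headers_seen += 1
                    if headers_seen > 1:
                        break  # hypothetical choice: use only the first record
                    continue
                parts.append(line)
        sequence = "".join(parts)
    else:
        # Not an existing file, so treat the value as a literal sequence.
        sequence = value.strip()

    sequence = sequence.upper()
    if not DNA_PATTERN.fullmatch(sequence):
        raise ValueError(f"not a DNA sequence or readable FASTA file: {value!r}")
    return sequence
```

Checking for an existing file first means a mistyped path falls through to the sequence check and is rejected there, rather than being silently treated as DNA.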
Check locus configuration without running
trace info \
--reference amplicon.fasta \
--hdr-template hdr_template.fasta \
--guide GCTGAAGCACTGCACGCCGT
This prints a summary of the inferred locus configuration (positions are 1-based and inclusive):
=== TRACE Analysis Configuration ===
Reference sequence: 500 bp
HDR template: 500 bp
Donor template analysis:
- Left homology arm: positions 1-245 on reference (245 bp)
- Right homology arm: positions 255-500 on reference (246 bp)
- Donor edits detected at positions: 246, 247 on reference
* Position 246: C → G (PAM-silencing mutation)
* Position 247: C → T (chromophore Y66H mutation)
Guide analysis:
- Guide sequence: GCTGAAGCACTGCACGCCGT
- Guide targets: positions 248-267 on reference (- strand)
- PAM: GGG at positions 245-247 on reference
- Cleavage site: position 248 on reference
Multiple samples
Create a sample key TSV:
sample_id	r1_path	r2_path	condition
sample_1	/path/to/S1_R1.fastq.gz	/path/to/S1_R2.fastq.gz	treatment
sample_2	/path/to/S2_R1.fastq.gz	/path/to/S2_R2.fastq.gz	control
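For many samples it can be easier to generate the sample key than to type it by hand. The sketch below writes the four columns shown above as a tab-separated file using Python's standard `csv` module. The helper name `write_sample_key` is hypothetical, and the paths are the placeholder paths from the example.

```python
import csv

FIELDS = ["sample_id", "r1_path", "r2_path", "condition"]


def write_sample_key(path, samples):
    """Write a tab-separated sample key with the columns used in this README."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=FIELDS,
            delimiter="\t",
            lineterminator="\n",  # plain newlines instead of the csv default "\r\n"
        )
        writer.writeheader()
        writer.writerows(samples)


samples = [
    {
        "sample_id": "sample_1",
        "r1_path": "/path/to/S1_R1.fastq.gz",
        "r2_path": "/path/to/S1_R2.fastq.gz",
        "condition": "treatment",
    },
    {
        "sample_id": "sample_2",
        "r1_path": "/path/to/S2_R1.fastq.gz",
        "r2_path": "/path/to/S2_R2.fastq.gz",
        "condition": "control",
    },
]

write_sample_key("samples.tsv", samples)
```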
Then run:
trace run \
--reference amplicon.fasta \
--hdr-template hdr_template.fasta \
--guide GCTGAAGCACTGCACGCCGT \
--sample-key samples.tsv \
--output results/ \
--threads 16
Using Cas12a
trace run \
--reference amplicon.fasta \
--hdr-template hdr_template.fasta \
--guide GCTGAAGCACTGCACGCCGTAA \
--nuclease cas12a \
--sample-key samples.tsv \
--output results/
Edit Detection
TRACE automatically detects edits by aligning the HDR template to the reference:
- Substitutions: Single nucleotide changes (e.g., C → G)
- Insertions: Extra bases in the HDR template (insertions of 50 bp or more are supported)
- Deletions: Missing bases in the HDR template
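To make the three edit types concrete, here is a simplified sketch that compares a reference with an HDR template. It is not an aligner: it trims the shared prefix and suffix and reports whatever differs in between as a single edit. It also assumes both sequences cover the same region, which does not hold when the template is shorter than the amplicon. The function name `find_edit` and the returned fields are hypothetical.

```python
def find_edit(reference: str, hdr: str):
    """Return the single differing span between two sequences, or None.

    Simplification: common prefix/suffix trimming, not a real alignment.
    'position' is 1-based on the reference; for an insertion it is the
    reference base immediately after the inserted bases.
    """
    shortest = min(len(reference), len(hdr))

    # Length of the shared prefix.
    prefix = 0
    while prefix < shortest and reference[prefix] == hdr[prefix]:
        prefix += 1

    # Length of the shared suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and reference[-1 - suffix] == hdr[-1 - suffix]):
        suffix += 1

    ref_allele = reference[prefix:len(reference) - suffix]
    hdr_allele = hdr[prefix:len(hdr) - suffix]

    if not ref_allele and not hdr_allele:
        return None  # sequences are identical
    if not ref_allele:
        kind = "insertion"
    elif not hdr_allele:
        kind = "deletion"
    elif len(ref_allele) == len(hdr_allele):
        kind = "substitution"
    else:
        kind = "complex"

    return {
        "position": prefix + 1,
        "ref": ref_allele,
        "hdr": hdr_allele,
        "kind": kind,
        "size": max(len(ref_allele), len(hdr_allele)),
    }
```

Because only one span is reported, two distant substitutions would be merged into one long "substitution". A real alignment avoids that.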
For large edits, TRACE automatically increases the k-mer size to ensure reliable classification. The k-mer size is always at least 10 bp larger than the largest edit.
Example output for a 20 bp insertion:
Edits detected (1 total):
* Position 125: +ATCGATCGATCGATCGATCG (20 bp insertion)
Maximum edit size: 20 bp
Recommended k-mer size: 30 bp
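The sizing rule and the idea behind k-mer classification can be illustrated with a short sketch. The 10 bp margin comes from the rule stated above. The default size `base_k=21`, the function names, and the classification logic (k-mers found only in the wild-type or only in the HDR sequence) are assumptions for illustration and may differ from what TRACE does internally. Reverse-complement reads are not handled here.

```python
def recommended_kmer_size(max_edit_bp: int, base_k: int = 21, margin: int = 10) -> int:
    """Pick a k-mer size at least `margin` bp larger than the largest edit.

    base_k is a hypothetical default used when the edit is small.
    """
    return max(base_k, max_edit_bp + margin)


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(wild_type: str, hdr: str, k: int):
    """Return (wt_only, hdr_only): k-mers present in exactly one sequence."""
    wt, edited = kmers(wild_type, k), kmers(hdr, k)
    return wt - edited, edited - wt


def classify_read(read: str, wt_only: set, hdr_only: set, k: int) -> str:
    """Label a read by which diagnostic k-mers it contains."""
    read_kmers = kmers(read, k)
    wt_hits = len(read_kmers & wt_only)
    hdr_hits = len(read_kmers & hdr_only)
    if hdr_hits and not wt_hits:
        return "HDR"
    if wt_hits and not hdr_hits:
        return "WT"
    return "unclassified"  # no diagnostic k-mers, or a mix of both


# The 20 bp insertion from the example output maps to a 30 bp k-mer.
print(recommended_kmer_size(20))
```

A k-mer longer than the edit can span the whole edit plus flanking sequence on both sides, which is what makes it specific to the HDR allele.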
Nuclease Support
Cas9 (SpCas9)
- PAM: NGG (3' of protospacer)
- Cleavage: 3 bp upstream of PAM (blunt ends)
Cas12a (LbCpf1)
- PAM: TTTN (5' of protospacer)
- Cleavage: 18-19 bp downstream of the PAM on the non-target strand, 23 bp on the target strand
- Creates a 4-5 nt 5' overhang (staggered cut)
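As an illustration of the Cas9 geometry described above, the sketch below locates a guide on either strand of a reference, checks for an NGG PAM, and returns the blunt cut position 3 bp from the PAM. The function names are hypothetical. The cut is reported as a 0-based boundary (the number of reference bases to the left of the cut), which is a convention chosen for this example and not the 1-based positions TRACE prints. Only the first occurrence of the guide is considered.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_cut(reference: str, guide: str):
    """Return (strand, cut_index, pam) for an SpCas9 guide, or None.

    cut_index is the number of reference bases left of the blunt cut.
    """
    reference = reference.upper()
    guide = guide.upper()
    n = len(guide)

    # Plus strand: protospacer followed by the PAM on the reference.
    start = reference.find(guide)
    if start != -1:
        pam = reference[start + n:start + n + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return "+", start + n - 3, pam

    # Minus strand: the reference shows the reverse complement of the
    # protospacer, preceded by the reverse complement of the PAM (CCN).
    start = reference.find(reverse_complement(guide))
    if start >= 3:
        pam = reverse_complement(reference[start - 3:start])
        if pam[1:] == "GG":
            return "-", start + 3, pam

    return None


guide = "GCTGAAGCACTGCACGCCGT"
toy_reference = "TTTT" + guide + "AGG" + "CCCC"
print(find_cas9_cut(toy_reference, guide))
```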
Output
The main output is a TSV file with per-sample editing outcomes:
| Column | Description |
|---|---|
| sample | Sample ID |
| classifiable_reads | Total classifiable reads |
| duplicate_rate | PCR duplicate rate (Tn5) |
| Dedup_WT_% | Wild-type % (deduplicated) |
| Dedup_HDR_% | HDR % (deduplicated) |
| Dedup_NHEJ_% | NHEJ % (deduplicated) |
| Dedup_LgDel_% | Large deletion % (deduplicated) |
| kmer_hdr_rate | K-mer method HDR rate |
| crispresso_hdr_rate | CRISPResso2 HDR rate |
Dependencies
Python
- click>=8.0
- pysam>=0.20
- pandas>=1.5
- numpy>=1.20
- pyyaml>=6.0
- rapidfuzz>=3.0
- tqdm>=4.60
External tools (via conda)
- bwa>=0.7
- bbmap>=39
- minimap2>=2.24
- samtools>=1.16
- crispresso2 (optional, but enabled by default)
Author
Kevin R. Roy
License
MIT