TRACE: Triple-aligner Read Analysis for CRISPR Editing


Features

  • Triple-aligner consensus: Uses BWA-MEM, BBMap, and minimap2 for robust alignment
  • Flexible input: Accepts DNA sequences directly or FASTA file paths
  • Automatic inference: Detects PAM, cleavage site, homology arms, and edits from sequences
  • Large edit support: Handles insertions of 50 bp or more with automatic k-mer size adjustment
  • K-mer classification: Fast pre-alignment HDR/WT detection (auto-sizes k-mers based on edit)
  • Multi-nuclease support: Cas9 and Cas12a (Cpf1) with correct cleavage geometry
  • Auto-detection: Detects library type (TruSeq/Tn5), whether reads need merging, and the CRISPResso mode
  • CRISPResso2 integration: Validates results against CRISPResso2, a standard CRISPR analysis tool
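
One way to picture the triple-aligner consensus is a majority vote over per-read calls. The sketch below is an illustration only: the function name `consensus_call`, the aligner keys, and the two-of-three threshold are assumptions, not TRACE's documented behavior.

```python
from collections import Counter


def consensus_call(calls, min_agree=2):
    """Return the label that at least `min_agree` aligners agree on, else None.

    `calls` maps an aligner name to the outcome label it assigned to one read.
    Both the labels and the threshold are hypothetical choices for this sketch.
    """
    if not calls:
        return None
    label, count = Counter(calls.values()).most_common(1)[0]
    return label if count >= min_agree else None


# Two aligners agree on HDR, so HDR is the consensus.
print(consensus_call({"bwa": "HDR", "bbmap": "HDR", "minimap2": "WT"}))
```

A read on which all three aligners disagree yields `None`, which a caller could count as unclassifiable.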

Installation

pip (Python package only)

pip install trace-crispr

conda (includes external aligners)

conda install -c bioconda -c conda-forge trace-crispr

Development installation

git clone https://github.com/k-roy/trace.git
cd trace
pip install -e ".[dev]"

Quick Start

TRACE accepts sequences as either DNA strings or FASTA file paths.

Example 1: Using FASTA files

trace run \
  --reference amplicon.fasta \
  --hdr-template hdr_template.fasta \
  --guide GCTGAAGCACTGCACGCCGT \
  --r1 sample_R1.fastq.gz \
  --r2 sample_R2.fastq.gz \
  --output results/

Example 2: Using DNA sequences directly

The HDR template (150 bp) is typically shorter than the reference amplicon (250 bp), leaving ~50 bp of reference sequence flanking it on each side:

# Reference amplicon (250 bp) - includes flanking regions
REF="ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
GCTGAAGCACTGCACGCCGTNGG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG"

# HDR template (150 bp) - centered on edit site
HDR="ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG\
GCTGAAGCACTGCACGCCGTNGA\
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG"

trace run \
  -r "$REF" \
  -h "$HDR" \
  -g GCTGAAGCACTGCACGCCGT \
  --r1 sample_R1.fastq.gz \
  --output results/
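
Both input styles shown above (FASTA path or raw DNA string) could be handled by a small dispatch step. The following is a minimal sketch of that idea, not TRACE's actual code: `read_fasta` and `resolve_sequence` are hypothetical helpers, and the accepted alphabet (A, C, G, T, N) is an assumption.

```python
import os


def read_fasta(path):
    """Return the first record's sequence from a FASTA file, uppercased."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the start of a second record
                seen_header = True
                continue
            parts.append(line)
    return "".join(parts).upper()


def resolve_sequence(value):
    """Treat `value` as a FASTA path if that file exists, else as a DNA string."""
    # os.path.isfile returns False (rather than raising) for over-long strings.
    if os.path.isfile(value):
        return read_fasta(value)
    seq = value.strip().upper()
    invalid = set(seq) - set("ACGTN")
    if not seq or invalid:
        raise ValueError(f"not a FASTA path or DNA sequence: {value[:30]!r}")
    return seq
```

Checking for an existing file first means a typo in a path surfaces as a clear "not a DNA sequence" error instead of being silently analyzed as a sequence.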

Check locus configuration without running

trace info \
  --reference amplicon.fasta \
  --hdr-template hdr_template.fasta \
  --guide GCTGAAGCACTGCACGCCGT

This prints a configuration summary such as:

=== TRACE Analysis Configuration ===

Reference sequence: 500 bp
HDR template: 500 bp

Donor template analysis:
  - Left homology arm: positions 1-245 on reference (245 bp)
  - Right homology arm: positions 255-500 on reference (245 bp)
  - Donor edits detected at positions: 246, 247 on reference
    * Position 246: C → G (PAM-silencing mutation)
    * Position 247: C → T (chromophore Y66H mutation)

Guide analysis:
  - Guide sequence: GCTGAAGCACTGCACGCCGT
  - Guide targets: positions 248-267 on reference (- strand)
  - PAM: GGG at positions 245-247 on reference
  - Cleavage site: position 248 on reference

Multiple samples

Create a sample key TSV:

sample_id	r1_path	r2_path	condition
sample_1	/path/to/S1_R1.fastq.gz	/path/to/S1_R2.fastq.gz	treatment
sample_2	/path/to/S2_R1.fastq.gz	/path/to/S2_R2.fastq.gz	control

Then run:

trace run \
  --reference amplicon.fasta \
  --hdr-template hdr_template.fasta \
  --guide GCTGAAGCACTGCACGCCGT \
  --sample-key samples.tsv \
  --output results/ \
  --threads 16
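
For many samples, the sample key shown above can be written and sanity-checked with Python's standard `csv` module. This is a sketch under assumptions: the helper names are hypothetical, and treating all four columns as required and sample IDs as unique is a guess about what TRACE expects.

```python
import csv

# Column names taken from the sample key example; requiring all four is an assumption.
REQUIRED_COLUMNS = ("sample_id", "r1_path", "r2_path", "condition")


def write_sample_key(rows, out_path):
    """Write a list of dicts as a tab-separated sample key."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_sample_key(path):
    """Read a sample key, checking for missing columns and duplicate sample IDs."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"sample key is missing columns: {missing}")
        rows = list(reader)
    ids = [row["sample_id"] for row in rows]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample_id values: {duplicates}")
    return rows
```

Validating the key before launching a long run catches spreadsheet export problems (spaces instead of tabs, repeated IDs) early.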

Using Cas12a

trace run \
  --reference amplicon.fasta \
  --hdr-template hdr_template.fasta \
  --guide GCTGAAGCACTGCACGCCGTAA \
  --nuclease cas12a \
  --sample-key samples.tsv \
  --output results/

Edit Detection

TRACE automatically detects edits by aligning the HDR template to the reference:

  • Substitutions: Single nucleotide changes (e.g., C → G)
  • Insertions: Extra bases in the HDR template (50 bp or more supported)
  • Deletions: Missing bases in the HDR template
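
The three edit types above can be illustrated without a full aligner by trimming the shared prefix and suffix of two sequences. This is a simplified stand-in for alignment, not TRACE's method: `find_edit` is a hypothetical helper, it assumes both sequences cover the same window (same start and end on the reference), and it reports several nearby edits as one block.

```python
def find_edit(reference, donor):
    """Describe the single differing block between two same-window sequences."""
    limit = min(len(reference), len(donor))

    # Longest common prefix.
    prefix = 0
    while prefix < limit and reference[prefix] == donor[prefix]:
        prefix += 1

    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[-1 - suffix] == donor[-1 - suffix]):
        suffix += 1

    ref_mid = reference[prefix:len(reference) - suffix]
    donor_mid = donor[prefix:len(donor) - suffix]

    if not ref_mid and not donor_mid:
        kind = "none"
    elif not ref_mid:
        kind = "insertion"      # extra bases in the donor
    elif not donor_mid:
        kind = "deletion"       # bases missing from the donor
    else:
        kind = "substitution"
    # Position is 1-based, matching the style of the example output.
    return {"kind": kind, "position": prefix + 1, "ref": ref_mid, "alt": donor_mid}


print(find_edit("ACGTCCGGA", "ACGTGCGGA"))
```

A real implementation needs proper alignment to place a shorter HDR template on the reference and to separate multiple edits.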

For large edits, TRACE automatically increases the k-mer size to ensure reliable classification. The k-mer size is always at least 10 bp larger than the largest edit.

Example output for a 20 bp insertion:

Edits detected (1 total):
  * Position 125: +ATCGATCGATCGATCGATCG (20 bp insertion)

Maximum edit size: 20 bp
Recommended k-mer size: 30 bp
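
The sizing rule and the k-mer based HDR/WT call can be sketched as follows. The 10 bp margin comes from the text above; everything else is an assumption made for illustration: the function names are hypothetical, reads are compared in one orientation only, and exact k-mer matching stands in for whatever tolerance TRACE actually applies.

```python
def recommended_kmer_size(max_edit_bp, margin=10):
    """K-mer size that exceeds the largest edit by `margin` bp."""
    return max_edit_bp + margin


def kmers(seq, k):
    """Set of all k-length substrings of `seq` (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read, wt_seq, hdr_seq, k):
    """Label a read by k-mers found only in the WT or only in the HDR sequence."""
    wt_kmers = kmers(wt_seq, k)
    hdr_kmers = kmers(hdr_seq, k)
    wt_only = wt_kmers - hdr_kmers      # diagnostic for the unedited allele
    hdr_only = hdr_kmers - wt_kmers     # diagnostic for the edited allele

    read_kmers = kmers(read, k)
    has_wt = bool(read_kmers & wt_only)
    has_hdr = bool(read_kmers & hdr_only)

    if has_hdr and not has_wt:
        return "HDR"
    if has_wt and not has_hdr:
        return "WT"
    return "unclassified"   # no diagnostic k-mers, or conflicting evidence


print(recommended_kmer_size(20))  # 30, matching the example output above
```

Reads left as `unclassified` are the ones a slower alignment step would need to resolve.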

Nuclease Support

Cas9 (SpCas9)

  • PAM: NGG (3' of protospacer)
  • Cleavage: 3 bp upstream of PAM (blunt ends)
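
The Cas9 geometry above can be turned into a coordinate calculation. The sketch below is a hedged illustration rather than TRACE's implementation: `cas9_cut_site` is a hypothetical helper, it looks only at the first exact match of the guide on each strand, and it reports the cut as a 0-based index on the forward strand (the cut falls immediately before that index).

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cas9_cut_site(reference, guide):
    """Return (strand, cut_index) for an SpCas9 guide with an NGG PAM.

    The blunt cut lies between reference[cut_index - 1] and reference[cut_index].
    """
    reference, guide = reference.upper(), guide.upper()
    n = len(guide)

    # Guide matches the forward strand: PAM (NGG) sits just 3' of the protospacer.
    start = reference.find(guide)
    if start != -1:
        pam = reference[start + n:start + n + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return "+", start + n - 3

    # Guide matches the reverse strand: the PAM appears as CCN just 5' on the forward strand.
    start = reference.find(revcomp(guide))
    if start >= 3:
        pam_site = reference[start - 3:start]
        if pam_site[:2] == "CC":
            return "-", start + 3

    raise ValueError("guide with an adjacent NGG PAM not found in reference")
```

For a 20 nt guide on the forward strand this places the cut between protospacer bases 17 and 18, which is 3 bp upstream of the PAM.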

Cas12a (LbCpf1)

  • PAM: TTTN (5' of protospacer)
  • Cleavage: 18-19 bp downstream of the PAM on the non-target strand, 23 bp on the target strand
  • Creates a 4-5 nt 5' overhang (staggered cut)
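
The staggered Cas12a cut can be expressed as two offsets measured from the PAM-proximal end of the protospacer. The sketch below is illustrative only: `cas12a_cuts` is a hypothetical helper, it handles a guide on the forward strand with an exact match, and it uses 18 and 23 as the two offsets, so it returns the 5 nt end of the 4-5 nt overhang range.

```python
def cas12a_cuts(reference, guide, near_offset=18, far_offset=23):
    """Return (near_cut, far_cut, overhang) for a forward-strand Cas12a guide.

    Cut indices are 0-based on the forward strand; each cut falls immediately
    before the base at that index. The PAM (TTTN) must sit directly 5' of the
    protospacer.
    """
    reference, guide = reference.upper(), guide.upper()
    start = reference.find(guide)
    if start < 4:
        # Covers both "guide not found" (-1) and "no room for a 4 nt PAM".
        raise ValueError("guide with a 5' TTTN PAM not found in reference")
    pam = reference[start - 4:start]
    if pam[:3] != "TTT":
        raise ValueError(f"expected a TTTN PAM, found {pam}")

    near_cut = start + near_offset   # PAM-proximal cut
    far_cut = start + far_offset     # PAM-distal cut on the opposite strand
    if far_cut > len(reference):
        raise ValueError("reference ends before the PAM-distal cut")
    return near_cut, far_cut, far_cut - near_cut
```

Because the two strands are cut at different positions, indel profiles around a Cas12a site tend to sit further from the PAM than those at a Cas9 site, which is why the nuclease has to be specified.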

Output

The main output is a TSV file with per-sample editing outcomes:

Column               Description
sample               Sample ID
classifiable_reads   Total classifiable reads
duplicate_rate       PCR duplicate rate (Tn5)
Dedup_WT_%           Wild-type % (deduplicated)
Dedup_HDR_%          HDR % (deduplicated)
Dedup_NHEJ_%         NHEJ % (deduplicated)
Dedup_LgDel_%        Large deletion %
kmer_hdr_rate        K-mer method HDR rate
crispresso_hdr_rate  CRISPResso2 HDR rate
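
A results table with these columns can be loaded with the standard `csv` module for quick ranking or filtering. This is a sketch under assumptions: the output file name is not stated here, so the path is left to the caller, the helper names are hypothetical, and every listed column other than `sample` is assumed to be numeric.

```python
import csv

# Columns from the table above that are assumed to hold numbers.
NUMERIC_COLUMNS = (
    "classifiable_reads", "duplicate_rate",
    "Dedup_WT_%", "Dedup_HDR_%", "Dedup_NHEJ_%", "Dedup_LgDel_%",
    "kmer_hdr_rate", "crispresso_hdr_rate",
)


def load_results(path):
    """Read the per-sample TSV and convert known numeric columns to float."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    for row in rows:
        for col in NUMERIC_COLUMNS:
            if row.get(col) not in (None, ""):
                row[col] = float(row[col])
    return rows


def rank_by_hdr(rows):
    """Sort samples from highest to lowest deduplicated HDR percentage."""
    return sorted(rows, key=lambda row: row["Dedup_HDR_%"], reverse=True)
```

Calling `rank_by_hdr(load_results(path))` on the TSV written to the output directory gives the samples ordered by editing efficiency; rows with an empty `Dedup_HDR_%` field would need to be filtered out first.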

Dependencies

Python

  • click>=8.0
  • pysam>=0.20
  • pandas>=1.5
  • numpy>=1.20
  • pyyaml>=6.0
  • rapidfuzz>=3.0
  • tqdm>=4.60

External tools (via conda)

  • bwa>=0.7
  • bbmap>=39
  • minimap2>=2.24
  • samtools>=1.16
  • crispresso2 (optional, but enabled by default)

Author

Kevin R. Roy

License

MIT
