TRACE: Triple-aligner Read Analysis for CRISPR Editing

These details have not been verified by PyPI

Project links

Project description

TRACE

Triple-aligner Read Analysis for CRISPR Editing

Robust quantification of CRISPR editing outcomes from amplicon sequencing data using consensus alignment across BWA-MEM, BBMap, and minimap2.

Quick Start

# Install
pip install trace-crispr

# Run on a single sample
trace run \
  --reference amplicon.fasta \
  --hdr-template donor.fasta \
  --guide GCTGAAGCACTGCACGCCGT \
  --r1 sample_R1.fastq.gz \
  --r2 sample_R2.fastq.gz \
  --output results/

# Run on multiple samples
trace run \
  --reference amplicon.fasta \
  --hdr-template donor.fasta \
  --guide GCTGAAGCACTGCACGCCGT \
  --sample-key samples.tsv \
  --output results/ \
  --threads 16

Output

Editing Summary

TRACE produces a comprehensive per-sample summary table with editing outcomes:

sample     total_reads  HDR_COMPLETE_%  HDR_PARTIAL_%  NHEJ_INDEL_%  MMEJ_INDEL_%  WT_%
──────────────────────────────────────────────────────────────────────────────────────────
sample_01  125,432      42.1            3.1            8.2           4.1           38.1
sample_02  98,765       48.7            3.4            5.6           3.1           35.8
sample_03  112,890      45.2            3.7            9.8           5.4           31.2

Classification Categories

TRACE classifies reads into 12 comprehensive categories:

HDR Categories (successful donor integration):

Outcome	Description
`HDR_COMPLETE`	All donor-encoded edits present, no other modifications
`HDR_PARTIAL`	Subset of donor SNVs integrated, no other modifications
`HDR_PLUS_NHEJ_INDEL`	Donor edits + classical NHEJ indel at cut site (0-2bp microhomology)
`HDR_PLUS_MMEJ_INDEL`	Donor edits + MMEJ indel at cut site (>2bp microhomology)
`HDR_PLUS_OTHER`	Donor edits + modifications NOT at cut site

Non-HDR Repair Outcomes:

Outcome	Description
`DONOR_CAPTURE`	Donor edits present + extra donor sequence duplicated at site (over-integration)
`NHEJ_INDEL`	Classical NHEJ indel at cut site (0-2bp microhomology at deletion boundaries)
`MMEJ_INDEL`	Microhomology-mediated indel (>2bp microhomology at deletion boundaries)

Other Outcomes:

Outcome	Description
`WT`	Wild-type / unedited
`NON_DONOR_SNV`	SNVs not matching donor template, no indels
`UNCLASSIFIED`	Does not fit other categories
`UNMAPPED`	Read did not align to reference

Note: LARGE_DELETION (≥50bp) is a flag applied to reads within the above categories, not a separate category.

NHEJ vs MMEJ Classification

TRACE distinguishes between two types of error-prone repair based on deletion microhomology:

NHEJ (Non-Homologous End Joining): Deletions with 0-2bp of microhomology at the deletion boundaries. This is classical error-prone repair.
MMEJ (Microhomology-Mediated End Joining): Deletions with >2bp of microhomology. This alternative repair pathway uses short homologous sequences to rejoin the break.

This distinction is biologically significant as MMEJ and NHEJ involve different repair proteins and have different mutational signatures.

Output Columns

The main output file (per_sample_editing_outcomes_all_methods.tsv) contains:

Column	Description
`sample`	Sample identifier
`total_reads`	Total reads processed
`aligned_reads`	Reads that aligned to reference
`classifiable_reads`	Reads that could be classified
`duplicate_rate`	Fraction of reads removed as duplicates
`HDR_COMPLETE_%`	Percentage with complete HDR
`HDR_PARTIAL_%`	Percentage with partial HDR
`HDR_PLUS_NHEJ_%`	HDR + classical NHEJ indel
`HDR_PLUS_MMEJ_%`	HDR + microhomology-mediated indel
`HDR_PLUS_OTHER_%`	HDR + other modifications
`HDR_total_%`	Sum of all HDR categories
`DONOR_CAPTURE_%`	Donor over-integration
`NHEJ_INDEL_%`	Classical NHEJ only
`MMEJ_INDEL_%`	MMEJ only
`NHEJ_MMEJ_total_%`	Combined NHEJ + MMEJ rate
`WT_%`	Wild-type percentage
`NON_DONOR_SNV_%`	Non-donor SNVs
`UNCLASSIFIED_%`	Unclassified reads
`Edited_total_%`	All edited reads (excludes WT)

Pooled Summary (Technical Replicates)

For experiments with technical replicates, use trace aggregate to generate pooled statistics:

bio_sample  quality_flag  n_reps  hdr_total_pct_mean  hdr_total_pct_sem  nhej_mmej_total_pct_mean
────────────────────────────────────────────────────────────────────────────────────────────────────
gene_A      good          3       45.2                1.3                12.1
gene_B      good          3       52.8                0.9                8.4
gene_C      good          3       38.1                2.1                18.7

The quality_flag column indicates:

good: All replicates passed QC
one_low_read_removed: One replicate had low reads and was excluded
all_low_reads: All replicates had low read counts (use with caution)

How It Works

Triple-Aligner Consensus

TRACE uses three independent aligners to maximize accuracy:

                    ┌─────────────┐
     Raw Reads ────►│   BWA-MEM   │────┐
                    └─────────────┘    │
                    ┌─────────────┐    │     ┌───────────────┐     ┌────────────┐
     Raw Reads ────►│   BBMap     │────┼────►│   Consensus   │────►│  Classify  │
                    └─────────────┘    │     └───────────────┘     └────────────┘
                    ┌─────────────┐    │
     Raw Reads ────►│  minimap2   │────┘
                    └─────────────┘

Each aligner has different strengths—BWA-MEM for accuracy, BBMap for gapped alignments, minimap2 for speed. TRACE takes the consensus to reduce aligner-specific artifacts.

Edit Distance Classification

TRACE classifies reads by measuring edit distance to both reference and donor sequences:

Align read to reference amplicon
Extract mismatches in core edit region (±30bp from cut site)
Check if each mismatch brings sequence closer to donor template
Classify based on SNV pattern and presence of indels

This handles complex cases where aligners represent clustered SNVs as indels.

Automatic Detection

TRACE auto-detects library characteristics:

Detection	Method
Library type	TruSeq (fixed primers) vs Tn5 (tagmented) based on read start clustering
UMI presence	Sequence diversity analysis at read starts
Read overlap	Determines if R1/R2 can be merged
PCR duplicates	UMI-based (TruSeq) or position-based (Tn5) deduplication

Installation

pip (Python package only)

pip install trace-crispr

conda (includes aligners)

conda install -c bioconda -c conda-forge trace-crispr

This installs BWA, BBMap, minimap2, and samtools automatically.

Development

git clone https://github.com/k-roy/TRACE.git
cd TRACE
pip install -e ".[dev]"

Usage

Check Locus Configuration

Before running, verify TRACE correctly detects your editing design:

trace info \
  --reference amplicon.fasta \
  --hdr-template donor.fasta \
  --guide GCTGAAGCACTGCACGCCGT

Output:

============================================================
TRACE Analysis Configuration
============================================================

Reference sequence: 231 bp
HDR template: 127 bp
  - Template aligns at position 53 in reference

Donor template analysis:
  - Left homology arm: positions 53-124 (72 bp)
  - Right homology arm: positions 128-179 (52 bp)

  Edits detected (2 total):
    * Position 125: T -> A (substitution)
    * Position 127: G -> A (substitution)

Guide analysis:
  - Guide: GCTGAAGCACTGCACGCCGT
  - Target: positions 105-124 (+ strand)
  - PAM: TGG at positions 125-127
  - Cleavage site: position 122

Sample Key Format

Create a TSV file with sample information:

sample_id    r1_path                  r2_path                  condition
sample_01    /path/to/S1_R1.fastq.gz  /path/to/S1_R2.fastq.gz  treatment
sample_02    /path/to/S2_R1.fastq.gz  /path/to/S2_R2.fastq.gz  treatment
sample_03    /path/to/S3_R1.fastq.gz  /path/to/S3_R2.fastq.gz  control

Per-Sample Sequences

For experiments with different guides/donors per sample:

sample_id  r1_path      r2_path      reference      guide                   hdr_template
sample_01  S1_R1.fq.gz  S1_R2.fq.gz  locus1.fasta   AGAGAAACACACTGTACTCCGT  donor1.fasta
sample_02  S2_R1.fq.gz  S2_R2.fq.gz  locus2.fasta   TTGGTTACAACTCTGACCCA    donor2.fasta

trace run --sample-key manifest.tsv --output results/

Multi-Template Analysis (Barcode Screening)

For experiments with multiple possible HDR templates:

trace multi-template \
  --reference amplicon.fasta \
  --hdr-templates all_barcodes.fasta \
  --guide GAGTCCGAGCAGAAGAAGAA \
  --sample-key samples.tsv \
  --output results/

Cas12a Support

trace run \
  --reference amplicon.fasta \
  --hdr-template donor.fasta \
  --guide GCTGAAGCACTGCACGCCGTAA \
  --nuclease cas12a \
  --sample-key samples.tsv \
  --output results/

Nuclease	PAM	Cleavage
Cas9	NGG (3' of guide)	Blunt, 3bp from PAM
Cas12a	TTTN (5' of guide)	Staggered, 18-23bp from PAM

Snakemake Workflow

For large-scale processing:

snakemake --configfile config.yaml --cores 24

See workflow/README.md for details.

CLI Commands Reference

TRACE provides the following commands:

Command	Description
`trace run`	Main analysis pipeline for single or multiple samples
`trace info`	Display locus configuration (reference, donor, guide alignment)
`trace classify`	Re-classify reads from an existing BAM file
`trace classify-batch`	Batch re-classification on multiple BAM files
`trace multi-template`	Analyze samples with multiple possible HDR templates
`trace aggregate`	Aggregate results across technical replicates
`trace extract-kmers`	Extract k-mers for contamination filtering
`trace generate-manifest`	Generate a sample manifest from a directory of FASTQs
`trace generate-templates`	Generate multi-template reference from barcode list
`trace init`	Create a template configuration file

Use trace <command> --help for detailed options.

Example: Aggregating Replicates

# After running trace on all samples
trace aggregate \
  --input results/per_sample_editing_outcomes_all_methods.tsv \
  --sample-key samples.tsv \
  --group-by condition \
  --output results/aggregated_by_condition.tsv

Analysis Module

Compare editing outcomes across conditions with statistical testing:

from trace_crispr.analysis import (
    compare_metric_by_condition,
    results_to_dataframe,
    plot_condition_comparison,
)

# Compare HDR rates
comparisons = compare_metric_by_condition(
    results, samples,
    condition_col='treatment',
    metric='dedup_hdr_pct',
    base_condition='control'
)

# View statistics
print(comparisons.to_dataframe())

     condition        metric  mean   std   p_value  significance
0  treatment_A  dedup_hdr_pct  25.4  2.63   0.0003          ***
1  treatment_B  dedup_hdr_pct  12.1  1.89   0.4521           ns

Visualization

fig = plot_condition_comparison(
    stats, comparisons,
    base_condition='control',
    title='HDR Rate by Treatment',
    ylabel='HDR Rate (%)'
)
fig.savefig('hdr_comparison.png', dpi=150)

Install visualization dependencies:

pip install trace-crispr[visualization]

Dependencies

Python (core): click, pysam, pandas, numpy, pyyaml, rapidfuzz, tqdm

Python (visualization): matplotlib, seaborn, scipy

External (via conda): bwa, bbmap, minimap2, samtools

Citation

If you use TRACE in your research, please cite:

Roy, K.R. et al. (2026). TRACE: Triple-aligner Read Analysis for CRISPR Editing. In preparation.

Author

Kevin R. Roy (kevinrjroy@gmail.com)

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.6.3

Mar 21, 2026

0.6.2

Mar 20, 2026

This version

0.6.1

Mar 19, 2026

0.6.0

Mar 18, 2026

0.4.0

Feb 21, 2026

0.3.1

Feb 17, 2026

0.3.0

Feb 17, 2026

0.2.1

Feb 12, 2026

0.2.0

Feb 12, 2026

0.1.0

Feb 12, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trace_crispr-0.6.1.tar.gz (208.7 kB view details)

Uploaded Mar 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

trace_crispr-0.6.1-py3-none-any.whl (240.1 kB view details)

Uploaded Mar 19, 2026 Python 3

File details

Details for the file trace_crispr-0.6.1.tar.gz.

File metadata

Download URL: trace_crispr-0.6.1.tar.gz
Upload date: Mar 19, 2026
Size: 208.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for trace_crispr-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`93331f16491be99dbf2d9241f49e36f28855bfb55671272f2a98b6f7a2b06efb`
MD5	`7e9fb02f85f5d52b772c5b8184e09b7e`
BLAKE2b-256	`01a02692038d9ac25d801e745617a86ab02bee839fe5704b1b84b1b7b982bfad`

See more details on using hashes here.

File details

Details for the file trace_crispr-0.6.1-py3-none-any.whl.

File metadata

Download URL: trace_crispr-0.6.1-py3-none-any.whl
Upload date: Mar 19, 2026
Size: 240.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for trace_crispr-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f62c320b4b945a3a6898c0f4588f1a1790c59fce4ec78d2a445b4eb74e4c6df9`
MD5	`3a47dd06eee05871aa398694ffdc4e4f`
BLAKE2b-256	`5fe674d800af6840b2e594ee4ca271fb0e912d60bae12f51b61ccebd0d619c5d`

See more details on using hashes here.

trace-crispr 0.6.1

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

TRACE

Quick Start

Output

Editing Summary

Classification Categories

NHEJ vs MMEJ Classification

Output Columns

Pooled Summary (Technical Replicates)

How It Works

Triple-Aligner Consensus

Edit Distance Classification

Automatic Detection

Installation

pip (Python package only)

conda (includes aligners)

Development

Usage

Check Locus Configuration

Sample Key Format

Per-Sample Sequences

Multi-Template Analysis (Barcode Screening)

Cas12a Support

Snakemake Workflow

CLI Commands Reference

Example: Aggregating Replicates

Analysis Module

Visualization

Dependencies

Citation

Author

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes