Feature extraction tools for circulating tumor DNA from GRCh37 aligned BAM files
Krewlyzer: Comprehensive cfDNA Feature Extraction Toolkit
Krewlyzer is a high-performance toolkit for extracting biological features from cell-free DNA (cfDNA) sequencing data, designed for cancer genomics, liquid biopsy research, and clinical bioinformatics.
It is built with Python and Rust: the compute-intensive core is written in Rust and exposed to Python through PyO3, delivering 5-50x speedups over pure Python.
[!TIP] Full Documentation: msk-access.github.io/krewlyzer
Why Krewlyzer?
Cancer cells leave molecular fingerprints in your blood. Krewlyzer finds them.
The Fragmentomics Advantage
| Traditional Liquid Biopsy | Fragmentomics with Krewlyzer |
|---|---|
| Look for specific mutations | Analyze how DNA is cut |
| Need prior knowledge of tumor | Works without knowing mutations |
| Miss ~50% of early cancers | Detect more cancers, earlier |
Key insight: Tumor-derived cfDNA fragments tend to be shorter (~145 bp) than fragments from healthy cells (~166 bp). Krewlyzer quantifies this difference and extracts ML-ready features.
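As a rough illustration of how a size shift becomes a single number, the sketch below computes a short/long fragment ratio from a list of fragment lengths. The bins (100-150 bp short, 151-220 bp long) follow the DELFI convention (Cristiano et al., cited below) and are an assumption here; Krewlyzer's own `fsr`/`fsc` bins may differ, and `short_long_ratio` is a hypothetical helper, not part of Krewlyzer's API. The input lengths are synthetic.

```python
from collections.abc import Iterable

# Hypothetical helper (not part of Krewlyzer): DELFI-style size bins, inclusive, in bp.
SHORT_BIN = (100, 150)
LONG_BIN = (151, 220)


def short_long_ratio(lengths: Iterable[int]) -> float:
    """Return (#short fragments) / (#long fragments); NaN if there are no long fragments."""
    short = long_ = 0
    for length in lengths:
        if SHORT_BIN[0] <= length <= SHORT_BIN[1]:
            short += 1
        elif LONG_BIN[0] <= length <= LONG_BIN[1]:
            long_ += 1
        # Lengths outside both bins are ignored.
    return short / long_ if long_ else float("nan")


if __name__ == "__main__":
    healthy_like = [166] * 80 + [145] * 20  # synthetic: mostly ~166 bp fragments
    tumor_like = [166] * 50 + [145] * 50    # synthetic: shifted toward shorter fragments
    print(short_long_ratio(healthy_like))  # 0.25
    print(short_long_ratio(tumor_like))    # 1.0
```

A higher ratio indicates relatively more short fragments; real pipelines usually compute it per genomic bin and normalize against healthy references rather than over the whole sample.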
What You Get
| Feature | Clinical Use |
|---|---|
| Fragment size ratios | Tumor burden estimation |
| Cutting patterns | Tissue of origin identification |
| Nucleosome positioning | Epigenetic profiling |
| Mutation-specific sizes | MRD monitoring |
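Nucleosome positioning is commonly summarized with the windowed protection score (WPS) from Snyder et al. (cited below): for a window centred on a position, count the fragments that span the whole window and subtract the fragments with an end inside it. The sketch below assumes 0-based half-open `(start, end)` fragments and a 120 bp window, and treats an end on the window's first base as "inside"; this boundary convention and the function name are assumptions, and Krewlyzer's `wps` command may filter fragment sizes or handle edges differently.

```python
from collections.abc import Sequence


def windowed_protection_score(
    fragments: Sequence[tuple[int, int]], position: int, window: int = 120
) -> int:
    """Hypothetical WPS sketch at `position`.

    fragments: (start, end) pairs in 0-based, half-open coordinates [start, end).
    The window is [position - window // 2, position + window // 2).
    """
    half = window // 2
    win_start, win_end = position - half, position + half
    spanning = 0
    ends_inside = 0
    for start, end in fragments:
        if start < win_start and end > win_end:
            # Both fragment ends lie outside the window: it protects the whole window.
            spanning += 1
        elif win_start <= start < win_end or win_start <= end - 1 < win_end:
            # At least one fragment end (first or last base) falls in the window.
            ends_inside += 1
        # Fragments that do not overlap the window contribute nothing.
    return spanning - ends_inside


if __name__ == "__main__":
    frags = [(900, 1100), (920, 1080), (950, 1120), (800, 900)]
    print(windowed_protection_score(frags, 1000))  # 1
```

Positive values suggest a protected (nucleosome-occupied) region and negative values suggest frequent cutting; sliding the position along a region gives the WPS track.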
New to cfDNA? Read What is Cell-Free DNA? for background.
Quick Install
```bash
# Docker (recommended - all data bundled)
docker pull ghcr.io/msk-access/krewlyzer:latest

# Clone + Install (development)
git clone https://github.com/msk-access/krewlyzer.git && cd krewlyzer
git lfs pull && pip install -e .

# pip + Data Clone (custom environments)
pip install krewlyzer
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data
```
[!NOTE] pip users: The `KREWLYZER_DATA_DIR` environment variable is required to locate bundled assets. See the Installation Guide for details.
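For pip installs, a small pre-flight check can catch a missing or mistyped variable before a long run. The sketch below only confirms that `KREWLYZER_DATA_DIR` is set and points at an existing directory; it does not verify which assets Krewlyzer expects inside it, and `check_data_dir` is a hypothetical helper, not part of Krewlyzer.

```python
import os
from pathlib import Path


def check_data_dir(var: str = "KREWLYZER_DATA_DIR") -> Path:
    """Hypothetical pre-flight check: return the data directory named by `var`.

    Raises RuntimeError if the variable is unset or is not an existing directory.
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; see the Installation Guide.")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{var}={value!r} is not an existing directory.")
    return path
```

Calling `check_data_dir()` at the top of a wrapper script fails fast with a readable message instead of partway through feature extraction.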
Quick Start
```bash
# Run all fragmentomics features
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/

# Generate unified JSON for ML pipelines
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/ --generate-json

# Individual tools
krewlyzer extract -i sample.bam -r hg19.fa -o output/
krewlyzer fsc -i output/sample.bed.gz -o output/

# Panel data (MSK-ACCESS) with target regions
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
  --target-regions panel_targets.bed \
  --pon-model msk-access.pon.parquet
```
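To process a handful of samples, the `run-all` call can be wrapped in a small driver. The sketch below uses only the flags shown above (`-i`, `-r`, `-o`, `--generate-json`) and Python's standard `subprocess` module; the one-directory-per-sample layout (`results/<sample>/`) and the helper names are assumptions, and for larger cohorts the Nextflow pipeline listed under Documentation is the intended route.

```python
import subprocess
from pathlib import Path


def run_all_command(bam: Path, reference: Path, out_root: Path,
                    generate_json: bool = True) -> list[str]:
    """Build a `krewlyzer run-all` argument list for one BAM.

    Hypothetical layout: each sample writes to out_root/<BAM file stem>.
    """
    cmd = ["krewlyzer", "run-all", "-i", str(bam), "-r", str(reference),
           "-o", str(out_root / bam.stem)]
    if generate_json:
        cmd.append("--generate-json")
    return cmd


def run_batch(bams: list[Path], reference: Path, out_root: Path) -> None:
    """Run samples sequentially, stopping at the first failure."""
    for bam in bams:
        (out_root / bam.stem).mkdir(parents=True, exist_ok=True)
        cmd = run_all_command(bam, reference, out_root)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `run_batch(sorted(Path("bams").glob("*.bam")), Path("hg19.fa"), Path("results"))` runs every BAM in `bams/`. Passing an argument list (rather than a shell string) avoids quoting problems with unusual file names.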
Features
| Command | Description | Output |
|---|---|---|
| `extract` | Extract fragments from BAM | `.bed.gz` |
| `motif` | End motifs and motif diversity score (MDS) | `.EndMotif.tsv`, `.MDS.tsv` |
| `fsc` | Fragment size coverage | `.FSC.tsv` |
| `fsr` | Fragment size ratios | `.FSR.tsv` |
| `fsd` | Fragment size distribution by chromosome arm | `.FSD.tsv` |
| `wps` | Windowed protection score | `.WPS.parquet` |
| `ocf` | Orientation-aware cfDNA fragmentation | `.OCF.tsv` |
| `region-entropy` | Fragment-size entropy at TFBS and ATAC regions | `.TFBS.tsv`, `.ATAC.tsv` |
| `uxm` | Fragment-level methylation | `.UXM.tsv` |
| `mfsd` | Mutant vs. wild-type fragment sizes | `.mFSD.tsv` |
| `build-pon` | Build a panel of normals (PON) | `.pon.parquet` |
| `run-all` | All features in one pass | All outputs |
| `--generate-json` | Unified JSON for ML | `.features.json` |
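The motif diversity score is commonly defined as the Shannon entropy of the 256 possible 4-mer fragment-end motifs, normalized by log(256) so that it ranges from 0 (one motif only) to 1 (all motifs equally frequent). The sketch below implements that definition on a list of already-extracted end motifs; how motifs are read from each fragment end, and any pseudocounts, are left out, so it should not be taken as Krewlyzer's exact `motif` implementation. `motif_diversity_score` is a hypothetical name.

```python
import math
from collections import Counter
from collections.abc import Iterable
from itertools import product

# All 256 possible 4-mer end motifs.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]
_VALID = set(ALL_4MERS)


def motif_diversity_score(end_motifs: Iterable[str]) -> float:
    """Normalized Shannon entropy of 4-mer end-motif frequencies, in [0, 1].

    Motifs that are not 4 uppercase/lowercase ACGT bases (e.g. containing N) are ignored.
    """
    counts = Counter(m.upper() for m in end_motifs if m.upper() in _VALID)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid 4-mer end motifs")
    # Unobserved motifs contribute 0 (the limit of p * log p as p -> 0).
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(ALL_4MERS))
```

Because of the normalization, scores are comparable across samples with different fragment counts, which is what makes MDS usable as a single per-sample feature.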
Panel Mode (--target-regions)
For targeted sequencing panels (e.g., MSK-ACCESS):

```bash
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
  --target-regions panel_targets.bed
```

- GC model: Trained on off-target fragments, which avoid capture-enrichment bias
- Outputs: Split into `.tsv` (off-target) and `.ontarget.tsv`
- Auto-PON: Use `-A xs2` to auto-load the bundled PON for z-scores
- ML negatives: Use `-A xs2 --skip-pon` to output raw features (no z-scores)
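The on-target/off-target split can be pictured as an interval-overlap test against the panel BED. The sketch below merges target intervals per chromosome and uses binary search to flag a fragment as on-target if it overlaps any target by at least one base. The overlap rule (any overlap, no padding) and the helper names are assumptions; Krewlyzer may pad targets or apply a different criterion.

```python
import bisect
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines into per-chromosome (starts, ends) of merged [start, end) intervals."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for s, e in intervals[1:]:
            if s <= out[-1][1]:  # overlapping or touching: extend the last interval
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def is_on_target(targets, chrom, start, end):
    """True if the fragment [start, end) overlaps any merged target on `chrom`."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect.bisect_right(starts, start) - 1  # last target starting at or before `start`
    if i >= 0 and ends[i] > start:
        return True  # that target extends past the fragment start
    # Otherwise only the next target can overlap, if it begins before the fragment ends.
    return i + 1 < len(starts) and starts[i + 1] < end
```

Merging first guarantees the targets on each chromosome are disjoint and sorted, so at most two candidate intervals need checking per fragment.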
Documentation
- Getting Started - 5-minute quickstart
- Installation - Docker, pip, development
- Usage Guide - CLI reference
- Feature Details - Per-feature documentation
- Nextflow Pipeline - Batch processing
Citation
If you use Krewlyzer, please cite:
- DELFI (FSR): Cristiano S, et al. Nature 2019
- WPS: Snyder MW, et al. Cell 2016
- OCF: Sun K, et al. Genome Res 2019
- UXM: Loyfer N, et al. Nature 2022
See Citation & Scientific Background for full references.
License
GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE.
Developed by Ronak Shah (@rhshah) at Memorial Sloan Kettering Cancer Center.
Download files
File details

Details for the file krewlyzer-0.7.0.tar.gz.

File metadata
- Download URL: krewlyzer-0.7.0.tar.gz
- Size: 273.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fef369c2d492d6129806fa15b002a9897b72205fc9a14fff3ee4522c3e78693 |
| MD5 | 0bdef1d8ba72fadd2b46d75743ac9779 |
| BLAKE2b-256 | ab7138f3adac5b3efda2ae13ef8465a1a59c0d2069d71778af7f6879d31c7975 |

Provenance

The following attestation bundles were made for krewlyzer-0.7.0.tar.gz:
- Publisher: release.yml on msk-access/krewlyzer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krewlyzer-0.7.0.tar.gz
- Subject digest: 1fef369c2d492d6129806fa15b002a9897b72205fc9a14fff3ee4522c3e78693
- Sigstore transparency entry: 1029539004
- Permalink: msk-access/krewlyzer@36305c97c42218fabd83c14cecb41e6897dd8c65
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@36305c97c42218fabd83c14cecb41e6897dd8c65
- Trigger Event: push
File details

Details for the file krewlyzer-0.7.0-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata
- Download URL: krewlyzer-0.7.0-cp312-cp312-manylinux_2_28_x86_64.whl
- Size: 7.5 MB
- Tags: CPython 3.12, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 02e3e2637ad35fbcdf2e9c9be9fa6b525977572cfb3fac318b5950083b8723a3 |
| MD5 | 5d8ea925e11c509c9cece9c260d09750 |
| BLAKE2b-256 | 5220168a1e756b57e3925d823f36703b28f2b6aa8cac0b08dd33753fb8eb3251 |

Provenance

The following attestation bundles were made for krewlyzer-0.7.0-cp312-cp312-manylinux_2_28_x86_64.whl:
- Publisher: release.yml on msk-access/krewlyzer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krewlyzer-0.7.0-cp312-cp312-manylinux_2_28_x86_64.whl
- Subject digest: 02e3e2637ad35fbcdf2e9c9be9fa6b525977572cfb3fac318b5950083b8723a3
- Sigstore transparency entry: 1029539136
- Permalink: msk-access/krewlyzer@36305c97c42218fabd83c14cecb41e6897dd8c65
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@36305c97c42218fabd83c14cecb41e6897dd8c65
- Trigger Event: push
File details

Details for the file krewlyzer-0.7.0-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata
- Download URL: krewlyzer-0.7.0-cp311-cp311-manylinux_2_28_x86_64.whl
- Size: 7.5 MB
- Tags: CPython 3.11, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ceb86562206af7db56def9900e4f2da2333361f154a80c7792cdb5019162196 |
| MD5 | 17a7c5ce70102bd48c4f8a0b6bc9596c |
| BLAKE2b-256 | 01c33cad67302f36affe9664a676413bf243d725f7976c2509667a15f5ba171d |

Provenance

The following attestation bundles were made for krewlyzer-0.7.0-cp311-cp311-manylinux_2_28_x86_64.whl:
- Publisher: release.yml on msk-access/krewlyzer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krewlyzer-0.7.0-cp311-cp311-manylinux_2_28_x86_64.whl
- Subject digest: 8ceb86562206af7db56def9900e4f2da2333361f154a80c7792cdb5019162196
- Sigstore transparency entry: 1029539067
- Permalink: msk-access/krewlyzer@36305c97c42218fabd83c14cecb41e6897dd8c65
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@36305c97c42218fabd83c14cecb41e6897dd8c65
- Trigger Event: push
File details

Details for the file krewlyzer-0.7.0-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata
- Download URL: krewlyzer-0.7.0-cp310-cp310-manylinux_2_28_x86_64.whl
- Size: 7.5 MB
- Tags: CPython 3.10, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bccc4d81294f8eac1b756bdd70fde8d3bb119f7d96ffba0fc51977337b2beda |
| MD5 | 90eeaf35b7f6ae1c78fc1b33e0ccb2d5 |
| BLAKE2b-256 | c7a5d66b7d91aecac05d80c2e2b7eff7ac87510fc3fa90f7c9e71dff6f0c1309 |

Provenance

The following attestation bundles were made for krewlyzer-0.7.0-cp310-cp310-manylinux_2_28_x86_64.whl:
- Publisher: release.yml on msk-access/krewlyzer
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krewlyzer-0.7.0-cp310-cp310-manylinux_2_28_x86_64.whl
- Subject digest: 6bccc4d81294f8eac1b756bdd70fde8d3bb119f7d96ffba0fc51977337b2beda
- Sigstore transparency entry: 1029539197
- Permalink: msk-access/krewlyzer@36305c97c42218fabd83c14cecb41e6897dd8c65
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@36305c97c42218fabd83c14cecb41e6897dd8c65
- Trigger Event: push