Feature extraction tools for circulating tumor DNA from GRCh37-aligned BAM files

Krewlyzer: Comprehensive cfDNA Feature Extraction Toolkit

Krewlyzer is a high-performance toolkit for extracting biological features from cell-free DNA (cfDNA) sequencing data. It is designed for cancer genomics, liquid biopsy research, and clinical bioinformatics.

Built with Python + Rust for maximum performance. The compute-intensive core uses PyO3 to deliver 5-50x speedups over pure Python.

Tip: Full documentation is available at msk-access.github.io/krewlyzer


Why Krewlyzer?

Cancer cells leave molecular fingerprints in your blood. Krewlyzer finds them.

The Fragmentomics Advantage

| Traditional Liquid Biopsy | Fragmentomics with Krewlyzer |
|---|---|
| Look for specific mutations | Analyze how DNA is cut |
| Need prior knowledge of tumor | Works without knowing mutations |
| Miss ~50% of early cancers | Detect more cancers, earlier |

Key insight: Tumor-derived cfDNA fragments tend to be shorter (~145 bp) than cfDNA from healthy cells (~166 bp). Krewlyzer quantifies this difference and extracts ML-ready features.
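To make the size difference concrete, here is a minimal Python sketch of a short-to-long fragment ratio. It is not Krewlyzer's implementation: the function name `short_long_ratio` and the bin edges (100-150 bp short, 151-220 bp long) are assumptions chosen for illustration, and the fragment lengths are made-up toy values.

```python
from typing import Iterable

# Illustrative bin edges in bp (assumptions, not Krewlyzer's actual bins).
SHORT_RANGE = (100, 150)  # inclusive
LONG_RANGE = (151, 220)   # inclusive


def short_long_ratio(lengths: Iterable[int]) -> float:
    """Return (# short fragments) / (# long fragments) for the given lengths."""
    short = 0
    long_ = 0
    for n in lengths:
        if SHORT_RANGE[0] <= n <= SHORT_RANGE[1]:
            short += 1
        elif LONG_RANGE[0] <= n <= LONG_RANGE[1]:
            long_ += 1
    if long_ == 0:
        raise ValueError("no fragments in the long bin; ratio is undefined")
    return short / long_


# Toy fragment lengths (made up for illustration).
healthy_like = [166, 167, 165, 170, 145, 160]
tumor_like = [145, 143, 150, 166, 140, 148]

print(short_long_ratio(healthy_like))  # 1 short / 5 long = 0.2
print(short_long_ratio(tumor_like))    # 5 short / 1 long = 5.0
```

A sample enriched in short fragments yields a higher ratio, which is the kind of signal a size-based feature captures.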

What You Get

| Feature | Clinical Use |
|---|---|
| Fragment size ratios | Tumor burden estimation |
| Cutting patterns | Tissue of origin identification |
| Nucleosome positioning | Epigenetic profiling |
| Mutation-specific sizes | MRD monitoring |

New to cfDNA? Read What is Cell-Free DNA? for background.


Quick Install

# Docker (recommended - all data bundled)
docker pull ghcr.io/msk-access/krewlyzer:latest

# Clone + Install (development)
git clone https://github.com/msk-access/krewlyzer.git && cd krewlyzer
git lfs pull && pip install -e .

# pip + Data Clone (custom environments)
pip install krewlyzer
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data

Note for pip users: The KREWLYZER_DATA_DIR environment variable is required to locate bundled assets. See the Installation Guide for details.
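Before launching a long run from a pip install, it can help to confirm that the variable is set and points at a real directory. The sketch below is a hypothetical pre-flight check, not part of Krewlyzer; the helper name `resolve_data_dir` is an assumption, and only the variable name `KREWLYZER_DATA_DIR` comes from the instructions above.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_data_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the directory named by KREWLYZER_DATA_DIR, or raise if unusable."""
    raw = env.get("KREWLYZER_DATA_DIR")
    if not raw:
        raise RuntimeError("KREWLYZER_DATA_DIR is not set")
    path = Path(raw).expanduser()  # expand a leading ~ if the shell did not
    if not path.is_dir():
        raise RuntimeError(f"KREWLYZER_DATA_DIR is not a directory: {path}")
    return path


# Report the result without crashing when the variable is missing.
try:
    print(f"Using data directory: {resolve_data_dir()}")
except RuntimeError as err:
    print(f"Data directory check failed: {err}")
```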

Quick Start

# Run all fragmentomics features
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/

# Generate unified JSON for ML pipelines
krewlyzer run-all -i sample.bam --reference hg19.fa --output results/ --generate-json

# Individual tools
krewlyzer extract -i sample.bam -r hg19.fa -o output/
krewlyzer fsc -i output/sample.bed.gz -o output/

# Panel data (MSK-ACCESS) with target regions
krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
    --target-regions panel_targets.bed \
    --pon-model msk-access.pon.parquet
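The unified JSON is intended for ML pipelines, which usually expect a flat feature vector. Its schema is not described here, so the following Python sketch makes no assumption about key names: it flattens any nested JSON into dotted keys and keeps only numeric leaves. The helper `flatten` and the example payload are hypothetical.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, float]:
    """Flatten nested dicts/lists into {dotted.key: number}; skip non-numeric leaves."""
    flat: Dict[str, float] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}."))
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        flat[prefix[:-1]] = float(obj)  # drop the trailing dot
    return flat


# Made-up payload standing in for a features file; real key names may differ.
payload = json.loads('{"sample": "S1", "fsr": {"ratio": 0.42}, "mds": [0.9, 0.8]}')
print(flatten(payload))
# {'fsr.ratio': 0.42, 'mds.0': 0.9, 'mds.1': 0.8}
```

For a real file, replace the inline string with `json.load(open(path))` on the `.features.json` output and inspect the resulting keys before training on them.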

Features

| Command | Description | Output |
|---|---|---|
| extract | Extract fragments from BAM | .bed.gz |
| motif | End motif & MDS scores | .EndMotif.tsv, .MDS.tsv |
| fsc | Fragment size coverage | .FSC.tsv |
| fsr | Fragment size ratios | .FSR.tsv |
| fsd | Size distribution by arm | .FSD.tsv |
| wps | Windowed protection score | .WPS.parquet |
| ocf | Orientation-aware fragmentation | .OCF.tsv |
| region-entropy | TFBS/ATAC size entropy | .TFBS.tsv, .ATAC.tsv |
| uxm | Fragment-level methylation | .UXM.tsv |
| mfsd | Mutant vs wild-type sizes | .mFSD.tsv |
| build-pon | Build Panel of Normals | .pon.parquet |
| run-all | All features in one pass | All outputs |
| --generate-json | Unified JSON for ML | .features.json |
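As one example of what these features measure, the windowed protection score (WPS) at a position is commonly defined as the number of fragments that span a window centered there minus the number of fragments with an endpoint inside it. The Python sketch below illustrates that definition only. It is not Krewlyzer's Rust implementation; the function name `wps_at`, the half-open coordinate convention, the 120 bp default window, and the tie-breaking at window edges are all assumptions.

```python
from typing import List, Tuple


def wps_at(center: int, fragments: List[Tuple[int, int]], window: int = 120) -> int:
    """WPS at `center` for half-open fragments [start, end)."""
    half = window // 2
    w_start, w_end = center - half, center + half  # half-open window [w_start, w_end)
    score = 0
    for start, end in fragments:
        if start <= w_start and end >= w_end:
            score += 1  # fragment covers the whole window
        elif w_start <= start < w_end or w_start < end <= w_end:
            score -= 1  # first or last base of the fragment falls inside the window
    return score


# Toy fragments (made up): two span the window [40, 160), one starts inside it.
fragments = [(0, 200), (10, 180), (90, 250)]
print(wps_at(100, fragments))  # 2 spanning - 1 endpoint = 1
print(wps_at(300, fragments))  # only (90, 250) ends inside [240, 360): -1
```

High scores indicate positions protected from cleavage, which is consistent with nucleosome occupancy.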

Panel Mode (--target-regions)

For targeted sequencing panels (MSK-ACCESS):

krewlyzer run-all -i sample.bam -r hg19.fa -o results/ \
    --target-regions panel_targets.bed

In panel mode:

  • GC model: Trained on off-target fragments (unbiased)
  • Outputs: Split into .tsv (off-target) and .ontarget.tsv
  • Auto-PON: Use -A xs2 to auto-load bundled PON for z-scores
  • ML negatives: Use -A xs2 --skip-pon to output raw features (no z-scores)
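The z-scores mentioned above compare a sample's feature value with the distribution of that feature in a panel of normals (PON). A minimal Python sketch of that idea follows. How Krewlyzer actually stores and applies its PON is not described here, so the helper `pon_zscore` and the numbers are illustrative assumptions only.

```python
from statistics import mean, stdev
from typing import Sequence


def pon_zscore(value: float, normals: Sequence[float]) -> float:
    """Z-score of `value` against the mean and sample SD of `normals`."""
    if len(normals) < 2:
        raise ValueError("need at least two normal samples to estimate spread")
    mu = mean(normals)
    sigma = stdev(normals)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("normals have zero variance; z-score is undefined")
    return (value - mu) / sigma


# Made-up short/long ratios from healthy controls and one test sample.
normals = [0.20, 0.22, 0.18, 0.21, 0.19]
print(round(pon_zscore(0.30, normals), 2))
```

Skipping this step, as the ML negatives option does, leaves the raw feature values so that healthy samples can be used as negative training examples without being normalized against themselves.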

Citation

If you use Krewlyzer, please cite:

  • DELFI (FSR): Cristiano S, et al. Nature 2019
  • WPS: Snyder MW, et al. Cell 2016
  • OCF: Sun K, et al. Genome Res 2019
  • UXM: Loyfer N, et al. Nature 2022

See Citation & Scientific Background for full references.


License

GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE.


Developed by Ronak Shah (@rhshah) at Memorial Sloan Kettering Cancer Center.
