Evaluate cfDNA fragmentomics features for ctDNA detection

Maintainers

msk-access

Project description

kreview

Advanced cfDNA Fragmentomics Core Evaluation Engine

🧬 Overview

kreview is a production-grade, notebook-first (nbdev) evaluation engine for high-throughput analysis of fragmentomics features in cancer liquid biopsies. Developed at Memorial Sloan Kettering Cancer Center (MSKCC), it processes cohorts of tens of thousands of samples using an embedded DuckDB query engine with chunked I/O and automatic retry logic.

📖 Full Documentation

🚀 Features

5-Tier ctDNA Taxonomy: Uses MSK-IMPACT paired inference to label each sample as True ctDNA+, Possible ctDNA+, Possible ctDNA−, Healthy Normal, or Insufficient Data. Optional clonal hematopoiesis (CH) hotspot demotion via --ch-hotspot-maf.
DuckDB Dynamic Data Lake: In-memory read_parquet bindings with chunked I/O and exponential backoff retry. Builds a merged SQL-queryable kreview_lake.duckdb on demand.
Multi-Model Evaluation: Logistic Regression, Random Forest, and XGBoost (CPU) plus TabPFN and TabICL (GPU) with Stratified K-Fold CV, SHAP explainability, and subgroup analysis.
Feature Selection: mRMR (Minimum Redundancy Maximum Relevance) as default strategy — iteratively selects features maximizing target relevance while minimizing inter-feature redundancy. Legacy hybrid_union (AUC ∪ MI) also available.
Multimodal Stacking: Cross-evaluator fusion via super-matrix with Mutual Information or Boruta-SHAP selection, followed by stacking ensemble + ablation analysis.
Interactive Dashboards: Plotly-native HTML reports with ROC curves, violin plots, SHAP beeswarm/waterfall, mRMR scatter plots, per-cancer-type sensitivity tables, and Decision Curve Analysis.
Nextflow HPC Integration: Decomposed multistage DAG for SLURM-based HPC execution with per-evaluator parallelism, GPU scheduling, and automatic retry logic.
26 Built-In Evaluators: Modular extractors covering fragment sizes (FSC, FSD, FSR), nucleosome protection (WPS, TFBS), cleavage motifs (EndMotif, BreakPointMotif), chromatin accessibility (ATAC), motif divergence (MDS), and orientation (OCF).
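
The "exponential backoff retry" mentioned for the data lake can be illustrated with a minimal sketch. This is not kreview's implementation: the function name retry_with_backoff, its parameters, and the choice of OSError as the retryable error are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on transient errors with exponential backoff.

    Hypothetical helper: the name, defaults, and retryable error types are
    illustrative assumptions, not part of the kreview API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time in [0, ceiling] so that
            # parallel workers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

A chunked Parquet read could then be wrapped as retry_with_backoff(lambda: read_chunk(path)), where read_chunk stands in for whatever I/O call is flaky on a shared filesystem.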

🏗️ Pipeline Architecture

graph LR
    A[Label] --> B["Extract ×N"]
    B --> C[Select]
    C --> D["Eval CPU"]
    C --> E["Eval GPU"]
    C --> F[Fuse]
    D --> G[Scoreboard]
    E --> G
    D --> I["Eval Multimodal"]
    E --> I
    F --> I
    G --> H[Report]
    I --> J["Report Multimodal"]

The pipeline supports two modes:

Mode	Command	Use Case
Monolithic	`kreview run`	Single-machine, sequential execution
Multistage	`nextflow run ... -profile iris`	HPC parallelism, per-evaluator scatter
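
To make the stage ordering concrete, the graph above can be written as a dependency map and grouped into "waves" of stages that could run concurrently. This is only a sketch using Python's standard graphlib module. The edges are copied from the diagram, but the stage names as strings and the helper execution_waves are illustrative, and the real scheduling is done by Nextflow.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (edges copied from the
# diagram above; "Extract" stands for the per-evaluator "Extract xN" scatter).
DEPENDENCIES = {
    "Label": set(),
    "Extract": {"Label"},
    "Select": {"Extract"},
    "Eval CPU": {"Select"},
    "Eval GPU": {"Select"},
    "Fuse": {"Select"},
    "Scoreboard": {"Eval CPU", "Eval GPU"},
    "Eval Multimodal": {"Eval CPU", "Eval GPU", "Fuse"},
    "Report": {"Scoreboard"},
    "Report Multimodal": {"Eval Multimodal"},
}


def execution_waves(dependencies):
    """Group stages into waves; stages within one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every stage whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(number, ", ".join(wave))
```

The sketch yields six waves. The fourth contains Eval CPU, Eval GPU, and Fuse, which have no dependencies on one another and are therefore candidates for parallel execution in multistage mode.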

⚙️ Quick Start

Installation

[!IMPORTANT] Quarto is required for programmatic dashboard generation. Because quarto-cli wrapper packages are unreliable across Python environments, kreview expects the Quarto executable to be installed directly on your OS or in your container.

Option 1: Docker (Recommended "Batteries-Included" Method)

The easiest way to run kreview without managing external dependencies is to use the pre-built Docker images hosted on GHCR. They ship with Python 3.12, all ML libraries, and Quarto:

# CPU image (~1.5 GB) — for all standard pipeline processes
docker pull ghcr.io/msk-access/kreview:latest

# GPU image (~8-10 GB) — adds PyTorch, TabPFN, TabICL (requires NVIDIA drivers)
docker pull ghcr.io/msk-access/kreview:latest-gpu

# Run
docker run -v /your/data:/data ghcr.io/msk-access/kreview:latest \
  kreview run --cancer-samplesheet /data/cancer.csv ...

Option 2: Local Install (Pip)

If you install via pip, you must install Quarto separately using your OS package manager:

Install Quarto: Follow the official Quarto Installation Guide (e.g. brew install quarto on macOS).
Install kreview:

git clone https://github.com/msk-access/kreview.git
cd kreview
pip install -e .          # CPU models only
pip install -e ".[gpu]"   # + TabPFN, TabICL (requires CUDA)

Running the Pipeline

Local (Single Machine)

kreview run \
  --cancer-samplesheet "/path/to/cancer/samplesheet.csv" \
  --healthy-xs1-samplesheet "/path/to/healthy/xs1/samplesheet.csv" \
  --healthy-xs2-samplesheet "/path/to/healthy/xs2/samplesheet.csv" \
  --cbioportal-dir "/path/to/cBioPortal_MAF_CNA_SV/" \
  --krewlyzer-dir "/path/to/unified_krewlyzer_results" \
  --output output/ \
  --strategy mrmr \
  --top-percentile 10 \
  --compute-univariate-auc \
  --ch-hotspot-maf "/path/to/ch_hotspots.maf" \
  --export-duckdb
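
When runs are launched from a notebook or a wrapper script, it can be convenient to assemble the same command from a dictionary. The sketch below is one way to do that. The helper build_kreview_command is hypothetical and not shipped with kreview; it uses only flags shown in the command above, and the paths are the same placeholders.

```python
import shlex


def build_kreview_command(options):
    """Turn a {flag: value} mapping into an argv list for `kreview run`.

    Hypothetical helper for illustration: True becomes a bare switch,
    False/None are skipped, and anything else becomes `--flag value`.
    """
    argv = ["kreview", "run"]
    for flag, value in options.items():
        if value is True:
            argv.append(f"--{flag}")
        elif value is False or value is None:
            continue
        else:
            argv.extend([f"--{flag}", str(value)])
    return argv


argv = build_kreview_command(
    {
        "cancer-samplesheet": "/path/to/cancer/samplesheet.csv",
        "krewlyzer-dir": "/path/to/unified_krewlyzer_results",
        "output": "output/",
        "strategy": "mrmr",
        "top-percentile": 10,
        "compute-univariate-auc": True,
        "export-duckdb": True,
    }
)
print(shlex.join(argv))  # shell-quoted form, handy for logging
# To execute: subprocess.run(argv, check=True)
```

Passing the list form to subprocess.run avoids shell quoting problems with paths that contain spaces.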

HPC (Nextflow + SLURM)

nextflow run /path/to/kreview/nextflow/main.nf \
  --cancer_samplesheet /path/to/cancer.csv \
  --healthy_xs1_samplesheet /path/to/healthy_xs1.csv \
  --healthy_xs2_samplesheet /path/to/healthy_xs2.csv \
  --cbioportal_dir /path/to/cbioportal/ \
  --krewlyzer_dir /path/to/manifest.txt \
  --outdir /path/to/output/ \
  --pipeline_mode multistage \
  --run_gpu_eval true \
  --gpu_models "tabpfn,tabicl" \
  --run_multimodal_eval true \
  -profile iris

Dashboard Access

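Once the run finishes, open the generated HTML reports, for example (the open command is macOS-specific; on Linux, xdg-open is the usual equivalent):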

open output/reports/ATAC_dashboard.html

🧪 Feature Selection

Strategy	Scope	Method	Default
`mrmr`	Single-evaluator	F-statistic relevance + Pearson redundancy penalty	✅
`hybrid_union`	Single-evaluator	Top-X% AUC ∪ Top-X% MI	Legacy
`mi`	Multimodal	Mutual Information top-K ranking	✅
`boruta_shap`	Multimodal	SHAP importance vs shadow variables (50 trials)	Optional

See Statistical Evaluation for full documentation.
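
The mrmr row above can be illustrated with a small, dependency-free sketch of greedy mRMR. It assumes the quotient form of the score (F-statistic divided by mean absolute Pearson correlation with the features already selected); kreview's exact scoring may differ. The function names and the toy data are illustrative only.

```python
from math import sqrt
from statistics import fmean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one feature across the label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = fmean(values)
    between = within = 0.0
    for group in groups.values():
        mean = fmean(group)
        between += len(group) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in group)
    if within == 0.0:
        return float("inf") if between > 0.0 else 0.0
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr(features, labels, k):
    """Greedily pick k feature names, trading relevance against redundancy."""
    relevance = {name: f_statistic(col, labels) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        redundancy = fmean(
            abs(pearson(features[name], features[s])) for s in selected
        )
        return relevance[name] / max(redundancy, 1e-9)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a_scaled" is nearly a rescaled copy of "a"; "b" is slightly less
# relevant on its own but carries independent information.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
FEATURES = {
    "a": [-1.0, 1.0, -1.0, 1.0, 0.0, 2.0, 0.0, 2.0],
    "a_scaled": [-2.0, 2.1, -2.0, 2.1, 0.0, 4.1, 0.0, 4.1],
    "b": [-1.0, -1.0, 1.0, 1.0, -0.1, -0.1, 1.9, 1.9],
}
print(mrmr(FEATURES, LABELS, k=2))
```

On this toy data, ranking by relevance alone would keep a and a_scaled, whereas the mRMR loop picks a and then b because a_scaled is almost perfectly correlated with a.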

📓 nbdev Architecture

This project is an nbdev repository. Do not edit the .py files in kreview/ by hand, because they are generated. Make changes in the Jupyter notebooks under nbs/, then regenerate the modules with:

nbdev_export

📚 Resources

Documentation — Full user and developer guide
Contributing — How to contribute
Changelog — Version history

Project details


Release history

Version	Released
0.0.27	Jun 26, 2026
0.0.26	Jun 24, 2026
0.0.25	Jun 22, 2026
0.0.24	Jun 18, 2026
0.0.23	Jun 16, 2026
0.0.22	Jun 12, 2026
0.0.21	Jun 10, 2026
0.0.20	Jun 10, 2026
0.0.19	Jun 8, 2026
0.0.18	Jun 5, 2026
0.0.17	Jun 5, 2026
0.0.16 (this version)	Jun 3, 2026
0.0.15	Jun 1, 2026
0.0.14	May 25, 2026
0.0.13	May 25, 2026
0.0.12	May 24, 2026
0.0.11	May 24, 2026
0.0.10	May 22, 2026
0.0.9	May 7, 2026
0.0.8	May 6, 2026
0.0.7	Apr 13, 2026
0.0.6	Apr 10, 2026
0.0.5	Apr 10, 2026
0.0.3	Apr 10, 2026
0.0.1	Apr 9, 2026

Download files

Source Distribution

kreview-0.0.16.tar.gz (172.2 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

kreview-0.0.16-py3-none-any.whl (168.1 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file kreview-0.0.16.tar.gz.

File metadata

Download URL: kreview-0.0.16.tar.gz
Upload date: Jun 3, 2026
Size: 172.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kreview-0.0.16.tar.gz
Algorithm	Hash digest
SHA256	`64739f1134775fb4b6648f9ab1641f76418d0b405081056a3b80cfdd0238e92a`
MD5	`b90b5757061db8298b5683fe69f18ca5`
BLAKE2b-256	`e2931084028bd569e97f7640ccf6bf0935964ec811c6a8840f9185e220e11d85`
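
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names sha256_of and verify are illustrative, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest of kreview-0.0.16.tar.gz, copied from the table above.
EXPECTED_SHA256 = "64739f1134775fb4b6648f9ab1641f76418d0b405081056a3b80cfdd0238e92a"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


sdist = Path("kreview-0.0.16.tar.gz")  # assumed download location
if sdist.exists():
    print("digest matches" if verify(sdist, EXPECTED_SHA256) else "DIGEST MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 172 kB archive but makes the helper reusable for larger artifacts.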

Provenance

The following attestation bundles were made for kreview-0.0.16.tar.gz:

Publisher: release.yml on msk-access/kreview

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kreview-0.0.16.tar.gz
- Subject digest: 64739f1134775fb4b6648f9ab1641f76418d0b405081056a3b80cfdd0238e92a
- Sigstore transparency entry: 1711835501
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: msk-access/kreview@c29f82b4ae2da7ee13954e292cd30a369f63e1d3
- Branch / Tag: refs/tags/v0.0.16
- Owner: https://github.com/msk-access
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c29f82b4ae2da7ee13954e292cd30a369f63e1d3
- Trigger Event: push

File details

Details for the file kreview-0.0.16-py3-none-any.whl.

File metadata

Download URL: kreview-0.0.16-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 168.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kreview-0.0.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`327f22fd5298975b9ce28acc73e60a98bbbd324c5730bda4c37e288215112c60`
MD5	`11a4b8cc03c9e9504144bfafa8c98d26`
BLAKE2b-256	`98ebb17ecff5b75c1f5216fd8cca8570baac4e882cc0d88e7223526eb736eca8`

Provenance

The following attestation bundles were made for kreview-0.0.16-py3-none-any.whl:

Publisher: release.yml on msk-access/kreview

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kreview-0.0.16-py3-none-any.whl
- Subject digest: 327f22fd5298975b9ce28acc73e60a98bbbd324c5730bda4c37e288215112c60
- Sigstore transparency entry: 1711835527
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: msk-access/kreview@c29f82b4ae2da7ee13954e292cd30a369f63e1d3
- Branch / Tag: refs/tags/v0.0.16
- Owner: https://github.com/msk-access
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c29f82b4ae2da7ee13954e292cd30a369f63e1d3
- Trigger Event: push
