SniffCell annotates structural variants using long-read methylation evidence and ctDMR signals.

SniffCell

SniffCell is a Python toolkit for annotating somatic structural variants (SVs) with cell-type origin using long-read DNA methylation. It integrates cell-type-specific differentially methylated regions (ctDMRs) derived from a reference methylation atlas with per-read methylation measurements from nanopore or PacBio long-read BAMs to assign each SV — or every read in a sample — to a cell population.


Why SniffCell?

Somatic SVs identified from bulk long-read sequencing are a mixture of events from different cell types. Without knowing the cell of origin, it is difficult to interpret their functional significance or estimate their true variant allele fraction within a specific compartment. SniffCell solves this by reading the epigenetic "fingerprint" imprinted on each DNA molecule and matching it against a reference atlas of cell-type-specific methylation patterns.

Core capabilities:

  • ctDMR discovery — Mine a reference methylation atlas to find genomic regions with distinct methylation in each cell population
  • Read-level deconvolution — Assign every read in a BAM to a cell type using ctDMR methylation signals, with no single-cell data required
  • SV annotation — Link cell-type identity to SV-supporting reads and produce a per-SV cell-of-origin call
  • Discovery pipeline — Run a full multi-stage SV / tandem-repeat / SNV calling workflow on cell-type-split BAMs produced by deconvolution
  • Interactive reporting — Filter high-confidence SVs and generate an HTML review report with clickable per-SV figures and IGV screenshots

Overview

The typical workflow has three main stages:

Atlas (NPY + index + metadata)
        │
        ▼
  sniffcell find         ← Call cell-type-specific DMRs (ctDMRs)
        │
        ▼
  sniffcell anno         ← Extract methylation from BAM, classify reads, assign SVs
        │
        ▼
  sniffcell report       ← Filter high-confidence calls, build HTML review report
        │
        ├── sniffcell viz        ← Per-SV methylation figure (PNG / PDF)
        ├── sniffcell igvviz     ← IGV batch screenshots
        └── sniffcell dmsv       ← Differential methylation test near each SV

For multi-group analyses (e.g., comparing SVs enriched in one cell compartment vs. another):

  sniffcell deconv       ← Deconvolve all reads; split BAM by cell type
        │
        ▼
  sniffcell discover     ← Call SVs / TRs / SNVs independently per group
        │
        ▼
  sniffcell anno         ← Annotate harmonized variants

Quick Start

1. Install

pip install sniffcell

For the full environment including bioinformatics tools (Sniffles, bcftools, samtools, Truvari …):

micromamba env create -f environment.yml
micromamba activate sniffcell
pip install sniffcell

See Installation in the wiki for Docker instructions, optional extras, and manual tool setup.

2. Call ctDMRs from the reference atlas

sniffcell find \
  -n atlas/all_celltypes_blocks.npy \
  -i atlas/all_celltypes_blocks.index.gz \
  -cf atlas/index_to_major_celltypes.json \
  -m atlas/all_celltypes.txt \
  -ck pbmc \
  -o pbmc_ctdmr.tsv
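
To make the idea of a ctDMR concrete, here is a minimal sketch of the underlying concept, not SniffCell's actual algorithm. It assumes a toy atlas stored as a plain dictionary (the real atlas is a NumPy matrix plus index), and the function name `find_ctdmrs` and the `min_delta` threshold are hypothetical. A region is reported when one cell type is separated from every other cell type by at least `min_delta` in mean methylation.

```python
def find_ctdmrs(atlas, min_delta=0.4):
    """Return (region, cell_type, direction, gap) for regions where one
    cell type differs from all others by at least min_delta.

    atlas: dict mapping region -> {cell_type: mean methylation in [0, 1]}.
    This is an illustrative toy, not the method used by `sniffcell find`.
    """
    hits = []
    for region, by_type in atlas.items():
        for cell_type, value in by_type.items():
            others = [v for ct, v in by_type.items() if ct != cell_type]
            if not others:
                continue
            # Gap to the closest other cell type, in each direction.
            hypo_gap = min(others) - value   # target lower than every other type
            hyper_gap = value - max(others)  # target higher than every other type
            if hypo_gap >= min_delta:
                hits.append((region, cell_type, "hypo", round(hypo_gap, 3)))
            elif hyper_gap >= min_delta:
                hits.append((region, cell_type, "hyper", round(hyper_gap, 3)))
    return hits


# Hypothetical toy atlas: three regions, three cell types.
toy_atlas = {
    "chr1:1000-1200": {"t_cell": 0.10, "b_cell": 0.85, "monocyte": 0.90},
    "chr2:5000-5300": {"t_cell": 0.80, "b_cell": 0.82, "monocyte": 0.79},
    "chr3:700-950": {"t_cell": 0.15, "b_cell": 0.20, "monocyte": 0.95},
}
ctdmrs = find_ctdmrs(toy_atlas)
for hit in ctdmrs:
    print(hit)
```

In this toy atlas only the first and third regions qualify; the second is similarly methylated in every cell type and carries no cell-type signal.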

3. Annotate SVs with cell-type evidence

sniffcell anno \
  -i sample.bam \
  -v sample.vcf.gz \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -o anno_out \
  -t 8
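
The annotation step links read-level cell-type evidence to SV-supporting reads. The sketch below illustrates that general idea under simplifying assumptions; it is not the scoring used by `sniffcell anno`. The function names, the `max_distance` and `min_fraction` thresholds, and the "unassigned"/"ambiguous" labels are all hypothetical.

```python
from collections import Counter


def classify_read(read_meth, ctdmrs, max_distance=0.3):
    """Assign a read to the cell type whose ctDMR pattern it matches most often.

    read_meth: dict region -> methylated fraction of the read's CpGs there.
    ctdmrs:    dict region -> (cell_type, "hypo" or "hyper").
    Returns a cell-type label, or None when no ctDMR supports a call.
    """
    votes = Counter()
    for region, fraction in read_meth.items():
        if region not in ctdmrs:
            continue
        cell_type, direction = ctdmrs[region]
        expected = 0.0 if direction == "hypo" else 1.0
        # The read votes for the cell type only if it looks like that type's state.
        if abs(fraction - expected) <= max_distance:
            votes[cell_type] += 1
    return votes.most_common(1)[0][0] if votes else None


def assign_sv(supporting_reads, read_labels, min_fraction=0.6):
    """Majority vote over the labelled reads that support one SV."""
    labels = [read_labels[r] for r in supporting_reads if read_labels.get(r)]
    if not labels:
        return "unassigned"
    cell_type, count = Counter(labels).most_common(1)[0]
    return cell_type if count / len(labels) >= min_fraction else "ambiguous"


# Hypothetical toy data.
toy_ctdmrs = {
    "chr1:1000-1200": ("t_cell", "hypo"),
    "chr3:700-950": ("monocyte", "hyper"),
}
toy_reads = {
    "r1": {"chr1:1000-1200": 0.05},
    "r2": {"chr1:1000-1200": 0.10},
    "r3": {"chr3:700-950": 0.95},
    "r4": {"chr1:1000-1200": 0.90},  # does not match the T-cell state
}
read_labels = {name: classify_read(m, toy_ctdmrs) for name, m in toy_reads.items()}
print(assign_sv(["r1", "r2", "r4"], read_labels))
print(assign_sv(["r1", "r3"], read_labels))
```

Reads that overlap no informative ctDMR, or that do not match the expected state, stay unlabelled and are ignored in the vote, which is why an SV can end up unassigned even when it has supporting reads.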

4. Build the review report

sniffcell report --anno_output anno_out

Open anno_out/report/index.html in a browser to review filtered high-confidence SVs with per-SV methylation evidence.
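
The Quick Start steps 2 to 4 can also be scripted. The following is a minimal sketch that only assembles the command lines shown above and, by default, prints them without executing anything. The helper names `quickstart_commands` and `run_all` are hypothetical; the flags and atlas file names are copied from the examples in this README.

```python
import shlex
import subprocess


def quickstart_commands(bam, vcf, ref, atlas_dir="atlas", cell_key="pbmc",
                        out_dir="anno_out", threads=8):
    """Build the find -> anno -> report command lines from the Quick Start."""
    ctdmr = f"{cell_key}_ctdmr.tsv"
    find = [
        "sniffcell", "find",
        "-n", f"{atlas_dir}/all_celltypes_blocks.npy",
        "-i", f"{atlas_dir}/all_celltypes_blocks.index.gz",
        "-cf", f"{atlas_dir}/index_to_major_celltypes.json",
        "-m", f"{atlas_dir}/all_celltypes.txt",
        "-ck", cell_key,
        "-o", ctdmr,
    ]
    anno = [
        "sniffcell", "anno",
        "-i", bam, "-v", vcf, "-r", ref,
        "-b", ctdmr, "-o", out_dir, "-t", str(threads),
    ]
    report = ["sniffcell", "report", "--anno_output", out_dir]
    return [find, anno, report]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage


commands = quickstart_commands("sample.bam", "sample.vcf.gz", "ref.fa")
run_all(commands)  # dry run: prints the three commands
```

Using `check=True` means a failed `find` or `anno` stage aborts the script instead of letting `report` run on incomplete output.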


Commands at a Glance

| Command | What it does |
| --- | --- |
| sniffcell find | Mine a reference atlas to call cell-type-specific DMRs (ctDMRs) |
| sniffcell anno | Extract read-level methylation from a BAM and assign each SV a cell-type code |
| sniffcell svanno | Re-score SV assignments from a saved read table without re-processing the BAM |
| sniffcell deconv | Assign every read in a BAM to a cell type; optionally split into per-group BAMs |
| sniffcell discover | Multi-stage SV / tandem-repeat / SNV pipeline on cell-type-split BAMs |
| sniffcell viz | Render a per-SV methylation figure (PNG or PDF) |
| sniffcell igvviz | Produce IGV batch-mode screenshots for a single SV |
| sniffcell report | Filter high-confidence SVs and build an interactive HTML review report |
| sniffcell dmsv | Test for differential methylation between SV-supporting and non-supporting reads |
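
The `dmsv` command compares methylation between SV-supporting and non-supporting reads. This README does not say which statistical test it uses, so the sketch below substitutes a generic two-sided permutation test on the difference in mean methylation, purely to illustrate the kind of comparison involved. The function name, the default number of permutations, and the toy values are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(supporting, non_supporting, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean methylation.

    supporting / non_supporting: per-read methylation fractions near one SV.
    Returns an empirical p-value with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = abs(mean(supporting) - mean(non_supporting))
    pooled = list(supporting) + list(non_supporting)
    k = len(supporting)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of reads
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical per-read methylation fractions around one SV.
sv_reads = [0.05, 0.10, 0.08, 0.12, 0.07]
other_reads = [0.85, 0.90, 0.80, 0.88, 0.92]
p_separated = permutation_pvalue(sv_reads, other_reads)
p_identical = permutation_pvalue([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(p_separated, p_identical)
```

A permutation test makes no distributional assumption, which suits per-read methylation fractions that tend to cluster near 0 and 1; with very few reads per group, however, the smallest achievable p-value is limited by the number of distinct relabellings.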

Input Requirements

| Input | Format | Used by |
| --- | --- | --- |
| Long-read alignment | BAM with MM/ML base-modification tags | anno, deconv, dmsv, viz |
| Structural variants | VCF / VCF.GZ or harmonized TSV from discover | anno, dmsv, viz, report |
| Reference genome | FASTA + index | anno, deconv, dmsv, viz |
| ctDMR table | TSV from sniffcell find | anno, deconv, viz |
| Methylation atlas | NumPy matrix + CpG index + metadata | find |
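
Because the alignment input must carry MM/ML base-modification tags, it can be worth checking a few records before a long run. The sketch below parses a single SAM-format text line (for example, one line of `samtools view` output) with the standard library only; it is a convenience check, not part of SniffCell, and the function name is hypothetical. It follows the SAM optional-field layout `TAG:TYPE:VALUE` and maps each ML value N to the midpoint of its probability bin, (N + 0.5) / 256.

```python
def parse_mod_tags(sam_line):
    """Return (MM string, list of ML probabilities), or None if tags are missing.

    sam_line: one tab-separated SAM record. Optional fields start at column 12.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, value_type, value = field.split(":", 2)
        tags[tag] = (value_type, value)
    if "MM" not in tags or "ML" not in tags:
        return None
    # ML is a byte array such as "C,250,12": array subtype first, then values.
    raw = [int(x) for x in tags["ML"][1].split(",")[1:]]
    probabilities = [(q + 0.5) / 256 for q in raw]
    return tags["MM"][1], probabilities


# Hypothetical record: two 5mC calls on a 10-base read.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "*", "MM:Z:C+m,0,1;", "ML:B:C,250,12",
])
parsed = parse_mod_tags(example)
untagged = parse_mod_tags("\t".join(example.split("\t")[:11]))
print(parsed)
print(untagged)
```

A `None` result for reads sampled from your BAM suggests that modification calls were not written or were dropped during alignment. Note that this sketch only looks for the uppercase `MM`/`ML` tags.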

Key Outputs

After a complete find → anno → report run, the outputs include:

pbmc_ctdmr.tsv                      ← Cell-type-specific DMRs (input to anno)
anno_out/
  reads_classification.tsv          ← Per-read × ctDMR methylation and cell-type assignment
  sv_assignment.tsv                 ← Per-SV cell-type code and quality metrics
  sv_assignment_readable.tsv        ← Human-readable version with expanded cell-type labels
  anno_run_manifest.json            ← Full run manifest (paths, parameters, versions)
  report/
    index.html                      ← Interactive HTML review report
    high_confidence_sv.tsv          ← Filtered high-confidence SVs
    figures/                        ← Per-SV methylation panels (when --with_figures)

Deconvolution and Discovery

For samples where you want to compare SVs across cell populations:

# Optional: plan regional inputs before running targeted deconvolution/discovery
sniffcell regions \
  -b pbmc_ctdmr.tsv \
  --regions loci_of_interest.bed \
  -o regional_plan \
  --regions-ctdmrs 10

This writes regional_plan/subset_regions.bed for downstream subsetting plus ctdmr_subset.tsv, ctdmr_region_summary.tsv, and ctdmr_selected_summary.tsv describing the ctDMRs selected from the sniffcell find output.

# Step 1: Deconvolve reads and split into cell-type-specific BAMs
sniffcell deconv \
  -i sample.bam \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -o deconv_out \
  --split_bam_groups "lymph=t_cell,b_cell,nk_cell;myeloid=monocyte" \
  -t 8

# Step 2: Call SVs, tandem repeats, and SNVs on each group independently
sniffcell discover tools run \
  --deconv-dir deconv_out \
  --reference ref.fa \
  --tr-bed atlas/adotto.v2.trgt.bed \
  --sex female \
  --stages sv,tdb \
  --threads 16

# Step 3: Annotate the harmonized variants
sniffcell anno \
  -i sample.bam \
  -v deconv_out/discover/harmonized_variants.tsv \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -o anno_out
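
The `--split_bam_groups` value in Step 1 packs several groups into one string: groups are separated by `;`, and each group is `name=cell_type,cell_type,...`. The sketch below parses that format as it appears in the example so that a grouping can be validated before launching `deconv`. It is not SniffCell's own parser, and the function name and the specific checks are assumptions.

```python
def parse_split_groups(spec):
    """Parse 'lymph=t_cell,b_cell;myeloid=monocyte' into {group: [cell types]}.

    Raises ValueError on malformed chunks or on a cell type listed twice.
    """
    groups = {}
    owner = {}  # cell type -> group that already claimed it
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        name, sep, members = chunk.partition("=")
        name = name.strip()
        cell_types = [m.strip() for m in members.split(",") if m.strip()]
        if not sep or not name or not cell_types:
            raise ValueError(f"malformed group: {chunk!r}")
        if name in groups:
            raise ValueError(f"group defined twice: {name!r}")
        for cell_type in cell_types:
            if cell_type in owner:
                raise ValueError(
                    f"{cell_type!r} appears in both {owner[cell_type]!r} and {name!r}"
                )
            owner[cell_type] = name
        groups[name] = cell_types
    return groups


groups = parse_split_groups("lymph=t_cell,b_cell,nk_cell;myeloid=monocyte")
print(groups)
```

Rejecting a cell type that appears in two groups is a design choice of this sketch; whether `deconv` itself allows overlapping groups is not stated in this README.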

Before running discover, validate your environment:

sniffcell-check-discover --stages all

Visualizing Individual SVs

# Minimal — loads all inputs automatically from the anno manifest
sniffcell viz --anno_output anno_out -s sniffles.SV123

# With table exports
sniffcell viz --anno_output anno_out -s sniffles.SV123 --export_tables

# IGV batch screenshot
sniffcell igvviz --anno_output anno_out -s sniffles.SV123

Documentation

Full documentation lives in the GitHub Wiki:

| Page | Contents |
| --- | --- |
| Installation | PyPI, conda, Docker, manual tool setup, verification |
| End-to-End Workflow | Step-by-step walkthrough from atlas to HTML report |
| Find Workflow | ctDMR discovery internals and parameter guide |
| Methods | Technical methods for deconv, discover, and anno |
| Test Examples | Practical validation and QA queries |

Citation

If you use SniffCell in your research, please cite:

SniffCell: cell-type annotation of somatic structural variants using long-read methylation. Yilei Fu et al. (manuscript in preparation)


License

MIT License — see LICENSE for details.

Developed at Baylor College of Medicine by Yilei Fu.
