SniffCell: Annotate SV cell types based on CpG methylation

SniffCell

SniffCell annotates structural variants (SVs) with cell-type information, using long-read CpG methylation evidence at cell-type-specific differentially methylated regions (ctDMRs).

Installation

pip install sniffcell          # from PyPI
pip install -e .               # local development

Requires Python >=3.10.

Commands

sniffcell {find, deconv, anno, svanno, dmsv, viz, igvviz, report}

Typical Workflow

  1. Call ctDMRs from an atlas with find.
  2. Annotate SVs with ctDMR evidence using anno.
  3. Re-run SV assignment from saved read tables with svanno (optional).
  4. Generate an HTML review report with report.
  5. Visualize individual SVs with viz or igvviz.
  6. Test differential methylation near SVs with dmsv (optional).
  7. Deconvolve cell-type composition from any BAM with deconv (optional).
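The core of this workflow (find, anno, report) can be scripted. The sketch below only builds and prints the command lines; it uses flags that appear in the command sections of this README, while the helper name build_core_workflow and the file names are illustrative assumptions.

```python
import shlex


def build_core_workflow(bam, vcf, ref, ctdmr_tsv="pbmc_ctdmr.tsv", anno_dir="anno_out"):
    """Return argv lists for find -> anno -> report, in workflow order.

    Hypothetical helper: it relies on the documented atlas defaults (./atlas/...)
    for find, so only -ck and -o are passed there.
    """
    return [
        ["sniffcell", "find", "-ck", "pbmc", "-o", ctdmr_tsv],
        ["sniffcell", "anno", "-i", bam, "-v", vcf, "-r", ref,
         "-b", ctdmr_tsv, "-o", anno_dir],
        ["sniffcell", "report", "--anno_output", anno_dir],
    ]


commands = build_core_workflow("sample.bam", "sample.vcf.gz", "ref.fa")
for argv in commands:
    # Dry run: print only. To execute, pass each list to
    # subprocess.run(argv, check=True) instead.
    print(shlex.join(argv))
```

Keeping each command as a list avoids shell-quoting problems and makes it easy to stop at the first failing step when the commands are actually run.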

find: Call ctDMRs From an Atlas

Loads an atlas methylation matrix and calls cell-type-specific differentially methylated regions (ctDMRs).

sniffcell find \
  -n atlas/all_celltypes_blocks.npy \
  -i atlas/all_celltypes_blocks.index.gz \
  -cf atlas/index_to_major_celltypes.json \
  -m atlas/all_celltypes.txt \
  -ck pbmc \
  -o pbmc_ctdmr.tsv \
  --diff_threshold 0.40 \
  --min_rows 2 \
  --min_cpgs 3 \
  --max_gap_bp 500

If -n/-i/-cf/-m are omitted, paths default to ./atlas/... in your working directory.

-ck/--celltypes_keys selects a top-level JSON key mapping {group_name: [sample_id, ...]}.
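As an illustration of that layout, the sketch below loads a small JSON document and selects the pbmc key. The group names and sample IDs are invented for the example, and treating the -cf file as the JSON in question is an assumption; the real atlas file may contain other keys or fields.

```python
import json

# Hypothetical contents in the {key: {group_name: [sample_id, ...]}} shape
# described above; none of these names come from the real atlas.
celltype_config = json.loads("""
{
  "pbmc": {
    "t_cell": ["sample_01", "sample_02"],
    "b_cell": ["sample_03"],
    "monocyte": ["sample_04", "sample_05"]
  }
}
""")


def select_groups(config, key):
    """Return the {group_name: [sample_id, ...]} mapping stored under key."""
    if key not in config:
        raise KeyError(f"{key!r} not found; available keys: {sorted(config)}")
    return config[key]


groups = select_groups(celltype_config, "pbmc")
for name, samples in groups.items():
    print(name, len(samples))
```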

Outputs:

  • <output> — annotation-ready ctDMR TSV
  • <output>.igv.bed — IGV BED9 companion file

anno: Annotate SVs With ctDMRs

Classifies reads per ctDMR region, then assigns cell-type codes to each SV.

sniffcell anno \
  -i sample.bam \
  -v sample.vcf.gz \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -o anno_out \
  -w 10000 \
  --breakpoint_exclusion_frac 0.1 \
  -t 8 \
  --evidence_mode all_rows \
  --min_overlap_pct 0.0 \
  --min_agreement_pct 1.0

Key options:

  • --evidence_mode {all_rows,per_read} — how ctDMR evidence is aggregated (default: all_rows)
  • --breakpoint_exclusion_frac — excludes ctDMRs within ±frac × SVLEN of the breakpoint (default: 0.0)
  • --min_overlap_pct / --min_agreement_pct — filtering thresholds

assigned_code is suppressed when has_hard_conflict=True.
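To make the ±frac × SVLEN rule for --breakpoint_exclusion_frac concrete, here is a minimal sketch of the arithmetic. It assumes a single breakpoint coordinate, truncation to whole base pairs, and that a ctDMR is excluded when it overlaps the window at all; SniffCell's actual interval handling may differ, and both function names are hypothetical.

```python
def exclusion_window(breakpoint, svlen, frac):
    """Return (start, end) of the window of +/- frac * |SVLEN| around a breakpoint."""
    half_width = int(abs(svlen) * frac)  # abs() because deletions often carry negative SVLEN
    return breakpoint - half_width, breakpoint + half_width


def ctdmr_is_excluded(ctdmr_start, ctdmr_end, breakpoint, svlen, frac):
    """Assumed rule: exclude a ctDMR that overlaps the exclusion window."""
    if frac <= 0:
        return False  # assumption: the default of 0.0 disables exclusion
    lo, hi = exclusion_window(breakpoint, svlen, frac)
    return ctdmr_start <= hi and ctdmr_end >= lo


# A 5,000 bp SV at position 100,000 with frac = 0.1 gives a +/- 500 bp window.
window = exclusion_window(100_000, 5_000, 0.1)
print(window)
print(ctdmr_is_excluded(100_400, 100_900, 100_000, 5_000, 0.1))
print(ctdmr_is_excluded(101_000, 101_500, 100_000, 5_000, 0.1))
```

Because the window scales with SVLEN, the same fraction excludes far more sequence around a large SV than around a small one.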

Outputs in <output>/:

  • reads_classification.tsv
  • blocks_classification.tsv
  • sv_assignment.tsv / sv_assignment_readable.tsv / sv_assignment_readable_long.tsv
  • anno_run_manifest.json

svanno: Recompute SV Assignments

Re-runs only the SV assignment step from an existing reads_classification.tsv, useful for tuning thresholds without re-processing the BAM.

sniffcell svanno \
  -v sample.vcf.gz \
  -i anno_out/reads_classification.tsv \
  -w 10000 \
  --breakpoint_exclusion_frac 0.1 \
  --evidence_mode all_rows \
  --min_overlap_pct 0.0 \
  --min_agreement_pct 1.0 \
  -o anno_out

deconv: Cell-Type Deconvolution

Assigns every read in a BAM a cell-type code using ctDMR methylation patterns, then produces per-read, per-group, and whole-sample summaries.

sniffcell deconv \
  -i sample.bam \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -o deconv_out \
  -t 8 \
  --read_assignment_mode closest_reference_mean

Key options:

  • --read_assignment_mode {closest_reference_mean,kmeans} — assignment algorithm (default: closest_reference_mean)
  • --split_bam_groups — after deconvolution, split reads into per-group BAMs. Use ; between groups and , between labels within a group. Named splits use =. Example: lymph=t_cell,b_cell,nk_cell;myeloid=monocyte
  • --resume — skip ctDMR classification and reload existing TSVs; useful for re-splitting without reprocessing
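The --split_bam_groups syntax can be read as a small grammar. The parser below is a sketch of that reading, not SniffCell's own code; in particular, the fallback names group1, group2, ... for unnamed groups are an assumption made for the example.

```python
def parse_split_groups(spec):
    """Parse 'name=a,b;c,d' into {group_name: [label, ...]}.

    ';' separates groups, ',' separates labels, and 'name=' is optional.
    """
    groups = {}
    chunks = [chunk.strip() for chunk in spec.split(";") if chunk.strip()]
    for index, chunk in enumerate(chunks, start=1):
        if "=" in chunk:
            name, labels_text = chunk.split("=", 1)
            name = name.strip()
        else:
            # Hypothetical naming for unnamed groups.
            name, labels_text = f"group{index}", chunk
        labels = [label.strip() for label in labels_text.split(",") if label.strip()]
        if not name or not labels:
            raise ValueError(f"malformed group: {chunk!r}")
        groups[name] = labels
    return groups


parsed = parse_split_groups("lymph=t_cell,b_cell,nk_cell;myeloid=monocyte")
print(parsed)
```

On the command line, quote the whole value (for example '...;...'), since an unquoted ; ends the command in most shells.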

Outputs in <output>/:

  • deconv_reads_classification.tsv — one row per (read × ctDMR)
  • deconv_blocks_classification.tsv — per-ctDMR block summary
  • deconv_read_summary.tsv — one row per read with majority cell type and linked celltypes
  • deconv_summary.tsv — whole-sample summary in all_rows and per_read modes
  • deconv_reads_by_group/ — per-group read tables (split by best_group)
  • deconv_requested_group_splits/ — user-defined BAM and TSV splits (when --split_bam_groups is used)
  • deconv_run_manifest.json
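The per-read table lends itself to quick downstream checks. The sketch below computes the fraction of reads per group from a tab-separated table. It assumes a best_group column (the name used for the per-group split above) and a hypothetical read_id column; check the real header of deconv_read_summary.tsv before relying on either. The rows are made up for the example.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for deconv_read_summary.tsv; column names are assumptions.
example_tsv = (
    "read_id\tbest_group\n"
    "read1\tt_cell\n"
    "read2\tt_cell\n"
    "read3\tb_cell\n"
    "read4\tmonocyte\n"
)


def group_fractions(handle):
    """Return {group: fraction of reads}, ignoring rows with an empty group."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row["best_group"] for row in reader if row["best_group"])
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


fractions = group_fractions(io.StringIO(example_tsv))
for group, fraction in sorted(fractions.items()):
    print(f"{group}\t{fraction:.2f}")
```

For a real run, replace io.StringIO(example_tsv) with open("deconv_out/deconv_read_summary.tsv", newline=""). deconv_summary.tsv already reports whole-sample summaries, so a tally like this is mainly useful as a sanity check.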

viz: Visualize One SV

Renders a figure (PNG or PDF) for a single SV with read-level methylation and ctDMR context.

# Minimal — loads inputs from anno manifest
sniffcell viz \
  --anno_output anno_out \
  -s sniffles.SV123

# Manual mode
sniffcell viz \
  -i sample.bam \
  -v sample.vcf.gz \
  -s sniffles.SV123 \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -a anno_out/reads_classification.tsv \
  -o figures/sniffles.SV123 \
  -f png

Notable options:

  • --indel_min_bp — overlay read-level indels ≥ N bp on reads (default: 40; set to 0 to disable)
  • --linked_ctdmr_mode {distal,extend,strict} — controls how off-window winning ctDMRs are displayed (default: distal)
  • --export_tables — also write .summary.tsv, .supporting_reads_assignment.tsv, and .supporting_reads_ctdmr_methylation.tsv

igvviz: IGV Screenshots for One SV

Runs IGV batch mode and produces snapshots per BAM, with reads tagged and grouped by phase.

sniffcell igvviz \
  -i fans_a.bam fans_b.bam fans_c.bam \
  -v sample.vcf.gz \
  -s sniffles.SV123 \
  -r ref.fa \
  -b pbmc_ctdmr.tsv \
  -w 10000 \
  -o out/igvviz

Notable options:

  • --anno_output — load inputs from anno manifest (manifest-driven mode)
  • --igv_cmd — path to IGV executable (default: igv.sh)
  • --snapshot_width/--snapshot_height — snapshot dimensions (default: 3600×1600)
  • --batch_only — write batch script only, don't run IGV

report: HTML Review Report

Filters high-confidence SVs from anno output and builds an interactive HTML report.

# Basic report
sniffcell report \
  --anno_output anno_out \
  --min_overlap_pct 0.8 \
  --min_majority_pct 1.0

# With viz figures and IGV screenshots
sniffcell report \
  --anno_output anno_out \
  --with_figures \
  --with_igvviz \
  --igv_bams fans1.bam fans2.bam fans3.bam \
  --figure_threads 4

# With igv-reports alternate page (requires: pip install igv-reports)
sniffcell report \
  --anno_output anno_out \
  --with_igvreport \
  --igv_bams fans1.bam fans2.bam

Default SV filters:

  • assigned_code must be non-empty
  • linked_celltypes must be non-empty
  • has_hard_conflict must be False
  • --min_overlap_pct 0.8 and --min_majority_pct 1.0
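Taken together, the default filters amount to a row-level predicate. The sketch below applies it to dictionaries of strings, as csv.DictReader would yield from an SV assignment table. The column names assigned_code, linked_celltypes and has_hard_conflict come from this README; sv_id, overlap_pct and majority_pct are assumed names, and using >= for the thresholds is also an assumption.

```python
def passes_default_filters(row, min_overlap_pct=0.8, min_majority_pct=1.0):
    """Return True if one SV row meets the default report filters listed above."""
    return (
        bool(row["assigned_code"].strip())
        and bool(row["linked_celltypes"].strip())
        and row["has_hard_conflict"].strip().lower() != "true"
        and float(row["overlap_pct"]) >= min_overlap_pct    # assumed column name
        and float(row["majority_pct"]) >= min_majority_pct  # assumed column name
    )


# Made-up rows; the cell-type values are placeholders.
rows = [
    {"sv_id": "sniffles.SV123", "assigned_code": "t_cell", "linked_celltypes": "t_cell",
     "has_hard_conflict": "False", "overlap_pct": "0.92", "majority_pct": "1.0"},
    {"sv_id": "sniffles.SV124", "assigned_code": "b_cell", "linked_celltypes": "b_cell",
     "has_hard_conflict": "True", "overlap_pct": "0.95", "majority_pct": "1.0"},
    {"sv_id": "sniffles.SV125", "assigned_code": "", "linked_celltypes": "",
     "has_hard_conflict": "False", "overlap_pct": "0.50", "majority_pct": "0.6"},
]

kept = [row["sv_id"] for row in rows if passes_default_filters(row)]
print(kept)
```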

Outputs under <anno_output>/report/:

  • index.html — interactive report with genome-wide plots and per-SV panels
  • high_confidence_sv.tsv
  • figures/ — viz panels (when --with_figures)
  • igvviz/ — IGV screenshots (when --with_igvviz)
  • igvreport/index.html — alternate IGV.js page (when --with_igvreport)
  • report_manifest.json

Review labels (Real / Not real / Undecided) auto-save to browser localStorage and persist across sessions.


dmsv: Differential Methylation Around SVs

Tests for methylation differences between SV-supporting and non-supporting reads near each SV.

sniffcell dmsv \
  -i sample.bam \
  -v sample.vcf.gz \
  -r ref.fa \
  -o dmsv_out \
  -m 3 \
  -f 1000 \
  -c 5 \
  -t 8

Outputs:

  • dmsv_out/significant_SVs.tsv
  • dmsv_out/sv_details/<sv_id>.tsv.gz
