SniffCell: Annotate SV cell types based on CpG methylation
SniffCell - annotate structural variants with methylation-derived cell-type signals
SniffCell analyzes long-read methylation around SVs and provides cell-type-aware annotations.
Version
Current package version in code: v0.6.0.
Install
pip install sniffcell
For local development:
pip install -e .
CLI commands
sniffcell {find,deconv,anno,svanno,dmsv,viz}
Command status in the current code:
- find: implemented.
- anno: implemented.
- svanno: implemented.
- dmsv: implemented.
- viz: implemented.
- deconv: placeholder stub (currently prints args only).
Input assumptions
- BAM: long-read BAM with modified base tags; the HP haplotype tag is optional.
- Reference: FASTA, indexed for region fetches.
- VCF: INS and DEL records are used.
- VCF INFO field RNAMES is used for supporting reads unless overridden by --kanpig_read_names.
- VCF INFO fields STDEV_POS, STDEV_LEN, and SVLEN are used to derive the ref_start and ref_end windows.
- BED for anno: one tab-delimited hierarchical DMR file from sniffcell find with at least the columns chr, start, end, best_group, best_dir.
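To make the VCF assumptions concrete, here is a minimal standard-library sketch that pulls supporting read names out of one VCF data line. It assumes the SV type is stored in INFO/SVTYPE and that RNAMES is a comma-separated list; parse_info and supporting_reads are hypothetical names, this is not SniffCell's own parser, the --kanpig_read_names override is not modeled, and the example record is invented.

```python
# Hypothetical sketch, not SniffCell's own VCF handling.
# Assumptions: SV type lives in INFO/SVTYPE; RNAMES is comma-separated.

KEPT_SVTYPES = {"INS", "DEL"}


def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag keys (no '=') map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def supporting_reads(vcf_line: str):
    """Return (chrom, pos, svtype, read_names) for INS/DEL records, else None."""
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = cols[0], int(cols[1]), cols[7]
    fields = parse_info(info)
    svtype = fields.get("SVTYPE")
    if svtype not in KEPT_SVTYPES:
        return None  # other SV types (e.g. BND) are skipped in this sketch
    rnames = fields.get("RNAMES")
    reads = rnames.split(",") if isinstance(rnames, str) and rnames else []
    return chrom, pos, svtype, reads


line = "chr1\t10000\tsniffles.SV1\tN\t<DEL>\t60\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-500;RNAMES=read_a,read_b\tGT\t0/1"
print(supporting_reads(line))  # ('chr1', 10000, 'DEL', ['read_a', 'read_b'])
```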
find: call hierarchical ctDMRs from atlas matrices
Finds cell-type-specific DMRs (ctDMRs) using an explicit hierarchy schema defined in atlas/index_to_major_celltypes.json, then writes one annotation-ready BED/TSV file.
Hierarchy schema:
- Add a top-level __hierarchy__ object.
- Define each hierarchy key with source_key and optional children.
- Each child can point to another source_key and optional groups.
Example:
"__hierarchy__": {
"pbmc-lymphocytes": {
"source_key": "pbmc-lymphocytes",
"children": {
"lymphocytes": {
"source_key": "pbmc",
"groups": ["T-cell", "NK-cell", "B-cell"]
}
}
}
}
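The schema rules can be checked before running find. Below is a minimal sketch that walks a loaded config and lists every hierarchy node; walk_hierarchy is a hypothetical helper, not part of SniffCell, the validation rules are read off the schema description above, and a real atlas/index_to_major_celltypes.json likely contains other top-level keys besides __hierarchy__.

```python
import json

# Hypothetical walker for the __hierarchy__ schema described above.
# Rules assumed from the text: every node needs source_key; children and
# groups are optional.


def walk_hierarchy(config: dict):
    """Yield (path, source_key, groups) for every node under __hierarchy__."""

    def visit(name, node, parent_path):
        if "source_key" not in node:
            raise ValueError(f"hierarchy node {name!r} has no source_key")
        path = parent_path + (name,)
        yield path, node["source_key"], node.get("groups")
        for child_name, child in node.get("children", {}).items():
            yield from visit(child_name, child, path)

    for top_name, top_node in config["__hierarchy__"].items():
        yield from visit(top_name, top_node, ())


config = json.loads("""
{
  "__hierarchy__": {
    "pbmc-lymphocytes": {
      "source_key": "pbmc-lymphocytes",
      "children": {
        "lymphocytes": {
          "source_key": "pbmc",
          "groups": ["T-cell", "NK-cell", "B-cell"]
        }
      }
    }
  }
}
""")

for path, source_key, groups in walk_hierarchy(config):
    print("/".join(path), source_key, groups)
# pbmc-lymphocytes pbmc-lymphocytes None
# pbmc-lymphocytes/lymphocytes pbmc ['T-cell', 'NK-cell', 'B-cell']
```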
Example:
sniffcell find \
-n atlas/all_celltypes_blocks.npy \
-i atlas/all_celltypes_blocks.index.gz \
-cf atlas/index_to_major_celltypes.json \
-m atlas/all_celltypes.txt \
-ck pbmc-lymphocytes \
-o pbmc_hierarchy.tsv \
--diff_threshold 0.40 \
--min_rows 2 \
--min_cpgs 3 \
--max_gap_bp 500
Outputs:
- <output>: annotation-ready hierarchical BED/TSV for sniffcell anno.
- <output>.igv.bed: companion IGV BED9 file (headerless, IGV-ready).
Key columns in <output> include:
- best_group, best_dir
- code_order (global leaf schema)
- best_group_leaves, other_group_leaves
- hierarchy_level, hierarchy_path, hierarchy_source_key
- per-node means (mean_<group>)
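For downstream scripting, the find output can be read with the standard library. This is a sketch assuming a tab-delimited file with a header row (a leading '#' on the header is tolerated) and using only column names listed above; read_ctdmr_table is hypothetical, and the sample row, including the best_dir value "hypo", is invented.

```python
import csv
import io


def read_ctdmr_table(handle):
    """Parse a find BED/TSV into dicts with coordinates and mean_<group> values.

    Assumes a header row; a leading '#' on it is stripped. Missing, empty, or
    'NA' mean cells are skipped rather than converted.
    """
    lines = iter(handle)
    header = next(lines).rstrip("\n").lstrip("#").split("\t")
    regions = []
    for row in csv.DictReader(lines, fieldnames=header, delimiter="\t"):
        means = {
            col[len("mean_"):]: float(val)
            for col, val in row.items()
            if col and col.startswith("mean_") and val not in (None, "", "NA")
        }
        regions.append({
            "chr": row["chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "best_group": row["best_group"],
            "best_dir": row["best_dir"],
            "means": means,
        })
    return regions


# Invented one-row example with two per-node mean columns.
sample = (
    "chr\tstart\tend\tbest_group\tbest_dir\tmean_T-cell\tmean_B-cell\n"
    "chr1\t15000\t15500\tT-cell\thypo\t0.12\t0.85\n"
)
print(read_ctdmr_table(io.StringIO(sample))[0]["means"])
# {'T-cell': 0.12, 'B-cell': 0.85}
```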
anno: annotate SVs with one hierarchical BED file
anno processes DMR regions near SVs, classifies reads per region, then summarizes the cell-type assignment for each SV.
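Selecting the DMR regions near each SV boils down to a per-chromosome interval overlap test. A minimal sketch, assuming half-open BED-style coordinates and a window added to both SV edges; SniffCell's exact windowing (for example how it uses ref_start and ref_end) may differ, and both helpers are hypothetical.

```python
from collections import defaultdict


def index_regions(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    by_chr = defaultdict(list)
    for chrom, start, end in regions:
        by_chr[chrom].append((start, end))
    for intervals in by_chr.values():
        intervals.sort()
    return by_chr


def regions_near_sv(by_chr, chrom, sv_start, sv_end, window):
    """Regions on chrom overlapping [sv_start - window, sv_end + window)."""
    lo, hi = sv_start - window, sv_end + window
    hits = []
    for start, end in by_chr.get(chrom, []):
        if start >= hi:
            break  # sorted by start, so no later region can overlap
        if end > lo:
            hits.append((start, end))
    return hits


idx = index_regions([
    ("chr1", 100, 200),
    ("chr1", 15000, 15500),
    ("chr1", 30000, 30200),
    ("chr2", 12000, 12100),
])
print(regions_near_sv(idx, "chr1", 20000, 20500, 10000))
# [(15000, 15500), (30000, 30200)]
```

Matching on chromosome first means a chr2 region with similar coordinates is never picked up for a chr1 SV. For large BED files, a bisect on start positions or an interval tree would avoid scanning each chromosome from its first region.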
Basic example:
sniffcell anno \
-i sample.bam \
-v sample.vcf.gz \
-r ref.fa \
-b pbmc_hierarchy.tsv \
-o anno_out \
-w 10000 \
-t 8
anno outputs:
- reads_classification.tsv: per-read, region-level assignments.
- blocks_classification.tsv: per-region methylation summaries.
- sv_assignment.tsv: SV-level assignment summary (produced by running svanno internally at the end of anno).
- sv_assignment_readable.tsv: readable SV summary focused on the classified cell types per SV.
- sv_assignment_readable_long.tsv: long-format SV x celltype table with counts/fractions.
- anno_run_manifest.json: run log/manifest with input paths and outputs (used by sniffcell viz --anno_output).
SV assignment options (available in both anno and svanno):
- --evidence_mode {all_rows,per_read}: how ctDMR evidence is aggregated for each SV.
- --min_overlap_pct: minimum overlap fraction required to keep assigned_code.
- --min_agreement_pct: minimum majority agreement required to keep assigned_code.
Defaults are strict:
- --evidence_mode all_rows (uses every supporting-read x ctDMR row; no per-read vote collapse)
- --min_agreement_pct 1.0 (any conflicting code makes assigned_code empty / unreliable)
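The difference between the two evidence modes, and why the 1.0 agreement default is strict, can be illustrated with a short sketch. It assumes each row carries a read ID and a binary code and that majority agreement is the top code's share of votes; aggregate and assign_code are hypothetical names, --min_overlap_pct and the hard-conflict rule are not modeled, and SniffCell's actual aggregation may differ in detail.

```python
from collections import Counter, defaultdict


def aggregate(rows, mode="all_rows"):
    """Count codes for one SV.

    rows: (read_id, code) pairs, one per supporting-read x ctDMR row.
    all_rows: every row is one vote.
    per_read: each read first collapses to its most common code (ties go to
    the code seen first), then each read is one vote.
    """
    if mode == "all_rows":
        return Counter(code for _, code in rows)
    if mode == "per_read":
        by_read = defaultdict(Counter)
        for read_id, code in rows:
            by_read[read_id][code] += 1
        return Counter(c.most_common(1)[0][0] for c in by_read.values())
    raise ValueError(f"unknown evidence mode: {mode}")


def assign_code(votes, min_agreement_pct=1.0):
    """Return (code or '', majority fraction) under an agreement threshold."""
    total = sum(votes.values())
    if total == 0:
        return "", 0.0
    code, count = votes.most_common(1)[0]
    majority = count / total
    return (code if majority >= min_agreement_pct else ""), majority


rows = [("r1", "1110"), ("r1", "1110"), ("r1", "0001"), ("r2", "1110")]
print(assign_code(aggregate(rows, "all_rows")))  # ('', 0.75)
print(assign_code(aggregate(rows, "per_read")))  # ('1110', 1.0)
```

With the same four rows, the strict all_rows default leaves the SV unassigned because one row disagrees, while per_read lets r1's majority absorb its outlier row before votes are counted.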
Conflict rule:
assigned_code is forced empty when the evidence has a hard conflict (has_hard_conflict=True), i.e. when the code constraints intersect to an empty set (for example, 1110 with 0001 in the same schema).
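The conflict rule amounts to a bitwise AND over the observed codes. A sketch assuming codes are written as binary strings in code_order, where a 1 marks a leaf cell type the evidence is compatible with; intersect_codes and has_hard_conflict are hypothetical helper names.

```python
from functools import reduce


def intersect_codes(codes):
    """Bitwise AND of equal-width binary code strings (same code_order schema)."""
    width = len(codes[0])
    if any(len(c) != width for c in codes):
        raise ValueError("all codes must come from the same schema width")
    all_ones = (1 << width) - 1
    value = reduce(lambda acc, c: acc & int(c, 2), codes, all_ones)
    return format(value, f"0{width}b")


def has_hard_conflict(codes):
    """True when no leaf bit survives the intersection."""
    return "1" not in intersect_codes(codes)


print(intersect_codes(["1110", "0001"]), has_hard_conflict(["1110", "0001"]))
# 0000 True
print(intersect_codes(["1110", "0110"]), has_hard_conflict(["1110", "0110"]))
# 0110 False
```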
How hierarchical codes are handled
- One BED/TSV from find is loaded.
- Regions are filtered by SV proximity with --window.
- Every kept region is processed independently to generate per-read codes.
- code_order defines the shared leaf-level bit schema.
- best_group_leaves defines which bits are set for the target cluster in each DMR.
- During SV assignment, reads are linked to SVs by chromosome-aware interval matching (--window), then evidence is aggregated by --evidence_mode (all_rows by default; per_read is optional).
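A sketch of how code_order and best_group_leaves could combine into a leaf-level bit string. It assumes both columns hold comma-separated cell-type names (the actual encoding in the find output may differ) and uses hypothetical helper names; "Monocyte" is an illustrative extra leaf so that a four-leaf schema reproduces codes like 1110 and 0001 from the conflict example.

```python
def split_list(cell):
    """Assumed encoding: list-valued columns are comma-separated."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def leaves_to_code(code_order, leaves):
    """Set bit i when code_order[i] is one of the given leaves."""
    unknown = set(leaves) - set(code_order)
    if unknown:
        raise ValueError(f"leaves missing from code_order: {sorted(unknown)}")
    wanted = set(leaves)
    return "".join("1" if leaf in wanted else "0" for leaf in code_order)


# Hypothetical row values; "Monocyte" is an illustrative extra leaf.
code_order = split_list("T-cell,NK-cell,B-cell,Monocyte")
print(leaves_to_code(code_order, split_list("T-cell,NK-cell,B-cell")))  # 1110
print(leaves_to_code(code_order, split_list("Monocyte")))  # 0001
```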
svanno: recompute SV-level assignment from precomputed read classifications
Use when you already have reads_classification.tsv and want to regenerate SV summaries.
Example:
sniffcell svanno \
-v sample.vcf.gz \
-i anno_out/reads_classification.tsv \
-w 10000 \
--evidence_mode all_rows \
--min_agreement_pct 1.0 \
-o anno_out
Output:
- sv_assignment.tsv
- sv_assignment_readable.tsv
- sv_assignment_readable_long.tsv
Readable summary columns include:
- id, sv_chr, sv_pos, sv_len, vaf
- n_supporting, n_overlapped, overlap_pct, majority_pct
- classified_celltypes, classified_celltype_count
- classified_celltype_counts, classified_celltype_fractions, classification_summary
- is_multi_celltype_link
sv_assignment.tsv also includes:
- has_hard_conflict: whether the constraints are mutually incompatible.
- intersection_code: bitwise intersection of the observed constraints in the dominant schema.
Long-format columns include:
- id, sv_chr, sv_pos, sv_len
- celltype, rank, supporting_read_count, supporting_read_fraction
- n_supporting, n_overlapped, overlap_pct
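Because the long-format table has one row per SV and cell type, picking the leading cell type for each SV is a simple filter. A sketch with a hypothetical helper, assuming the file has a header row and that rank 1 denotes the cell type with the most supporting reads; the sample rows are invented and trimmed to the columns the helper reads.

```python
import csv
import io


def top_celltype_per_sv(handle):
    """Map SV id -> (celltype, supporting_read_fraction) from the rank-1 rows.

    Assumes a header row and that rank 1 is the best-supported cell type.
    """
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["rank"]) == 1:
            best[row["id"]] = (row["celltype"], float(row["supporting_read_fraction"]))
    return best


# Invented two-row example with only the columns this helper reads.
sample = (
    "id\tcelltype\trank\tsupporting_read_fraction\n"
    "sniffles.SV123\tT-cell\t1\t0.75\n"
    "sniffles.SV123\tB-cell\t2\t0.25\n"
)
print(top_celltype_per_sv(io.StringIO(sample)))
# {'sniffles.SV123': ('T-cell', 0.75)}
```

On real output, the same function can be called on an open handle, e.g. open("anno_out/sv_assignment_readable_long.tsv", newline="").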
viz: visualize one SV with reads and ctDMR overlap
Generate a figure (PNG/PDF) centered on one SV ID, showing:
- all reads in SV +/- window (supporting reads highlighted),
- the SV interval,
- overlapping ctDMRs from a find BED/TSV,
- all cell-type methylation values on those ctDMRs, taken from the mean_* columns (heatmap panel).
Simple example (from an anno output folder):
sniffcell viz \
--anno_output anno_out \
-s sniffles.SV123 \
-o anno_out/sniffles.SV123
Outputs:
- Default output: anno_out/sniffles.SV123.png (or .pdf).
- Add --export_tables if you also want TSV outputs (.summary.tsv, .supporting_reads_assignment.tsv, .supporting_reads_ctdmr_methylation.tsv).
dmsv: test differential methylation around SVs
Computes per-CpG statistics between supporting and non-supporting reads near each SV.
Example:
sniffcell dmsv \
-i sample.bam \
-v sample.vcf.gz \
-r ref.fa \
-o dmsv_out \
-m 3 \
-f 1000 \
-c 5 \
-t 8
Outputs:
- dmsv_out/significant_SVs.tsv: per-SV summary including significance counts and effect summaries.
- dmsv_out/sv_details/<sv_id>.tsv.gz: per-CpG statistics table for each SV.
Current implementation note:
dmsv parses --test_type, but the current backend path uses consistency-aware MWU (Mann-Whitney U) screening in statistical_test_around_sv.py.
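For orientation, a plain two-sided Mann-Whitney U test at a single CpG looks like the sketch below. It is only the textbook test with a normal approximation, not the consistency-aware screening in statistical_test_around_sv.py (whose extra logic is not described here); mann_whitney_u is a hypothetical name, the methylation values are invented, and scipy.stats.mannwhitneyu offers exact and tie-corrected asymptotic p-values if SciPy is available.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    x, y: per-read methylation values at one CpG (e.g. supporting vs
    non-supporting reads). Tied values share their average rank; the variance
    is not tie-corrected, so p-values are approximate for small or tied samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    values = sorted(list(x) + list(y))
    avg_rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        avg_rank[values[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


supporting = [0.90, 0.80, 0.95, 0.85]
non_supporting = [0.10, 0.20, 0.15, 0.05]
u, p = mann_whitney_u(supporting, non_supporting)
print(u, p)
```

Here every supporting-read value exceeds every non-supporting one, so U reaches its maximum of 16.0 for a 4 x 4 comparison and the approximate p-value is about 0.02.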
deconv
The deconv CLI arguments exist, but the implementation is currently a placeholder (deconv_main only prints its arguments).
Practical example
sniffcell anno \
-i data/sample.bam \
-v data/sample.vcf.gz \
-b dmrs/pbmc_hierarchy.tsv \
-o results/anno.w10000 \
-r refs/GRCh38.fa \
-w 10000 \
-t 8
Download files
File details
Details for the file sniffcell-0.6.0.tar.gz.
File metadata
- Download URL: sniffcell-0.6.0.tar.gz
- Upload date:
- Size: 654.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5abc02752903aa8a618d60ac639616f8e50f9e56269670c3189d2ea9cab3272 |
| MD5 | 88995611a5e0c5db332d1b6155219ece |
| BLAKE2b-256 | 253a18c4262ba3ee82f68a219ba5289ee883d273d3d48b3458291e51abacaced |
Provenance
The following attestation bundles were made for sniffcell-0.6.0.tar.gz:
Publisher: python-publish.yml on Fu-Yilei/SniffCell
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sniffcell-0.6.0.tar.gz
- Subject digest: b5abc02752903aa8a618d60ac639616f8e50f9e56269670c3189d2ea9cab3272
- Sigstore transparency entry: 962982832
- Sigstore integration time:
- Permalink: Fu-Yilei/SniffCell@5cd2bcc0fda60fd9de27e05353f799dfccf25faa
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/Fu-Yilei
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5cd2bcc0fda60fd9de27e05353f799dfccf25faa
- Trigger Event: release
File details
Details for the file sniffcell-0.6.0-py3-none-any.whl.
File metadata
- Download URL: sniffcell-0.6.0-py3-none-any.whl
- Upload date:
- Size: 94.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8fa55537ec945328daf06b592674d13c95fa9f82cf4867b14552ca5cb6f3056 |
| MD5 | 1f42787ef3667dad975e2fe797e464b7 |
| BLAKE2b-256 | 95e7232f19fcf1bb97a2f593dee63bf2ed70449aee09f67472a96784231528bf |
Provenance
The following attestation bundles were made for sniffcell-0.6.0-py3-none-any.whl:
Publisher: python-publish.yml on Fu-Yilei/SniffCell
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sniffcell-0.6.0-py3-none-any.whl
- Subject digest: c8fa55537ec945328daf06b592674d13c95fa9f82cf4867b14552ca5cb6f3056
- Sigstore transparency entry: 962982842
- Sigstore integration time:
- Permalink: Fu-Yilei/SniffCell@5cd2bcc0fda60fd9de27e05353f799dfccf25faa
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/Fu-Yilei
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5cd2bcc0fda60fd9de27e05353f799dfccf25faa
- Trigger Event: release