delfies is a tool for the detection of DNA Elimination breakpoints
delfies is a tool that identifies genomic locations where double-strand
breaks have occurred followed by telomere addition. It was initially designed
and validated for studying the process of Programmed DNA Elimination in
nematodes, but should work for other clades and applications too.
For details/to credit the tool, please see/cite the associated paper:
Letcher, B. and Delattre, M. (2025). delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition. Journal of Open Source Software, 10(105), 7385, https://doi.org/10.21105/joss.07385
Getting started
delfies takes as input a genome fasta (gzip-compressed files are supported) and an
indexed SAM/BAM of sequencing reads aligned to the genome.
delfies --help
samtools index <aligned_reads>.bam
delfies <genome>.fa.gz <aligned_reads>.bam <output_dir>
cat <output_dir>/breakpoint_locations.bed
For how to obtain a suitable SAM/BAM, see input data, and for
downloading a real genome and BAMs for a test run of delfies, see test run.
Installation
Using pip (or equivalent - uv, etc.):
# Install latest release from PyPI
pip install delfies
# Or install a specific release from PyPI:
pip install delfies==0.11.0
# Or clone and install tip of main
git clone https://github.com/bricoletc/delfies/
pip install ./delfies
Input data
Sequencing technologies
delfies is designed to work with both Illumina short reads and ONT or PacBio
long reads. Long reads are better for finding breakpoints in more repetitive
regions of the genome. A high fraction of sequenced bases with a quality >Q20
is desirable (e.g. >70%). We found delfies worked on recent data from all three
sequencing technologies: see test run below.
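As a rough way to check the >Q20 guideline above on your own reads, the sketch below computes the fraction of bases whose Phred quality exceeds a threshold. It is not part of delfies: the function name fraction_above_q is hypothetical, and it assumes Phred+33-encoded quality strings (the fourth line of each FASTQ record), which is the usual encoding for recent data.

```python
def fraction_above_q(quality_strings, threshold=20, offset=33):
    """Return the fraction of bases whose Phred quality is strictly above `threshold`.

    `quality_strings` is an iterable of FASTQ quality lines; `offset` is the
    ASCII offset of the encoding (33 is assumed here).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for char in qual:
            total += 1
            if ord(char) - offset > threshold:
                passing += 1
    # Avoid dividing by zero when no bases were provided
    return passing / total if total else 0.0


# Hypothetical quality strings: 'I' encodes Q40 and '#' encodes Q2
example_quals = ["IIII", "II##"]
print(fraction_above_q(example_quals))  # 0.75
```

A value above roughly 0.7 would match the guideline given above.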
Aligners
To produce a SAM/BAM with which you can find breakpoints, you need to use a read
aligner that reports soft clips (parts of a read that are not aligned to the
reference). Both bowtie2 (in --local mode) and minimap2 (by default) do this.
Use minimap2 for long reads (>300bp), with the appropriate preset (e.g. -x map-ont
for Nanopore data).
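To make the notion of a soft clip concrete, here is a minimal sketch that reads the soft-clipped lengths at each end of an alignment from its SAM CIGAR string. It only illustrates the SAM convention (the S operation) and is not how delfies itself processes reads; the function name soft_clip_lengths is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so drop them first
    ops = [(length, op) for length, op in ops if op != "H"]
    if not ops:
        # Covers '*' (CIGAR unavailable) and empty strings
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


print(soft_clip_lengths("10S90M"))  # (10, 0)
print(soft_clip_lengths("5H10S80M20S"))  # (10, 20)
```

If the reads of a BAM never show S operations, the aligner was probably run in an end-to-end mode and the alignments are unlikely to be suitable.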
Test run with real data
We provide a processed subset of publicly-available data here: https://doi.org/10.5281/zenodo.14101797.
The data consist of a 2kbp region of the assembled genome of Oscheius onirici
and three alignment BAMs from sequencing data produced using Illumina, ONT and
PacBio. The data were aligned to the 2kbp region using minimap2. See the
Zenodo link for details on the sequencing data (read lengths, error rates) and
public links to the raw data.
You can run delfies on the inputs in this archive to make sure it is properly
installed and produces the expected outputs:
wget https://zenodo.org/records/14282333/files/delfies_zenodo_test_data.tar.gz
tar xf delfies_zenodo_test_data.tar.gz
# Run delfies; for example, having defined genome, bam and odirname variables:
delfies --threads 16 \
--telo_forward_seq TTAGGC \
--breakpoint_type all \
--min_mapq 20 \
--min_supporting_reads 6 \
${genome} ${bam} ${odirname}
# Compare with the expected outputs:
find delfies_zenodo_test_data -name "*breakpoint_locations.bed" | xargs cat
User Manual
CLI options
delfies --help
- Do use the --threads option if you have multiple cores/CPUs available.
- [Breakpoints]
  - There are two types of breakpoints: see detailed docs.
  - Nearby breakpoints can be clustered together to account for variability in breakpoint location (--clustering_threshold).
- [Region selection]: You can select a specific region to focus on, specified as a string or as a BED file.
- [Telomeres]
  - Specify the telomere sequence for your organism using --telo_forward_seq. If you're unsure, we recommend the tool telomeric-identifier for finding out.
  - By default, delfies discards breakpoints occurring inside telomere arrays, as they in theory correspond to false positives (cutting + telomere addition at existing telomeres). You can keep these breakpoints with --keep_telomeric_breakpoints.
- [Aligned reads]
  - To analyse confidently-aligned reads only, you can filter reads by MAPQ (--min_mapq) and by bitwise flag (--read_filter_flag).
  - You can tolerate more or fewer mutations in the assembly telomeres (and in the sequencing reads) using --telo_max_edit_distance and --telo_array_size.
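To give an intuition for what clustering nearby breakpoints means, here is a minimal sketch of gap-based clustering of positions on a single contig. This is an assumption made for illustration, not necessarily the procedure delfies implements behind --clustering_threshold; the function name cluster_positions is hypothetical.

```python
def cluster_positions(positions, threshold):
    """Group positions so that consecutive members are at most `threshold` apart.

    Positions are assumed to lie on the same contig. A new cluster starts
    whenever the gap to the previous (sorted) position exceeds `threshold`.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= threshold:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical breakpoint positions
print(cluster_positions([100, 103, 500, 104, 506], threshold=5))
# [[100, 103, 104], [500], [506]]
```

A larger threshold merges more positions into one reported location, at the cost of positional resolution.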
Outputs
The two main outputs of delfies are:
- breakpoint_locations.bed: a BED-formatted file containing the location of identified elimination breakpoints. The first six columns are the standard BED columns; the seventh corresponds to the --sample_name provided on the command line.
- breakpoint_sequences.fasta: a FASTA-formatted file containing the sequences of identified elimination breakpoints.
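For downstream scripting, the BED output can be read with the Python standard library. The sketch below assumes tab-separated lines with the seven columns described above; the Breakpoint type and read_breakpoints function are hypothetical helpers, and the name and score columns are kept as plain strings because their exact content is covered in the detailed docs rather than here.

```python
import csv
import io
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: str
    strand: str
    sample_name: str


def read_breakpoints(handle):
    """Parse seven-column, tab-separated BED lines from an open text handle."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        chrom, start, end, name, score, strand, sample = row[:7]
        records.append(
            Breakpoint(chrom, int(start), int(end), name, score, strand, sample)
        )
    return records


# Hypothetical content, for illustration only (not real delfies output)
example = io.StringIO("contig_1\t1000\t1001\tbreakpoint_A\t12\t+\tmy_sample\n")
for bp in read_breakpoints(example):
    print(bp.chrom, bp.start, bp.strand, bp.sample_name)
```

With a real run, replace the StringIO object with open("<output_dir>/breakpoint_locations.bed").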
Validating breakpoints
We highly recommend visualising your results! E.g., by loading your input
fasta and BAM, together with the delfies output breakpoint_locations.bed, in
IGV.
Confident/true breakpoints will typically have:
- Good read support. Note that breakpoints are ordered by read support in the delfies output file breakpoint_locations.bed, and you can require a minimum number of supporting reads using the CLI option --min_supporting_reads.
- A difference in read coverage before and after the breakpoint. The nature of this difference depends on the ratio between cells with and without the breakpoint. As an example, in organisms that eliminate parts of their genome in the soma, if most sequenced cells are from the soma, expect more reads before the breakpoint than after it ('before' and 'after' defined relative to the reported breakpoint strand).
Ultimately though, only biological experiments can truly validate identified breakpoints.
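The coverage criterion above can be checked numerically as well as visually. The sketch below compares mean read depth in a window on each side of a breakpoint, given per-base depths for one contig (for example, parsed from samtools depth output). The function name coverage_ratio, the window size, and above all the strand convention are assumptions made for illustration: here, '+' is taken to mean that 'before' is the lower-coordinate side. Check the detailed docs for the convention delfies actually uses.

```python
def coverage_ratio(depths, breakpoint, strand, window=100):
    """Return mean depth before the breakpoint divided by mean depth after it.

    `depths` is a list of per-base read depths for one contig (0-based).
    ASSUMPTION: for strand '+', 'before' is the lower-coordinate side;
    for strand '-', it is the higher-coordinate side.
    """
    left = depths[max(0, breakpoint - window):breakpoint]
    right = depths[breakpoint:breakpoint + window]
    if not left or not right:
        raise ValueError("breakpoint is too close to a contig end")
    before, after = (left, right) if strand == "+" else (right, left)
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    # No coverage at all after the breakpoint: report an infinite ratio
    return mean_before / mean_after if mean_after else float("inf")


# Hypothetical depths: 30x upstream of position 100, 10x downstream
depths = [30] * 100 + [10] * 100
print(coverage_ratio(depths, breakpoint=100, strand="+", window=50))  # 3.0
```

A ratio well above 1 is consistent with the example given above, where most sequenced cells carry the breakpoint; a ratio close to 1 suggests that few cells do, or that the call is a false positive.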
Applications
- The fasta output enables looking for sequence motifs that occur at breakpoints, e.g. using MEME.
- The BED output enables classifying a genome into retained and eliminated regions. The 'strand' of breakpoints is especially useful for this: see detailed docs.
- The BED output also enables assembling past somatic telomeres: for how to do this, see detailed docs.
Detailed documentation
For more details on delfies, including outputs and applications, see detailed_docs.
Contributing
Contributions always welcome!
Please see CONTRIBUTING.md for how (reporting issues, requesting
features, contributing code). This document includes instructions on how to run
delfies' unit and functional tests.