A Python tool to find, plot, and export synteny blocks from all-vs-all BLAST.
DaisyBlast is a CLI tool for detecting and visualizing synteny blocks (collinear, homologous genomic segments) across multiple FASTA inputs. It performs an all-vs-all BLAST search, “shatters” alignments into non-overlapping windows, and groups them into syntenic blocks using a graph-based approach.
The Problem
BLAST is the gold standard for pairwise nucleotide comparison (A ↔ B), but each hit relates only two sequences, so analyzing multiple samples requires a broader view.
DaisyBlast daisy-chains these isolated hits using a Union-Find graph algorithm to enforce transitivity. If A aligns with B, and B aligns with C, DaisyBlast unifies all three into a single Synteny Group. This enables the visualization of conserved structure across n inputs, moving beyond simple pairwise limitations.
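The transitive grouping described above can be illustrated with a minimal Union-Find (disjoint-set) sketch. This is not DaisyBlast's actual code: the `UnionFind` class, the `group_hits` helper, and the single-letter labels are hypothetical names chosen for illustration, and real input would be shattered windows rather than whole sequences.

```python
class UnionFind:
    """Minimal disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_hits(pairs):
    """Return sorted groups of labels connected by pairwise hits."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for item in list(uf.parent):
        groups.setdefault(uf.find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


# A aligns with B, and B aligns with C; D and E only align with each other.
hits = [("A", "B"), ("B", "C"), ("D", "E")]
print(group_hits(hits))  # [['A', 'B', 'C'], ['D', 'E']]
```

A and C end up in the same group even though no hit links them directly, which is the transitivity the pairwise BLAST output lacks.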
Features
- Automated Pipeline: Run BLAST, parse hits, identify synteny groups, and generate plots, all from one command.
- Graph-Based Grouping: Union-Find logic chains collinear hits into multi-sample synteny blocks.
- Comprehensive Visualizations:
  - Synteny maps: Linear and Circos-style circular plots.
  - Dotplots: Pairwise and combined alignment geometry.
  - Coverage summaries: NCBI-style stacked alignments.
- Robust to Fragmentation: Uses a “shattering” algorithm to create clean, non-overlapping windows from complex, overlapping BLAST outputs.
Installation
Prerequisites
- Python ≥ 3.8
- NCBI BLAST+ (makeblastdb and blastn must be in your PATH)
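One way to confirm the prerequisites before installing is a short Python check. This is a convenience sketch only; `missing_tools` is a hypothetical helper and not part of DaisyBlast.

```python
import shutil
import sys


def missing_tools(tools=("makeblastdb", "blastn")):
    """Return the names of required executables not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if sys.version_info < (3, 8):
    print("Python 3.8 or newer is required.")

missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("BLAST+ executables found.")
```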
Option 1: Install from PyPI (Recommended)
pip install daisyblast
Option 2: Install from source
git clone https://github.com/erinyoung/daisyblast.git
cd daisyblast
pip install .
Verify Installation
daisyblast --help
This should show the help menu.
usage: daisyblast [-h] -i INPUT [INPUT ...] [-o OUTPUT_DIR] [-e EVALUE] [--min_pident MIN_PIDENT] [--min_length MIN_LENGTH] [-n NUM_GROUPS]
DaisyBlast: A tool to find and visualize synteny blocks from a single multi-FASTA file.
options:
-h, --help show this help message and exit
-i INPUT [INPUT ...], --input INPUT [INPUT ...]
One or more input FASTA files (e.g., contig1.fa contig2.fa).
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Directory to save output .bed and .png files. (Default: daisyblast_results)
-e EVALUE, --evalue EVALUE
E-value cutoff for the self-BLAST search. (Default: 1e-10)
--min_pident MIN_PIDENT
Minimum percent identity for a BLAST hit. (Default: 90.0)
--min_length MIN_LENGTH
Minimum alignment length *after* splitting hits. (Default: 200)
-n NUM_GROUPS, --num_groups NUM_GROUPS
Maximum number of groups in final bedfile (Default: 20)
Usage
daisyblast -i data/contig1.fasta data/contig2.fasta -o results_dir
Quick Start
You can test the installation using the sample data provided in the repository:
# Run on included test files
daisyblast -i tests/data/test_1.fasta tests/data/test_2.fasta -o test_results
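To drive the same run from a Python script, the command line can be assembled from the options listed in the help menu. The sketch below only builds and prints the argument list. `build_command` is a hypothetical helper, not an API exposed by DaisyBlast, and its defaults simply mirror the documented CLI defaults.

```python
import subprocess


def build_command(fastas, output_dir="daisyblast_results", evalue=1e-10,
                  min_pident=90.0, min_length=200, num_groups=20):
    """Assemble a daisyblast argument list from the documented CLI flags."""
    return [
        "daisyblast",
        "-i", *[str(path) for path in fastas],
        "-o", str(output_dir),
        "-e", str(evalue),
        "--min_pident", str(min_pident),
        "--min_length", str(min_length),
        "-n", str(num_groups),
    ]


cmd = build_command(
    ["tests/data/test_1.fasta", "tests/data/test_2.fasta"], "test_results"
)
print(" ".join(cmd))
# Uncomment to execute; requires daisyblast and BLAST+ on PATH.
# subprocess.run(cmd, check=True)
```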
Output Overview
1. Synteny Maps (Grouped Blocks)
High-level views of conserved regions. Each color corresponds to a unique Synteny Group shared across sequences.
- Circular Plot: The query sequence forms the outer ring; colored blocks indicate a BLAST hit shared by two or more input sequences.
- Linear Map: Synteny blocks plotted along genomic coordinates.
2. Alignment Geometry (Dotplots)
Visualizes raw BLAST hits before grouping. Use these to detect inversions (downward diagonals) or indels (gaps).
- Combined dotplots: All subjects vs. one query on a single plot.
- Pairwise dotplots: One panel per sequence pair.
3. Coverage Summaries
NCBI-style stacked bar charts showing alignment depth and scoring.
- Red/Pink: High-scoring hits (>80 bitscore)
- Blue/Black: Lower scoring hits
4. Data Files
| File | Description |
|---|---|
| final_groups.txt | Final synteny group assignments (Group_ID Sequence Start End) |
| divided.bed | Shattered genomic windows used in analysis |
| blast_hits.txt | Raw BLAST format 6 output |
| trimmed_blast.tsv | BLAST hits trimmed to window boundaries |
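For downstream analysis, the raw hits can be read with a few lines of Python. The sketch below assumes blast_hits.txt uses the default twelve-column layout of BLAST tabular output (format 6); DaisyBlast may request a different column set, so check the file before relying on it. `read_hits` is a hypothetical helper and the rows are made-up examples, not real output.

```python
import csv
import io

# Default column order for BLAST tabular output (-outfmt 6).
# Assumption: blast_hits.txt uses these defaults.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle, min_pident=90.0):
    """Yield hits as dicts, skipping those below min_pident."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        for key in ("length", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        # For blastn, a minus-strand hit is reported with sstart > send.
        hit["reverse"] = hit["sstart"] > hit["send"]
        if hit["pident"] >= min_pident:
            yield hit


# Made-up example rows for illustration only.
example = io.StringIO(
    "a__c1\tb__c1\t98.5\t500\t7\t0\t1\t500\t1001\t1500\t0.0\t880\n"
    "a__c1\tb__c1\t95.0\t300\t15\t0\t600\t899\t2300\t2001\t1e-140\t470\n"
    "a__c1\tb__c1\t85.0\t250\t37\t0\t900\t1149\t50\t299\t1e-60\t250\n"
)
for hit in read_hits(example):
    strand = "reverse" if hit["reverse"] else "forward"
    print(hit["qseqid"], hit["sseqid"], hit["length"], strand)
```

The `reverse` flag marks hits that would appear as downward diagonals on a dotplot, which is a quick way to list candidate inversions without opening the plots.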
How It Works
| Step | Task | Reason |
|---|---|---|
| 1 | Rename | All headers are adjusted to ${filename}__${original_header} to ensure unique IDs and prevent collisions when using multiple input files. |
| 2 | BLAST | Performs an all-vs-all blastn search across all input FASTA sequences. |
| 3 | Shatter | Parses BLAST hits and breaks genomes into discrete, non-overlapping windows. Why? If Overlap(A,B) and Overlap(B,C) differ in size, shattering creates a common denominator window to allow clean comparison. |
| 4 | Trim | Crops each BLAST alignment so it fits strictly within its corresponding shattered window. |
| 5 | Group | Applies a Union-Find graph algorithm to chain windows into synteny blocks. Logic: If A ↔ B and B ↔ C, DaisyBlast groups A, B, and C together. |
| 6 | Visualize | Generates synteny maps, dotplots, and coverage summaries using matplotlib and pycirclize. |
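Steps 3 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: intervals are treated as half-open (start, end) pairs on a single sequence, whereas BLAST reports 1-based inclusive coordinates, and the `shatter` and `trim` function names are hypothetical.

```python
def shatter(intervals):
    """Split overlapping (start, end) intervals into non-overlapping windows.

    Intervals are half-open [start, end). Every interval boundary becomes a
    breakpoint, and a window is kept only if some interval covers it.
    """
    breakpoints = sorted({point for interval in intervals for point in interval})
    windows = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        if any(s <= start and end <= e for s, e in intervals):
            windows.append((start, end))
    return windows


def trim(interval, window):
    """Crop an interval to a window; return None if they do not overlap."""
    start = max(interval[0], window[0])
    end = min(interval[1], window[1])
    return (start, end) if start < end else None


# Two hits on the same sequence that overlap between 300 and 500.
hits = [(100, 500), (300, 800)]
windows = shatter(hits)
print(windows)  # [(100, 300), (300, 500), (500, 800)]
print([trim(hits[0], w) for w in windows])  # [(100, 300), (300, 500), None]
```

The shared window (300, 500) is the "common denominator" both hits can be compared on; the flanking windows belong to only one hit each.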
AI Attribution
Please note that portions of this codebase were written with the assistance of Google Gemini to accelerate development. The package logo was also AI-generated using Gemini's image creation tools.
Citation
If you use DaisyBlast in your research, please cite:
DaisyBlast: Multi-sample synteny detection via transitive BLAST chaining. GitHub repository: https://github.com/erinyoung/daisyblast
License
Distributed under the MIT License. See LICENSE for more information.