A Python tool to find, plot, and export synteny blocks from all-vs-all BLAST.

DaisyBlast

Multi-sample synteny detection via transitive BLAST chaining.

DaisyBlast is a CLI tool for detecting and visualizing synteny blocks (collinear, homologous genomic segments) across multiple FASTA inputs. It performs an all-vs-all BLAST search, “shatters” alignments into non-overlapping windows, and groups them into syntenic blocks using a graph-based approach.

The Problem

BLAST is the gold standard for pairwise nucleotide comparison (A ↔ B), but each hit relates only two sequences, so comparing many samples at once requires combining those pairwise results.

DaisyBlast daisy-chains these isolated hits using a Union-Find graph algorithm to enforce transitivity. If A aligns with B, and B aligns with C, DaisyBlast unifies all three into a single Synteny Group. This makes it possible to visualize conserved structure across n inputs rather than one pair at a time.
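
The transitive grouping described above can be illustrated with a minimal Union-Find sketch. This is not DaisyBlast's actual code: the `UnionFind` class, the `group_hits` helper, and the region labels are hypothetical and only show the general technique.

```python
class UnionFind:
    """Minimal disjoint-set structure with path compression (illustrative)."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_hits(hits):
    """Return one set per connected group of aligned regions."""
    uf = UnionFind()
    for a, b in hits:
        uf.union(a, b)
    groups = {}
    for item in list(uf.parent):
        groups.setdefault(uf.find(item), set()).add(item)
    return list(groups.values())


# Hypothetical pairwise hits: A-B and B-C align, D-E align separately.
hits = [("A", "B"), ("B", "C"), ("D", "E")]
groups = group_hits(hits)
```

Here A, B, and C end up in one group even though A and C were never directly compared, which is the transitivity the tool relies on.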

Features

  • Automated Pipeline: Run BLAST, parse hits, identify synteny groups, and generate plots, all from one command.

  • Graph-Based Grouping: Union-Find logic chains collinear hits into multi-sample synteny blocks.

  • Comprehensive Visualizations

    • Synteny maps: Linear and Circos-style circular plots.
    • Dotplots: Pairwise and combined alignment geometry.
    • Coverage summaries: NCBI-style stacked alignments.
  • Robust to Fragmentation: Uses a “shattering” algorithm to create clean, non-overlapping windows from complex, overlapping BLAST outputs.
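
One common way to "shatter" overlapping intervals is to cut at every interval boundary. The sketch below assumes half-open `(start, end)` coordinates and a hypothetical `shatter` function; DaisyBlast's own implementation may differ in coordinate conventions and filtering.

```python
def shatter(intervals):
    """Split overlapping (start, end) intervals into non-overlapping windows.

    Coordinates are treated as half-open [start, end) for simplicity.
    """
    # Every interval boundary is a potential breakpoint.
    breakpoints = sorted({pos for start, end in intervals for pos in (start, end)})
    windows = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep a window only if at least one interval fully covers it.
        if any(start <= left and right <= end for start, end in intervals):
            windows.append((left, right))
    return windows


# Two hypothetical hits on the same sequence that partially overlap.
hits_on_b = [(100, 500), (300, 800)]
windows = shatter(hits_on_b)
# windows == [(100, 300), (300, 500), (500, 800)]
```

The shared region (300, 500) becomes its own window, so both hits can be compared over exactly the same coordinates.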


Installation

Prerequisites

  • Python ≥ 3.8
  • NCBI BLAST+ (makeblastdb and blastn must be in your PATH)
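
If you want to check these prerequisites from a script before installing, a small sketch using only the standard library might look like this. The helper names `missing_tools` and `python_ok` are made up for this example and are not part of DaisyBlast.

```python
import shutil
import sys


def missing_tools(tools=("makeblastdb", "blastn")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8 or newer is required.")
    for tool in missing_tools():
        print(f"{tool} was not found on PATH; install NCBI BLAST+.")
```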

Option 1: Install from PyPI (Recommended)

pip install daisyblast

Option 2: Install from source

git clone https://github.com/erinyoung/daisyblast.git
cd daisyblast
pip install .

Verify Installation

daisyblast --help

This should print the help menu:

usage: daisyblast [-h] -i INPUT [INPUT ...] [-o OUTPUT_DIR] [-e EVALUE] [--min_pident MIN_PIDENT] [--min_length MIN_LENGTH] [-n NUM_GROUPS]

DaisyBlast: A tool to find and visualize synteny blocks from a single multi-FASTA file.

options:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        One or more input FASTA files (e.g., contig1.fa contig2.fa).
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to save output .bed and .png files. (Default: daisyblast_results)
  -e EVALUE, --evalue EVALUE
                        E-value cutoff for the self-BLAST search. (Default: 1e-10)
  --min_pident MIN_PIDENT
                        Minimum percent identity for a BLAST hit. (Default: 90.0)
  --min_length MIN_LENGTH
                        Minimum alignment length *after* splitting hits. (Default: 200)
  -n NUM_GROUPS, --num_groups NUM_GROUPS
                        Maximum number of groups in final bedfile (Default: 20)

Usage

daisyblast -i data/contig1.fasta data/contig2.fasta -o results_dir
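
For scripted runs, the same command can be launched from Python. The sketch below only uses the flags listed in the help output above, with their documented defaults; `build_command` and `run_daisyblast` are hypothetical wrapper names, not functions provided by the package.

```python
import subprocess


def build_command(inputs, output_dir="daisyblast_results", evalue=1e-10,
                  min_pident=90.0, min_length=200, num_groups=20):
    """Assemble the daisyblast argument list from the documented CLI flags."""
    if not inputs:
        raise ValueError("at least one input FASTA file is required")
    return [
        "daisyblast",
        "-i", *[str(path) for path in inputs],
        "-o", str(output_dir),
        "-e", str(evalue),
        "--min_pident", str(min_pident),
        "--min_length", str(min_length),
        "-n", str(num_groups),
    ]


def run_daisyblast(inputs, **options):
    """Run daisyblast and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_command(inputs, **options), check=True)


cmd = build_command(["data/contig1.fasta", "data/contig2.fasta"],
                    output_dir="results_dir")
```

Calling `run_daisyblast(["data/contig1.fasta", "data/contig2.fasta"], output_dir="results_dir")` would then be equivalent to the command line shown above, with the default thresholds passed explicitly.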

Quick Start

You can test the installation using the sample data provided in the repository:

# Run on included test files
daisyblast -i tests/data/test_1.fasta tests/data/test_2.fasta -o test_results

Output Overview

1. Synteny Maps (Grouped Blocks)

High-level views of conserved regions. Each color corresponds to a unique Synteny Group shared across sequences.

  • Circular Plot: The query sequence forms the outer ring; colored blocks mark BLAST hits shared by two or more input sequences.

    Circular Image with Synteny Groups

  • Linear Map: Synteny blocks plotted along genomic coordinates.

    Linear Image with Synteny Groups

2. Alignment Geometry (Dotplots)

Plots of raw BLAST hits before grouping. Use these to detect inversions (downward-sloping diagonals) or indels (gaps or offsets along a diagonal).

  • Combined dotplots: All subjects vs. one query on a single plot.

    Combined Dot Plot

  • Pairwise dotplots: One panel per sequence pair.

    Pairwise Dot Plot
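
Inversions can also be spotted directly in tabular BLAST output: for blastn, a hit on the minus strand is reported with the subject start greater than the subject end. The sketch below assumes the default 12-column format 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); `hit_orientation` and the sample lines are hypothetical.

```python
def hit_orientation(line):
    """Classify one BLAST outfmt 6 line as 'forward' or 'inverted'."""
    fields = line.rstrip("\n").split("\t")
    # Columns 7-10 (1-based) hold qstart, qend, sstart, send.
    qstart, qend, sstart, send = (int(value) for value in fields[6:10])
    same_direction = (qend >= qstart) == (send >= sstart)
    return "forward" if same_direction else "inverted"


# Hypothetical hits: the second runs backwards on the subject.
forward_hit = "seqA\tseqB\t99.0\t400\t2\t0\t1\t400\t601\t1000\t1e-100\t700"
inverted_hit = "seqA\tseqB\t99.0\t400\t2\t0\t1\t400\t1000\t601\t1e-100\t700"
labels = [hit_orientation(forward_hit), hit_orientation(inverted_hit)]
# labels == ["forward", "inverted"]
```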

3. Coverage Summaries

NCBI-style stacked bar charts showing alignment depth and scoring.

  • Red/Pink: High-scoring hits (bit score > 80)
  • Blue/Black: Lower-scoring hits
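
To make "alignment depth" concrete, the following sketch counts how many hits cover each position of a query. It assumes 1-based inclusive coordinates that lie within the sequence length; `coverage_depth` is a hypothetical helper and is not how DaisyBlast necessarily computes its plots.

```python
def coverage_depth(hits, length):
    """Return per-position depth for 1-based inclusive (start, end) hits."""
    # Difference array: +1 where a hit begins, -1 just past where it ends.
    delta = [0] * (length + 2)
    for start, end in hits:
        low, high = sorted((start, end))
        delta[low] += 1
        delta[high + 1] -= 1
    depth, running = [], 0
    for position in range(1, length + 1):
        running += delta[position]
        depth.append(running)
    return depth


# Two hypothetical hits on a 6 bp query.
depth = coverage_depth([(1, 3), (2, 5)], length=6)
# depth == [1, 2, 2, 1, 1, 0]
```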

Combined Blast Hits

4. Data Files

| File | Description |
|------|-------------|
| final_groups.txt | Final synteny group assignments (Group_ID Sequence Start End) |
| divided.bed | Shattered genomic windows used in analysis |
| blast_hits.txt | Raw BLAST format 6 output |
| trimmed_blast.tsv | BLAST hits trimmed to window boundaries |
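
A sketch for loading the group assignments into Python follows. It assumes final_groups.txt is whitespace-delimited with the four columns listed above, that it sits in the output directory, and that any header line can be skipped because its Start field is not numeric; verify these assumptions against your own output. `read_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def read_groups(lines):
    """Map each Group_ID to a list of (sequence, start, end) tuples."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines, short lines, and a header row if one is present.
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        group_id, sequence, start, end = fields[:4]
        groups[group_id].append((sequence, int(start), int(end)))
    return dict(groups)


# Hypothetical file content, for illustration only.
example_lines = [
    "Group_ID\tSequence\tStart\tEnd",
    "1\ttest_1.fasta__contigA\t100\t500",
    "1\ttest_2.fasta__contigB\t2100\t2500",
    "2\ttest_1.fasta__contigA\t900\t1200",
]
groups_by_id = read_groups(example_lines)
```

With a real run, the same function could be called on an open file handle, for example `read_groups(open("results_dir/final_groups.txt"))`.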

How It Works

| Step | Task | Reason |
|------|------|--------|
| 1 | Rename | All headers are adjusted to ${filename}__${original_header} to ensure unique IDs and prevent collisions when using multiple input files. |
| 2 | BLAST | Performs an all-vs-all blastn search across all input FASTA sequences. |
| 3 | Shatter | Parses BLAST hits and breaks genomes into discrete, non-overlapping windows. Why? If Overlap(A,B) and Overlap(B,C) differ in size, shattering creates a common denominator window to allow clean comparison. |
| 4 | Trim | Crops each BLAST alignment so it fits strictly within its corresponding shattered window. |
| 5 | Group | Applies a Union-Find graph algorithm to chain windows into synteny blocks. Logic: If A ↔ B and B ↔ C, DaisyBlast groups A, B, and C together. |
| 6 | Visualize | Generates synteny maps, dotplots, and coverage summaries using matplotlib and pycirclize. |
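
Steps 1 and 4 can be sketched in a few lines. Both helpers below are hypothetical illustrations: `rename_header` uses the filename exactly as given (whether DaisyBlast strips the path or extension is not stated here), and `trim_hit` assumes an ungapped alignment so that query and subject coordinates shift by the same amount.

```python
def rename_header(filename, original_header):
    """Build a unique sequence ID in the filename__header pattern (step 1)."""
    return f"{filename}__{original_header}"


def trim_hit(qstart, qend, sstart, send, wstart, wend):
    """Crop a hit to the query window [wstart, wend] (step 4).

    Returns (qstart, qend, sstart, send) after trimming, or None if the
    hit does not overlap the window. Assumes no gaps in the alignment.
    """
    new_qstart = max(qstart, wstart)
    new_qend = min(qend, wend)
    if new_qstart > new_qend:
        return None
    left = new_qstart - qstart    # bases removed from the query start
    right = qend - new_qend       # bases removed from the query end
    if sstart <= send:
        # Plus-strand hit: subject coordinates increase with the query.
        return new_qstart, new_qend, sstart + left, send - right
    # Minus-strand hit: subject coordinates decrease as the query increases.
    return new_qstart, new_qend, sstart - left, send + right


new_id = rename_header("contig1.fasta", "seq1")
plus_hit = trim_hit(100, 500, 1100, 1500, 200, 300)
minus_hit = trim_hit(100, 500, 1500, 1100, 200, 300)
# plus_hit == (200, 300, 1200, 1300)
# minus_hit == (200, 300, 1400, 1300)
```

Real alignments contain gaps, so a production implementation would need to walk the alignment to map coordinates exactly; the constant-offset mapping here is only a first approximation.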

AI Attribution

Please note that portions of this codebase were written with the assistance of Google Gemini to accelerate development. The package logo was also AI-generated using Gemini's image creation tools.

Citation

If you use DaisyBlast in your research, please cite:

DaisyBlast: Multi-sample synteny detection via transitive BLAST chaining. GitHub repository: https://github.com/erinyoung/daisyblast

License

Distributed under the MIT License. See LICENSE for more information.
