Fast putative outbreak cluster and infection chain detection using SNPs.

breakfast - FAST outBREAK detection and sequence clustering

breakfast is a simple and fast script developed for clustering SARS-CoV-2 genomes using precalculated sequence features (e.g. nucleotide substitutions) from covSonar or Nextclade.

This project is under development and in an experimental stage.

Installation

Installation using conda/mamba

breakfast is available from Bioconda. You can install it with conda or, if you have mamba installed, with mamba:

$ conda install -c bioconda breakfast
# or
$ mamba install -c bioconda breakfast

Installation using pip

breakfast is available from PyPI and can be installed using pip:

$ pip install breakfast

Example Command Line Usage

Simple test run

breakfast \
   --input-file breakfast/test/testfile.tsv  \
   --max-dist 1 \
   --outdir test-run/

You will find your results in test-run/cluster.tsv, which should be identical to breakfast/test/expected_clusters_dist1.tsv
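
To check the test run without diffing by hand, the two files can be compared programmatically. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes row order does not matter for the comparison.

```python
import csv
from pathlib import Path


def read_tsv_rows(path):
    """Return all rows of a TSV file as a sorted list of tuples."""
    with open(path, newline="") as handle:
        return sorted(tuple(row) for row in csv.reader(handle, delimiter="\t"))


def same_clusters(result_path, expected_path):
    """True if both files contain the same rows, ignoring row order."""
    return read_tsv_rows(result_path) == read_tsv_rows(expected_path)


if __name__ == "__main__":
    # Paths taken from the test run described above.
    result = Path("test-run/cluster.tsv")
    expected = Path("breakfast/test/expected_clusters_dist1.tsv")
    if result.exists() and expected.exists():
        print("identical" if same_clusters(result, expected) else "different")
```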

Using breakfast with input from covsonar

breakfast uses pre-calculated sequence features (= mutations) as input rather than raw sequences. These features can be calculated with several different programs, but the one we mainly use is covsonar. It can be used to maintain a database of mutations for a large number of sequences, which can then be easily queried.

conda activate sonar
covsonar/sonar.py add -f genomes.fasta --db mydb --cpus 8
covsonar/sonar.py match --tsv --db mydb > genomic_profiles.tsv

Cluster with a maximum SNP distance of 1 and exclude clusters smaller than 5 sequences:

breakfast \
   --input-file genomic_profiles.tsv \
   --max-dist 1 \
   --min-cluster-size 5 \
   --outdir covsonar-breakfast-results/
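
To illustrate what --max-dist and --min-cluster-size control, here is a small sketch of distance-based clustering. It rests on two assumptions that this README does not confirm: that the distance between two profiles is the number of mutations found in only one of them, and that sequences are linked transitively (A and C end up together if both are close to B). The function names are hypothetical, and the all-pairs loop is for illustration only; it is not how breakfast achieves its speed.

```python
from itertools import combinations


def snp_distance(profile_a, profile_b):
    """Number of mutations present in exactly one of the two profiles."""
    return len(set(profile_a) ^ set(profile_b))


def cluster_profiles(profiles, max_dist=1, min_cluster_size=2):
    """Group sequence IDs linked by pairwise distances <= max_dist."""
    parent = {seq_id: seq_id for seq_id in profiles}

    def find(seq_id):
        # Follow parent links to the group representative (with path halving).
        while parent[seq_id] != seq_id:
            parent[seq_id] = parent[parent[seq_id]]
            seq_id = parent[seq_id]
        return seq_id

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for seq_id in profiles:
        groups.setdefault(find(seq_id), []).append(seq_id)
    # Drop groups smaller than the requested minimum size.
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_cluster_size)


if __name__ == "__main__":
    profiles = {
        "s1": ["C241T", "T606C"],
        "s2": ["C241T", "T606C", "C913T"],
        "s3": ["C241T", "T606C", "C913T", "C3037T"],
        "s4": ["G100A", "G200A", "G300A"],
    }
    print(cluster_profiles(profiles, max_dist=1, min_cluster_size=2))
```

In this toy input, s1 and s3 differ by two mutations but are still grouped because each is within distance 1 of s2.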

Using breakfast with input from Nextclade

An alternative to covsonar that is commonly used is Nextclade CLI.

conda install -c bioconda nextclade  # If nextclade isn't already installed
nextclade dataset get --name 'sars-cov-2' --output-dir 'data/sars-cov-2'
nextclade \
   --in-order \
   --input-fasta genomes.fasta \
   --input-dataset data/sars-cov-2 \
   --output-tsv output/nextclade.tsv \
   --output-tree output/nextclade.auspice.json \
   --output-dir output/ \
   --output-basename nextclade

Alternatively, you can also use Nextclade Web to process your fasta and export the genomic profile as "nextclade.tsv".

Cluster with a maximum SNP distance of 1 and exclude clusters smaller than 5 sequences. Because the Nextclade TSV looks a little different from the covSonar TSV, you need to set the additional parameters --id-col, --clust-col and --sep2 so that breakfast can identify the correct columns and split the mutations correctly.

breakfast \
   --input-file output/nextclade.tsv \
   --max-dist 1 \
   --min-cluster-size 5 \
   --id-col "seqName" \
   --clust-col "substitutions" \
   --sep2 "," \
   --outdir nextclade-breakfast-results/
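
The role of --id-col, --clust-col and --sep2 can be seen by reading such a file by hand. The sketch below is not breakfast's own reader; `read_profiles` is a hypothetical helper whose defaults mirror the parameter table, and the inline TSV is made-up illustration data using the Nextclade column names from the command above.

```python
import csv
import io


def read_profiles(handle, id_col="accession", clust_col="dna_profile",
                  sep="\t", sep2=" "):
    """Map each sequence ID to its list of mutations."""
    reader = csv.DictReader(handle, delimiter=sep)
    profiles = {}
    for row in reader:
        # Split the profile column on the secondary separator, dropping empties.
        profiles[row[id_col]] = [f for f in row[clust_col].split(sep2) if f]
    return profiles


if __name__ == "__main__":
    # Illustrative Nextclade-style content, not real output.
    text = "seqName\tclade\tsubstitutions\nseqA\t20A\tC241T,C3037T\nseqB\t20A\t\n"
    print(read_profiles(io.StringIO(text), id_col="seqName",
                        clust_col="substitutions", sep2=","))
```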

Sequence feature formats

Typical input data to breakfast looks something like the following table (example from covsonar with unnecessary columns removed):

accession	dna_profile
example1	C241T T606C C913T C3037T del:11288:9 C13515T
example2	C241T T606C del:1000:10
example3	C241T T606C del:1001:20

breakfast has parameters that let the user ignore deletions (--skip-del), insertions (--skip-ins) or mutations at the ends of the sequences (which can sometimes be error-prone) when calculating the distance between sequences. To do this, breakfast needs to know what kind of input is being provided (set with the --var-type option) so that it can parse the mutations themselves. Because upstream programs represent mutations differently, we have implemented program-specific parsers for covsonar DNA and AA, as well as Nextclade DNA and AA. Examples are shown below. If you want to use some other program as input to breakfast, check whether its format matches one of the existing feature formats. As a fallback, you can use the "raw" format, which disables parsing and does not allow you to use breakfast's indel skipping or trimming features.
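
The effect of skipping deletions can be seen on example2 and example3 from the table above, which differ only in their deletions. The sketch below assumes that the distance is the number of features found in only one of the two profiles; that definition and the function names are illustrative assumptions, not taken from breakfast's source.

```python
def is_deletion(feature):
    """covsonar-style DNA deletions look like 'del:<position>:<length>'."""
    return feature.startswith("del:")


def distance(profile_a, profile_b, skip_del=True):
    """Count features present in exactly one of two space-separated profiles."""
    a = set(profile_a.split())
    b = set(profile_b.split())
    if skip_del:
        a = {f for f in a if not is_deletion(f)}
        b = {f for f in b if not is_deletion(f)}
    return len(a ^ b)


if __name__ == "__main__":
    example2 = "C241T T606C del:1000:10"
    example3 = "C241T T606C del:1001:20"
    print(distance(example2, example3, skip_del=True))   # 0
    print(distance(example2, example3, skip_del=False))  # 2
```

Under these assumptions, the two sequences would fall within a maximum distance of 1 only when deletions are skipped.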

covsonar

mutation type	DNA (`covsonar_dna`)	AA (`covsonar_aa`)
substitution	C241T	S:N501Y
deletion	del:11288:9	ORF1:del:12:7
insertion	C241TAT	N:A34AK
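
If you want to check whether another program's output matches the covsonar_dna format, a few regular expressions are enough for a first test. The patterns below are inferred solely from the three DNA examples in the table above; they are an assumption about the format, not breakfast's actual parser, and `classify_covsonar_dna` is a hypothetical name.

```python
import re

# Patterns inferred from the table examples (assumptions, not breakfast's code).
DELETION = re.compile(r"^del:(\d+):(\d+)$")        # e.g. del:11288:9
SUBSTITUTION = re.compile(r"^[A-Z](\d+)[A-Z]$")    # e.g. C241T
INSERTION = re.compile(r"^[A-Z](\d+)[A-Z]{2,}$")   # e.g. C241TAT


def classify_covsonar_dna(feature):
    """Return (mutation type, position) for one covsonar-style DNA feature."""
    patterns = (
        ("deletion", DELETION),
        ("substitution", SUBSTITUTION),
        ("insertion", INSERTION),
    )
    for kind, pattern in patterns:
        match = pattern.match(feature)
        if match:
            return kind, int(match.group(1))
    raise ValueError(f"unrecognised feature: {feature!r}")


if __name__ == "__main__":
    for feature in ("C241T", "del:11288:9", "C241TAT"):
        print(feature, classify_covsonar_dna(feature))
```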

Nextclade

mutation type	DNA (`nextclade_dna`)	AA (`nextclade_aa`)
substitution	C241T	S:N501Y
deletion	11288-11297 or 22492	S:V70-
insertion	273:CTTCGA	(not provided)
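
The Nextclade DNA notation can be recognised in a similar way. This sketch is based only on the examples in the table above; `parse_nextclade_dna` is a hypothetical helper, and it reports deletion ranges exactly as written without interpreting whether the end position is inclusive.

```python
import re

# Patterns inferred from the table examples (assumptions, not breakfast's code).
SUBSTITUTION = re.compile(r"^[A-Z](\d+)[A-Z]$")   # e.g. C241T
DELETION = re.compile(r"^(\d+)(?:-(\d+))?$")      # e.g. 11288-11297 or 22492
INSERTION = re.compile(r"^(\d+):([A-Z]+)$")       # e.g. 273:CTTCGA


def parse_nextclade_dna(feature):
    """Return (mutation type, start, end) for one Nextclade-style DNA feature."""
    match = SUBSTITUTION.match(feature)
    if match:
        pos = int(match.group(1))
        return "substitution", pos, pos
    match = DELETION.match(feature)
    if match:
        start = int(match.group(1))
        # A single number denotes a deletion written without a range.
        end = int(match.group(2) or match.group(1))
        return "deletion", start, end
    match = INSERTION.match(feature)
    if match:
        pos = int(match.group(1))
        return "insertion", pos, pos
    raise ValueError(f"unrecognised feature: {feature!r}")


if __name__ == "__main__":
    for feature in ("C241T", "11288-11297", "22492", "273:CTTCGA"):
        print(feature, parse_nextclade_dna(feature))
```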

Raw

Features will not be parsed. Skipping inserts (--skip-ins) and/or deletions (--skip-del) and the trimming options are not supported with this feature type.

Parameter description

Parameter	Type	Required	Default	Description
--input-file	String	✅	'genomic_profiles.tsv.gz'	Path of the input file (in tsv format)
--max-dist	Integer		1	Two sequences will be grouped together, if their pairwise edit distance does not exceed this threshold
--min-cluster-size	Integer		2	Minimum number of sequences a cluster needs to include to be defined in the result file
--id-col	String		'accession'	Name of the sequence identifier column of the input file
--clust-col	String		'dna_profile'	Name of the mutation profile column of the input file
--var-type	String		'covsonar_dna'	Mutation format (e.g. for DNA mutations from covsonar, use covsonar_dna). Possible values: covsonar_dna, covsonar_aa, nextclade_dna, nextclade_aa, raw
--sep	String		'\t'	Input file separator
--sep2	String		' '	Secondary clustering column separator (between each mutation)
--outdir	String		'output/'	Path of output directory
--trim-start	Integer		264	Bases to trim from the beginning
--trim-end	Integer		228	Bases to trim from the end
--reference-length	Integer		29903	Length of reference genome (defaults to NC_045512.2)
--skip-del	Bool		TRUE	Deletions will be skipped for calculating the pairwise distance of your input sequences.
--skip-ins	Bool		TRUE	Insertions will be skipped for calculating the pairwise distance of your input sequences.
--input-cache	String		None	Path to import results from a previous run
--output-cache	String		None	Path to export results which can be used in the next run to decrease runtime.
--jobs	Integer		1	The number of jobs (=threads) to run simultaneously
--help	N/A		N/A	Show this help message and exit
--version	N/A		N/A	Show version and exit
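
How the trimming and skipping defaults interact can be sketched as a simple filter over parsed features. The default values come from the table above, but the exact boundary handling (whether a position equal to the trim length is kept) is an assumption, and `keep_feature` is a hypothetical helper rather than part of breakfast.

```python
def keep_feature(kind, position, *, trim_start=264, trim_end=228,
                 reference_length=29903, skip_del=True, skip_ins=True):
    """Decide whether a mutation would count towards the distance."""
    if skip_del and kind == "deletion":
        return False
    if skip_ins and kind == "insertion":
        return False
    # Assumed boundaries: drop the first trim_start and last trim_end bases.
    return trim_start < position <= reference_length - trim_end


if __name__ == "__main__":
    print(keep_feature("substitution", 241))    # inside the trimmed start
    print(keep_feature("substitution", 3037))   # within the retained region
    print(keep_feature("deletion", 11288))      # skipped by default
```

With the defaults, the retained region under these assumptions spans positions 265 to 29675 of the reference.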

Dependencies

breakfast runs under Python 3.10 and later. We rely heavily on some excellent open source Python libraries: networkx, pandas, numpy, scikit-learn, click, and scipy.

Project details

Release history

Version	Release date
0.4.6 (this version)	Jul 10, 2025
0.4.5	Nov 15, 2024
0.4.3	Oct 4, 2022
0.4.2	Jun 28, 2022
0.4.1	Jun 16, 2022
0.4.0	Jun 10, 2022
0.3.3	May 19, 2022
0.3.2	May 11, 2022
0.3.1	May 10, 2022

Download files

Source Distribution

breakfast-0.4.6.tar.gz (13.1 kB view details)

Uploaded Jul 10, 2025 Source

Built Distribution

breakfast-0.4.6-py3-none-any.whl (12.5 kB view details)

Uploaded Jul 10, 2025 Python 3

File details

Details for the file breakfast-0.4.6.tar.gz.

File metadata

Download URL: breakfast-0.4.6.tar.gz
Upload date: Jul 10, 2025
Size: 13.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.11 Linux/6.11.0-1015-azure

File hashes

Hashes for breakfast-0.4.6.tar.gz
Algorithm	Hash digest
SHA256	`035c15514a7f5f10cc4970de1d7617c8b3c32c393547480413202fb4027f97ca`
MD5	`041f006d2e76178746daa2177a57db7c`
BLAKE2b-256	`224e6e3d2566540905fd916caf5d01b3a184364fcfe781e4627ee46b626ac8ba`

File details

Details for the file breakfast-0.4.6-py3-none-any.whl.

File metadata

Download URL: breakfast-0.4.6-py3-none-any.whl
Upload date: Jul 10, 2025
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.12.11 Linux/6.11.0-1015-azure

File hashes

Hashes for breakfast-0.4.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8b52dfd7369909b29a6ac0be0c6f45ece98ec24ae56ad757e60c9904e48abfbe`
MD5	`2a478cc9844b39929f11e730c3c23437`
BLAKE2b-256	`b673626daff09bbef4afa6237a0489e0533c45075be3b96ce5c81ad44b73b5f4`
