A complete suite for gene-by-gene schema creation and strain identification.
chewBBACA
chewBBACA stands for "BSR-Based Allele Calling Algorithm". The "chew" part could be thought of as "Comprehensive and Highly Efficient Workflow" but at this point it still needs a bit of work to make that claim, so we just add "chew" to add extra coolness to the software name. BSR stands for BLAST Score Ratio as proposed by Rasko DA et al.
chewBBACA is a comprehensive pipeline for creating and validating whole genome and core genome MultiLocus Sequence Typing (wg/cgMLST) schemas. It provides an allele calling algorithm based on the BLAST Score Ratio that can be run in multiprocessor settings, along with a set of functions to visualize and validate allele variation in the loci. chewBBACA performs schema creation and allele calling on complete or draft genomes.
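To make the BLAST Score Ratio concrete, here is a minimal sketch of how such a ratio can be computed: the score of a query aligned against a subject is divided by the score of the query aligned against itself. The function names, the example scores, and the 0.6 threshold are assumptions chosen for illustration; they are not taken from the chewBBACA code base, so check the documentation for the values chewBBACA actually uses.

```python
def blast_score_ratio(alignment_score: float, self_score: float) -> float:
    """Return the BLAST Score Ratio for one query/subject pair.

    alignment_score: BLAST raw score of the query aligned against the subject.
    self_score: BLAST raw score of the query aligned against itself.
    """
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return alignment_score / self_score


def passes_threshold(alignment_score: float, self_score: float,
                     threshold: float = 0.6) -> bool:
    """Check whether a pair reaches the BSR threshold.

    The default of 0.6 is an assumed, illustrative value.
    """
    return blast_score_ratio(alignment_score, self_score) >= threshold


if __name__ == "__main__":
    # Hypothetical scores, not real BLAST output.
    print(blast_score_ratio(450.0, 500.0))
    print(passes_threshold(250.0, 500.0))
```

A ratio of 1.0 means the subject scores as well as the query does against itself, and lower values indicate more divergent sequences.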
Check the documentation for implementation details and guidance on using chewBBACA.
News
3.3.3 - 2024-02-23
- Fixed a warning related to the BLASTp `--seqidlist` parameter. For BLAST>=2.9, the TXT file with the sequence IDs is converted to binary format with `blastdb_aliastool`.
- The `Bio.Application` modules are deprecated and might be removed from future Biopython versions. Modified the function that calls MAFFT so that it uses the subprocess module instead of `Bio.Align.Applications.MafftCommandline`. Changed the Biopython version requirement to >=1.79.
- Added a `pyproject.toml` configuration file and simplified the instructions in `setup.py`. The use of `setup.py` as a command line tool is deprecated, and the `pyproject.toml` configuration file allows packages to be built and installed through the recommended method.
- Updated the Dockerfile to install chewBBACA with `python3 -m pip install .` instead of the deprecated `python setup.py install` command.
- Removed FASTA header integer conversion before running BLASTp. This was done to avoid a warning from BLAST related to sequence header length exceeding 50 characters.
- The seqids and coordinates of the CDSs closest to contig tips are stored in a dictionary during gene prediction to simplify LOTSC and PLOT5/3 determination (in many cases this reduces runtime by ~20%).
- Limited the number of values stored in memory while creating the `results_contigsInfo.tsv` and `results_alleles.tsv` output files to reduce memory usage.
- Data for the missing classes is now added to the FASTA and TSV files per locus instead of storing the complete data per input, which reduces memory usage.
- The data for novel alleles is saved to files to reduce memory usage.
- Fixed the in-frame stop codon count values displayed in the reports created by the SchemaEvaluator module.
- The `UniprotFinder` module now exits cleanly if the output directory already exists.
- Improved the info printed to stdout by the CreateSchema and AlleleCall modules, added comments, and changed variable names to better match the data being stored.
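As an illustration of the MAFFT change listed above, the following is a minimal sketch of calling MAFFT through the subprocess module rather than through a Biopython command line wrapper. The function names and the choice of the `--auto` and `--quiet` options are assumptions made for this example; this is not the function used inside chewBBACA.

```python
import shutil
import subprocess
from pathlib import Path


def build_mafft_command(input_fasta, mafft_path="mafft"):
    """Build the argument list for a MAFFT run (hypothetical helper).

    --auto lets MAFFT pick an alignment strategy; --quiet suppresses
    progress messages on stderr.
    """
    return [mafft_path, "--auto", "--quiet", str(input_fasta)]


def run_mafft(input_fasta, output_fasta, mafft_path="mafft"):
    """Align input_fasta with MAFFT and write the result to output_fasta."""
    if shutil.which(mafft_path) is None:
        raise FileNotFoundError(f"Could not find '{mafft_path}' in PATH")
    command = build_mafft_command(input_fasta, mafft_path)
    # MAFFT writes the alignment to stdout, so redirect it to the output file.
    with open(output_fasta, "w") as handle:
        subprocess.run(command, stdout=handle, stderr=subprocess.PIPE,
                       check=True)
    return Path(output_fasta)


if __name__ == "__main__":
    # Only prints the command; call run_mafft() when MAFFT is installed.
    print(build_mafft_command("locus1_protein.fasta"))
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an exception if MAFFT exits with a non-zero status.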
Check our Changelog to learn about the latest changes.
Citation
When using chewBBACA, please use the following citation:
Silva M, Machado MP, Silva DN, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço JA. 2018. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb Genom 4:000166. doi:10.1099/mgen.0.000166
Hashes for chewBBACA-3.3.3-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6eee0f4046c6f21f0a2ee7cbb9e63f92f8bde5694bbee10f9ec58a33c10099f7 |
MD5 | be760b11b9b7f42037e8f310fbbc9c1e |
BLAKE2b-256 | 9e0540df1ddf85d9af0af89a884d49f98b4827d2adab21533a1c155f09e66841 |
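
A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library; the helper names and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed above for chewBBACA-3.3.3-py3-none-any.whl.
EXPECTED_SHA256 = (
    "6eee0f4046c6f21f0a2ee7cbb9e63f92f8bde5694bbee10f9ec58a33c10099f7"
)


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Usage: pass the path of the downloaded wheel as the first argument.
    if len(sys.argv) > 1:
        print(matches_expected(sys.argv[1]))
```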